Blueprint to Retail Data & Analytics - The Enigmatic World of Data

When it comes to data, retail is no different from any other industry; yes, data can be very positively disruptive for retail. “Data is the new currency, data is the new oil”, so they say. Data, or the lack of it, is also a key differentiator between success and failure. Recall the “David and Goliath” battle in the movie distribution business – Blockbuster vs Netflix. As argued in the book “Netflixed – The Epic Battle for America’s Eyeballs” [1], Netflix won largely on the strength of its data. A similar story comes to mind when you go not too far back and recall how Amazon overwhelmed book superstores like Borders and Barnes & Noble, along with hundreds of other bookstores, in the 2000s. The common denominator for the winner in both battles was superior use of data.

Having data is not enough. Well, everyone has data. What matters is if and how you can use it to your advantage.

So how and where do you start? It is not my intent here to educate and coach Chief Data Officers – they are experienced industry data captains, and it would be audacious and arrogant of me to even attempt that. This post is for those who, at some point, aspire to take on a data leadership role in a retail company. So how do they go from the “first” step of joining a retail company to the next ones, in an industry that is net new for them?

While we are on the subject of the Chief Data Officer, truth be told, not many companies have seen great benefits from just hiring a “Chief Data Officer” (CDO). Unless the Chief Data Officer has a revenue target, as we had at Staples, and unless he or she delivers on it, in my mind CDO is just a hyped-up rank. Unfortunately, most companies treat the CDO as someone who will fix data governance and security for them. To me, hiring a C-level executive and then entrusting that person with just governance is almost a criminal waste of a C-level position. But more on the subject of the CDO in later posts; I don’t want to digress too far here.

The early days of your tenure as a data leader should be spent meeting and understanding different members of the leadership teams in almost all realms of the company. The bigger and older the company, the more tenured its employees and leaders will be. Be cognizant of the fact that people are connected and know each other well, so be careful about airing your opinions about people until you fully understand the “underground” networks. In these initial meetings, it is better to just introduce yourself, summarize your resume and the previous work that is relevant to the new job, and share your understanding of your challenges in the new company. Do NOT talk about people during your initial 1:1s. It creates an impression that you have an appetite for politics. That is never a good place to start. You will also start damaging your brand from the get-go.

Another conversation to have during the first 1:1s with leaders is about their understanding of what is broken from a data perspective. Building on that, feel free to describe how similar problems were solved in your previous assignments. Also, in these first 1:1s, stay away from making any massive promises. Under-commit and over-deliver has always been my mantra!

Understanding the landscape

When you get into the new shop, take time to understand the landscape. Ask lots of questions, since “innocuous” questions have never hurt anyone. Be polite, be hungry and be inquisitive. Ask lots of “Whys”. Question everything, even though you may already know the rationale for some or many of them – don’t assume anything. Take furious notes. Move your notes from a rough notebook to a fair one in the evenings – that will help you internalize the architectural diagrams and systems and firm them up in your brain. During this period, stay away from suggestions and recommendations as much as possible. The reason is simple – you haven’t learnt anything yet, so any suggestion is likely to attract some smirks, if not outright confrontation. If, after all, you do succumb to making suggestions every now and then, please make sure that they are colored appropriately. For example, instead of stating that “something should be done in a given way”, you may want to say “I have seen organizations doing it in a given way”. LOL!!

Understanding the Gaps

After having taken time to understand the landscape, you need to go back to the drawing board and understand the company's business strategy. Basing your data strategy on the business strategy, figure out the gaps, in data and in technologies, that are likely to hurt that business strategy. For example, if one of the strategies for the retail business is focused on doubling sales of medicinal supplies to resellers and your reseller data has huge chasms, there is no way you will be able to project an accurate baseline. Lacking a baseline, you cannot establish your starting point, which in turn prevents you from estimating the level of effort (LoE) needed to get to the end state.

Once you have internalized the strategy, start identifying the gaps. Gaps can be on all fronts - data state, teams’ readiness and technologies. In other words, understand the gaps in terms of people, process, data and technologies.

What does data state mean? In simple terms, it means the current state of the data. Is your reference data in good shape? Are you getting data in a timely manner, in real time where needed? Is it stored in the right data stores? Is it accessible? Is it unified, or is it siloed and fragmented? More on this topic in subsequent posts.
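
The questions above can be turned into simple measurements. The following is a minimal sketch, assuming a hypothetical reseller reference record (the `ResellerRecord` fields, the one-day freshness window and the report keys are illustrative assumptions, not something prescribed in this post). It scores a batch of records for completeness, freshness and duplicated identifiers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ResellerRecord:
    # Hypothetical reference-data record; field names are illustrative only.
    reseller_id: str
    name: Optional[str]
    tax_id: Optional[str]
    last_updated: datetime


def data_state_report(records, now, max_age=timedelta(days=1)):
    """Summarize completeness, freshness and duplication for a batch of records."""
    total = len(records)
    if total == 0:
        return {"total": 0, "complete_pct": 0.0, "fresh_pct": 0.0, "duplicate_ids": 0}
    # Complete: every descriptive field is populated.
    complete = sum(1 for r in records if r.name and r.tax_id)
    # Fresh: updated within the allowed age window.
    fresh = sum(1 for r in records if now - r.last_updated <= max_age)
    # Duplicates: rows beyond the first for the same identifier.
    unique_ids = len({r.reseller_id for r in records})
    return {
        "total": total,
        "complete_pct": round(100.0 * complete / total, 1),
        "fresh_pct": round(100.0 * fresh / total, 1),
        "duplicate_ids": total - unique_ids,
    }
```

A report like this gives you a number to put in front of leadership when you argue that a baseline cannot yet be trusted.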

Teams’ readiness points to the competence of people in understanding data and the technologies supporting it. Will people have a quizzical look on their faces when you refer to MDM (as in Master Data Management and reference data) – in other words, is the notion of MDM alien to them? Do people understand the need for strong and resilient reference data? Do they have the training to absorb the technologies needed to cover the gaps? Do they have the mindset and appetite for that training? Even more fundamental is the question: are they the right folks to train?

Finally, the technologies part should be obvious – if gaps exist due to a lack of technologies, then those technologies need to be brought in. For example, if a gap points to a lack of real-time data, then there has to be a solution, in-house or hosted, that provides real-time data. If hosted in-house, technologies like Kafka (for streaming) or Sqoop (for bulk transfers from relational databases into Hadoop) may be needed for this type of solution. I suspect I will discuss this in detail in later posts.

What is needed to solve Data Problem for a Retailer with Sales > 1 Billion per year?

There are a few things that are table stakes in a retail shop. They are almost non-negotiable. The first is the need for strong, non-fragmented master and reference data (aka MDM, as mentioned above) related to the organization and industry. This is one of the most important needs for any organization. Building massive data solutions on very fragmented reference data is akin to building a multistory concrete building on a sketchy and rickety foundation. What are these different reference data models? If you are in the retail industry, the first three sets of master data models are the Customer, Product and Vendor data models. Then there are the location (or geography) and salespeople (or employee) data models, which are also important. In many parts of the world, this master data set is referred to as Belly Buttons, since the whole organization is built around it.
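
To make “fragmented” master data concrete, here is a minimal sketch, assuming deliberately tiny Customer, Product and Vendor models (all field names and both checks are hypothetical illustrations, not a real MDM product's API). It flags customers that appear under several ids with the same normalized name, and products that point at a vendor missing from the vendor master.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


@dataclass(frozen=True)
class Vendor:
    vendor_id: str
    name: str


@dataclass(frozen=True)
class Product:
    sku: str
    description: str
    vendor_id: str


def fragmented_customers(customers):
    """Group customer ids that share the same normalized name (likely duplicates)."""
    groups = {}
    for c in customers:
        # Normalize case and whitespace so "Acme Corp" and "acme  corp" collide.
        key = " ".join(c.name.lower().split())
        groups.setdefault(key, []).append(c.customer_id)
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


def orphaned_products(products, vendors):
    """Return SKUs whose vendor_id does not exist in the vendor master."""
    known = {v.vendor_id for v in vendors}
    return sorted(p.sku for p in products if p.vendor_id not in known)
```

Real matching is far fuzzier than a lowercase comparison; the point is only that these checks are cheap to run long before any large project is funded.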

Make sure you fix them as your first priority. Do not try to spin off a "TWO type project" (TWO type projects are 2-million-dollar, 200-people, 2-year-long projects; you get the idea!). For one, no one in the company will trust a stranger like you with two million dollars, for anything. More importantly, and especially for a thing like data, people and organizations do not have the appetite to spend huge sums. This is another story for another day, but you may quickly discover that data typically gets only window dressing in your organization. Don’t get disheartened. People want the best data without spending anything. Alas!

The next thing you need is a system that allows you to recognize sales and orders. All data from sales and orders flows through this system. What is the difference between an order and a sale? An order has an order header and one or more order lines. It also has shipments and shipment lines. Further, it may have invoices and invoice lines – multiple invoices may be generated for a given order because of different ship-to/bill-to addresses. When all of these are collapsed into one single item and revenue is actually accrued to a given book of business, it may be referred to as a “sale”. Mind you, different organizations define this differently, so be aware that these are two different terms. Simply stated, the final dollars that the company gets from an order are the sale. Hence there should be an existing system to capture and store the order and sales data.
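
The order/sale distinction above can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, shipments are left out for brevity, and a “sale” is taken to be the total invoiced amount, which is only one of the definitions an organization might use.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class OrderLine:
    sku: str
    quantity: int
    unit_price: Decimal


@dataclass
class Invoice:
    invoice_id: str
    bill_to: str
    amount: Decimal


@dataclass
class Order:
    # Order header fields.
    order_id: str
    customer_id: str
    # One or more order lines, and zero or more invoices raised against them.
    lines: List[OrderLine] = field(default_factory=list)
    invoices: List[Invoice] = field(default_factory=list)

    def ordered_amount(self) -> Decimal:
        """Value of what the customer asked for."""
        return sum((l.unit_price * l.quantity for l in self.lines), Decimal("0"))

    def sale_amount(self) -> Decimal:
        """Assumed definition of a sale: the total actually invoiced."""
        return sum((i.amount for i in self.invoices), Decimal("0"))
```

An order that is only partially invoiced then shows an ordered amount larger than its sale amount, which is exactly the gap the two terms are meant to capture.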

There should be another set of system(s) that consolidates orders and sales from the myriad upstream transactional systems. That system is typically called the Operational Data Store, or ODS. It is needed because it is the first place where data from all the different channels comes together in a single data store. In a single-channel business, the ODS is almost the same as your order management system. The ODS can also lend itself to some level of reporting. I would call these lightweight reports.
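
As an illustration of what “coming together in a single data store” involves, here is a minimal sketch, assuming two hypothetical channel feeds (`web` and `store`) with made-up field names. Each feed is mapped to one common shape and keyed by channel and order id, so a replayed record overwrites the earlier copy instead of duplicating it.

```python
def to_ods_row(channel, record):
    """Map a channel-specific record to a common ODS shape (hypothetical fields)."""
    if channel == "web":
        # Assumed web feed: string totals in dollars.
        return {"order_id": record["id"], "channel": "web",
                "amount": float(record["total"])}
    if channel == "store":
        # Assumed store feed: integer amounts in cents.
        return {"order_id": record["ticket_no"], "channel": "store",
                "amount": record["amount_cents"] / 100}
    raise ValueError(f"unknown channel: {channel}")


def consolidate(feeds):
    """Merge per-channel feeds into one list of ODS rows, de-duplicating replays."""
    ods = {}
    for channel, records in feeds.items():
        for record in records:
            row = to_ods_row(channel, record)
            # Last write wins for the same (channel, order_id) key.
            ods[(row["channel"], row["order_id"])] = row
    return list(ods.values())
```

A real ODS would sit on a database rather than a dictionary; the design point is the common schema and the idempotent key.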

Further downstream, there is a need for an aggregating system, typically referred to as the warehouse or mart. This is the system that does most of the heavy-duty reporting and analytics, and perhaps a bit of predictive analytics.
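
The kind of aggregation a warehouse or mart serves can be sketched with an in-memory SQLite database from the Python standard library. This is only a sketch: the `sales` table, its columns and the choice of integer cents are assumptions made for the example, and a real warehouse would run on a very different engine.

```python
import sqlite3


def daily_sales_by_channel(rows):
    """Aggregate (sale_date, channel, amount_cents) rows per day and channel."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_date TEXT, channel TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # One output row per day and channel: order count and total revenue.
    result = conn.execute(
        "SELECT sale_date, channel, COUNT(*), SUM(amount_cents) "
        "FROM sales GROUP BY sale_date, channel "
        "ORDER BY sale_date, channel"
    ).fetchall()
    conn.close()
    return result
```

The same GROUP BY pattern carries over to whichever warehouse engine you end up choosing.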

Finally, you may need a system for some amount of predictive analytics, statistical “what-if” scenarios, data science and research. Typically, this type of work is best done through the power of distributed computing. Big Data technologies like Hadoop, Storm and Spark do this fairly well. Teradata, DB2 and Oracle can do it as well; however, the price point changes dramatically. Oracle showcases Exadata, which still suffers from a “share everything” architecture, while Teradata can demonstrate the firepower of a distributed, “share nothing” architecture but comes at a high price point. Oracle’s Real Application Clusters (RAC) struggles to scale beyond a certain number of servers: the cost of remastering a crashed server’s resources increasingly becomes a limiting factor and hurts the horizontal scalability of the cluster. This is the limitation of “share everything” technology. But it may not be a limitation if you are playing with less than a hundred terabytes of data. More on this in later posts.

As the scale becomes bigger – that is, as the number of orders per day and the number of visits to the ecommerce site increase dramatically – the solutions may change somewhat, but if you have architected them right and built them on the right technologies, they may not change dramatically. Building on systems that are scalable and extensible allows you to do that. When designing, think and architect for capacities and scale that are orders of magnitude higher; when building, build the smallest thing needed for the near term, the next year or so. Why? Because technologies evolve, hardware follows Moore's law and businesses change. I am sure to double-click on this in future posts. The important thing about scalability and extensibility is the hidden notion of cost. Technology-wise, your solution may scale without any issues. However, if the all-inclusive cost (Total Cost of Ownership, or TCO), which includes the expert employees needed to run the hardware, the hardware itself, software licenses and annual maintenance, is so high that scaling breaks the back of your budget, then it is not “scalable” for your shop. Retail shops are typically short on budget – keep in mind that your organization will typically not have deep pockets, very unlike the healthcare or finance industries.
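
The budget test described above is simple arithmetic, and it helps to write it down. This is a minimal sketch with hypothetical function names and made-up figures; the four cost components are the ones listed in the paragraph above.

```python
def total_cost_of_ownership(staff, hardware, licenses, maintenance):
    """All-inclusive annual cost of running a solution at a given scale."""
    return staff + hardware + licenses + maintenance


def is_scalable_for_budget(tco_at_target_scale, annual_budget):
    """A solution is 'scalable' for your shop only if its TCO fits the budget."""
    return tco_at_target_scale <= annual_budget


# Illustrative figures only: the technology scales, the budget does not.
tco = total_cost_of_ownership(
    staff=400_000, hardware=250_000, licenses=300_000, maintenance=50_000
)
affordable = is_scalable_for_budget(tco, annual_budget=800_000)
```

With these figures the TCO is 1,000,000 against a budget of 800,000, so `affordable` is `False` even though nothing is technically wrong with the solution.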

Lastly, have technologies like R, Tableau and other insight tools that allow you to slice and dice the cubes of data. These technologies should be able to connect both to data warehouses and marts built on traditional RDBMS technologies and to distributed data warehouses on Hive and the like. Remember, visualization is as important as the correctness of the data, if not more so.

If you have these few systems and tools in place, your work becomes easier. It doesn’t become a slam dunk yet, though.



