Data Strategy - A Practical Framework
Anurag Singh
Building.. AI Agents for.. (wait for it); Applied AI | Data, Digital & AI Transformation | Previously Co-Founder, Chief Product & Enterprise Officer at Affle [NSE India: AFFLE].
Strategy is often undervalued. Anyone operating in start-ups will have ingrained the importance of execution over strategy. The iconic statement "Culture eats strategy for breakfast", widely attributed to Peter Drucker, captures this well: culture translates first and foremost to execution. Many might therefore find it difficult to think of scenarios where one needs to pause, plan, and then execute.
The paradigm for strategy changes drastically though, if you are not building a Start Up, but say are in the early stages of digitally transforming a large enterprise or perhaps, building an entire new city. Being nimble would surely help, but not having a strategy could mean a project that is exponentially more costly than planned or one that is entirely misaligned to your core purpose and objectives.
Strategy for strategy's sake can seem vague, but when implemented with a practical framework, it makes for clear tangible benefits.
What is strategy then?
A good strategy provides a clear roadmap, consisting of a set of guiding principles or rules, that defines the actions people in the business should take (and not take) and the things they should prioritize (and not prioritize) to achieve desired goals.
Among the many definitions of strategy, the one above is precise and practical; it immediately resonates and reflects the benefits of having a clear approach in place.
This broad definition could apply whether you are planning a complex corporate strategy or one for your data. Of course, given data's inherent living nature and 'digital' construct, various other elements need to be factored in when defining what Data Strategy is and entails.
Defining Data Strategy
Taking inspiration from the above definition, let's frame one for data strategy:
A good data strategy provides a clear data map of producers and consumers, defines all technologies and systems involved in processing this data in the middle, and puts in place frameworks and rules for controlling, managing, publishing, and consuming this data; to achieve desired goals.
The Practical Framework
While the definition may seem complex, we'll break it down and build upon each element to give you a definitive practical guide to data strategy.
To keep our 'practicality' promise, and based on the above definition, we will present a simple framework for putting in place a data strategy, consisting of just three parts: the Data Map, Data Systems & Data in the Middle, and Frameworks & Governance.
While it may seem minimalist, I hope this framework covers all critical aspects of an astute and effective data strategy.
1. Data Map
The first critical step of a data strategy is to know what data you are dealing with, or will likely deal with. A data map is the surest way to create a clear and concise representation of your data and its related "hubs" and "nodes". It should indicate flows and connections at a high level, showing how various systems are connected and how they share or pass data between each other. A Data Map is different from a Lineage map: lineages are for particular datasets or data tables, whereas the data map is a macro view of ALL your data generators, consumers, and systems.
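One way to make this concrete is to sketch the data map as a small directed graph. The example below is only an illustration: every node name (`pos_terminals`, `orders_db`, and so on) is a hypothetical placeholder, and a real map would usually live in a diagramming or catalog tool rather than in code.

```python
from collections import defaultdict

# Hypothetical nodes: each is a producer, a system in the middle, or a consumer.
NODES = {
    "pos_terminals": "producer",
    "iot_sensors": "producer",
    "orders_db": "system",
    "data_lake": "system",
    "bi_dashboard": "consumer",
    "ml_forecasting": "consumer",
}

# Directed edges: (source, target) means data flows from source to target.
FLOWS = [
    ("pos_terminals", "orders_db"),
    ("iot_sensors", "data_lake"),
    ("orders_db", "data_lake"),
    ("data_lake", "bi_dashboard"),
    ("data_lake", "ml_forecasting"),
]

def build_map(flows):
    """Return an adjacency list: node -> list of directly downstream nodes."""
    graph = defaultdict(list)
    for source, target in flows:
        graph[source].append(target)
    return graph

def downstream(graph, start):
    """All nodes reachable from `start`, i.e. everything its data can end up feeding."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

data_map = build_map(FLOWS)
reach = downstream(data_map, "pos_terminals")
```

Even a map this small answers a macro question that a lineage view does not: which systems and consumers are affected if a given producer changes.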
5 Ds of Data
In planning this map, and through the process of building your data strategy, keep these essential D's in mind.
Also consider this important D, as Topé Osu states:
"Organisations need to think beyond delivering data, they must think how they can democratise it instead. "
These elements are covered in more depth later in this note.
Data Application: OLTP vs OLAP vs XYZ
An exercise in parallel with the Data Map is defining applications for the data you will be dealing with. Is your data transactional in nature? Or is the core purpose of your data analytics? Will you have product catalogues or organisation wide consumable datasets? What are the use cases?
For each Data Producer, map the application and categorise it as OLTP (Online Transaction Processing) data or OLAP (Online Analytical Processing) data. Most should fit into these two categories, but feel free to sub-categorise them (e.g. Real-Time Data, Streaming Data, Data Feed) or create your own (e.g. PRD, Processed Ready Data).
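The categorisation exercise can be recorded in a very simple structure. The sketch below assumes three made-up producers (`checkout_service`, `clickstream`, `sales_mart`); only the OLTP/OLAP split and the sub-category labels come from the text.

```python
from dataclasses import dataclass
from enum import Enum

class Application(Enum):
    OLTP = "Online Transaction Processing"
    OLAP = "Online Analytical Processing"

@dataclass(frozen=True)
class ProducerProfile:
    name: str
    application: Application
    sub_category: str = ""  # optional, e.g. "Streaming Data" or a custom label like "PRD"

# Hypothetical producers, for illustration only.
PRODUCERS = [
    ProducerProfile("checkout_service", Application.OLTP),
    ProducerProfile("clickstream", Application.OLAP, "Streaming Data"),
    ProducerProfile("sales_mart", Application.OLAP, "PRD"),
]

def group_by_application(producers):
    """Group producer names under their primary application category."""
    groups = {app: [] for app in Application}
    for producer in producers:
        groups[producer.application].append(producer.name)
    return groups

groups = group_by_application(PRODUCERS)
```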
Data Host: Cloud, Multi-Cloud, On-Prem, On-Edge?
The next element to factor into your Data Map is where each of the data generators, consumers and systems is hosted, and whether you need a single cloud provider, multi-cloud or an on-prem solution. You will also need to assess the criticality of "nearness to data" and the actionability needs for such data. For example, does your system have IoT sensors that need to transmit and act on data intelligence "nearby"? If yes, you should assess Data on Edge solutions, making sure that devices and systems have enough processing and storage power for the data application / use case to be executed.
Provider strategy: Along with these choices you will of course have to decide which provider or company to use. That will depend on features and pricing; the details are out of scope here.
2. Data Systems & Data in the Middle
Based on the Data Map and Data Applications, the next part of your data strategy should focus on the systems and data in the middle. Based on several factors, that may vastly differ, you will need to take decisions about various elements.
Data Spread: Federated vs Centralised
Do your Data Map and Data Application / Use Case require a federated or a centralised data system? More often than not, no matter how much you want to bring control and centralisation, it is advisable to prepare for a federated, decentralised data future. As your organisation grows or a city is built, it will demand differing data systems, based on the comfort of the teams operating these systems or, more importantly, on particular use cases suited to particular data systems. In the end, disparate and even disconnected data systems are fine to have, so long as they have interoperability (something we will discuss specifically in later sections).
Data Storage: File Storage, Data Lake, Data Warehouse, Lakehouse, SQL/NoSQL Databases ++
Depending on whether your data leans towards OLTP (Transactional) or OLAP (Analytical), you will need to define the data storage strategy. While transactional data will require an ideal combination of databases and file storage (like S3), analytical data will require assessing data lakes and warehouses. You may also have come across the term "Lakehouse", an amalgamation of a Data Lake and a Data Warehouse. To know the difference, and which to use when, read this insightful article by the team at MongoDB. Also remember that a plain file storage system like Amazon's S3 can often act as your Data Lake. This implies that one must be very critical in the evaluation of systems: often the simplest solutions can undertake some of the most complex tasks. And remember that every system you add multiplies cloud, human operations and system costs.
Beyond the OLTP and OLAP distinction, you should also classify your data and its use cases as Hot, Warm or Cold. Based on the needs, Hot storage may be serviced via an in-memory Database or Queue solution, Warm via a Database or Warehouse, and Cold via a Lake or File Storage system. Read more here about why one should think about data on the basis of these varying use cases / access needs.
Aside from the above storage solutions, depending on your use case you will also need to evaluate whether you require a Master Data Management (MDM) storage system or a CMS (Content Management System). Typically an MDM solution is popular in commerce use cases, but it can also be applicable wherever a "Source of Truth" file / dataset needs to be maintained.
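A minimal sketch of the Hot / Warm / Cold classification follows. It assumes that recency of access is the deciding signal, and the one-day and ninety-day thresholds are arbitrary assumptions chosen for illustration; real tiering would also weigh access frequency, latency needs and cost.

```python
from datetime import datetime, timedelta

# Assumed thresholds; tune them to your own access patterns and cost targets.
HOT_WINDOW = timedelta(days=1)
WARM_WINDOW = timedelta(days=90)

# Tier -> the kind of storage suggested for it in the text above.
TIER_STORAGE = {
    "hot": "in-memory database or queue",
    "warm": "database or warehouse",
    "cold": "lake or file storage",
}

def classify_tier(last_accessed, now):
    """Classify a dataset by how recently it was last accessed."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"

now = datetime(2024, 1, 31)
tier = classify_tier(datetime(2023, 6, 1), now)
suggested_storage = TIER_STORAGE[tier]
```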
Data Operations: ETL/ELT, Datasets, Data Feeds
A critical part of data strategy is to define an approach for Data Operations. This DataOps engine is the central nervous system that makes everything tick. This is where you will Extract, Load, Transform and Normalise data from Producers and pass it on to Consumers.
Depending on whether you are dealing with streaming data or static data, you will need to factor in suitable data systems, such as Kafka for streaming or big data analytics tools like Spark.
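To illustrate the extract, transform / normalise and load steps on static data, here is a small batch sketch using only the Python standard library. The CSV content, the `products` table and the normalisation rules are assumptions made up for this example; a streaming pipeline built on Kafka or Spark would look quite different.

```python
import csv
import io
import sqlite3

# Made-up producer output with inconsistent casing and stray whitespace.
RAW_CSV = """sku,price,currency
 A-1 ,10.50,usd
b-2,7,USD
"""

def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform / normalise: trim whitespace, upper-case codes, cast prices to float."""
    return [
        (row["sku"].strip().upper(), float(row["price"]), row["currency"].strip().upper())
        for row in rows
    ]

def load(conn, records):
    """Load: write normalised records into a table that consumers can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (sku TEXT PRIMARY KEY, price REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO products VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
rows = conn.execute("SELECT sku, price, currency FROM products ORDER BY sku").fetchall()
```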
This is also where you need to ensure that, whatever systems are used in the company, the data produced is interoperable. Do this by maintaining and sharing any and all datasets and data feeds like "products", of course with visibility and governance controls. More on this in the Frameworks & Governance section.
Insights Readiness: Data Analytics ++
Data insights are a non-obvious must-have for data strategy: evaluating and planning the end goals / target insights in advance helps you plan the data operations stack you need. You will need to evaluate insight tools such as Power BI, Tableau and the like, or decide to rely on powerful open source visual / graphical analytics libraries like ECharts.
As an extension, when planning this analytics layer, think about insights on the following lines: Descriptive Analytics (what has happened), Predictive Analytics (what could happen), and Prescriptive Analytics (what should be done).
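The three kinds of analytics can be told apart with a toy example. The sales figures, the stock level and the reorder rule below are all invented for illustration, and the prediction is a plain least-squares trend line rather than any particular tool's method.

```python
from statistics import mean

daily_sales = [100, 110, 120, 130, 140]  # made-up numbers for five consecutive days

# Descriptive: what has happened.
average = mean(daily_sales)

# Predictive: fit a least-squares line through (day index, sales) and extrapolate one day.
xs = list(range(len(daily_sales)))
x_bar, y_bar = mean(xs), mean(daily_sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, daily_sales)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
forecast = intercept + slope * len(daily_sales)

# Prescriptive: a (hypothetical) business rule turns the forecast into an action.
STOCK_ON_HAND = 120
action = "reorder" if forecast > STOCK_ON_HAND else "hold"
```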
ML & AI Readiness: Vector DBs, Local LLMs, Hardware
A critical element of your data strategy, of course, is ensuring readiness for Machine Learning and Artificial Intelligence systems. These must stem from your data applications and use cases.
Do you have a need for end user applications that require you to learn from past data? Make predictions? Automate based on real-time insights? In most cases the answer will likely be yes. And you will have to factor in solutions that allow for Machine Learning models or Deep Learning networks and their training workflows.
Furthermore, you will need to factor in other Artificial Intelligence solutions ranging from NLP (Natural Language Processing), to OCR (Optical Character Recognition), to Vision Models, to Generative AI. Here too you will have to review your use cases. Say you want to enable a "Talk to your Data" use case: you will need to ensure the availability of systems that can vectorise and store enterprise data, so that this data can be supplied as context for LLMs to base their replies on. Similarly, you will have to decide whether you will rely on external LLMs or local on-prem LLMs. Your choice here will further define the type of systems, hardware and procurement you will need to undertake. If you want to improve your AI basics & readiness, give this must-read book a read.
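The "Talk to your Data" flow (vectorise, store, retrieve, add as context) can be sketched in a few lines. This is a toy under loud assumptions: the word-count `embed` function stands in for a real embedding model, the in-memory `INDEX` list stands in for a vector database, the documents are invented, and no LLM is actually called.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical enterprise documents; the list below plays the role of a vector store.
DOCUMENTS = [
    "refund policy for online orders",
    "warehouse opening hours and locations",
    "quarterly revenue by product line",
]
INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]

def retrieve(question, k=1):
    """Return the k stored documents most similar to the question."""
    query = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "what is the refund policy"
context = retrieve(question)
# The retrieved text is added to the prompt so that an LLM can base its reply on it.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
```

Whether the model that receives `prompt` is external or hosted on-prem is exactly the choice that drives the hardware and procurement decisions mentioned above.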
Open Source vs Proprietary
In each of the above decisions, you will be faced with the question of whether to spend millions on a proprietary system or build your own solution from the ground up by leveraging open source technologies.
This choice needs to factor two primary criteria: Scale and Cost.
While there are many open source systems that can be designed for scale, several of these may hit limitations at very large scale. That said, the Apache Software Foundation's project suite includes some of the most mature projects and allows for extremely effective data management, be it for storage, operations or analytics, even for reasonably large scale applications.
If you have the expertise, building a ground-up solution is ideal. As your needs scale you can always upgrade specific components with paid proprietary systems. Even if your budgets allow for proprietary systems, you should evaluate your organisation's capability for building on open source; the results might surprise you, both in massive cost savings and in speed of execution.
3. Frameworks & Governance
Lastly, once you have completely mapped your data producers, consumers and systems, you must put in place effective frameworks and governance (access) rules in order to ensure your organisation or city is designed to maximise data exchange in a safe and secure manner.
Data as Product
The most critical framework you can implement today, right away, is to ensure that your organisation is thinking about Data as a Product. Take Amazon for example, and now imagine a Marketplace or Exchange for your data. The only difference being that the visibility and usage rights of the data are specific to each individual in your organisation or even outside of it (Open Data Portals).
If you've been in the data world even for the shortest while, you ought to have heard of Zhamak Dehghani's Data Mesh Principles. Key amongst them is the critical need to treat data as a readily consumable product. What that effectively means is that any dataset or data feed produced in your organisation must be "curated" with all the right information and context, such as a "product description", a "data dictionary" and other elements that allow any Consumer to view, understand, and plug into that dataset or data feed for their applications; again, only insofar as they are authorised to do so. Of course the Data Mesh Principles go further, and you can read the original paper / article here. Another interesting read is Data Fabric vs Data Mesh, in case you are wondering how they differ.
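As a rough illustration of what "curated" could mean in practice, the sketch below models a data product with a description, an owner and a data dictionary, and checks that none of them is empty before publishing. The field names and the `missing_metadata` check are assumptions for this example, not part of the Data Mesh Principles themselves.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    description: str  # the "product description" consumers read first
    owner: str        # who answers questions about this dataset
    data_dictionary: dict = field(default_factory=dict)  # column name -> meaning

    def missing_metadata(self):
        """Return the names of required metadata fields that are still empty."""
        required = {
            "description": self.description,
            "owner": self.owner,
            "data_dictionary": self.data_dictionary,
        }
        return [key for key, value in required.items() if not value]

# Hypothetical products: one ready to publish, one still a draft.
orders = DataProduct(
    name="orders_daily",
    description="One row per confirmed order, refreshed daily.",
    owner="commerce-team",
    data_dictionary={"order_id": "Unique order identifier", "total": "Order value in USD"},
)
draft = DataProduct(name="returns_raw", description="", owner="commerce-team")

ready_to_publish = not orders.missing_metadata()
draft_gaps = draft.missing_metadata()
```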
A critical tool to include therefore, as part of your data strategy, is to decide on the implementation of a data marketplace platform / enterprise data exchange / data catalog system that enables the discovery and consumption of these datasets and data feeds.
Another critical element when viewing your data as a product is to map how your Data Strategy feeds into your Product Strategy and wider Business Strategy. In fact, as Mohanaselvan Jeyapalan puts it:
"I always believe that every company should see it's Data as an invisible product that powers other products."
Data Control, Governance & Security
While enabling Data as a Product and "publishing" it for easy access sounds appealing, there is of course the risk of data breaches and misuse. You therefore need to put in place strict Data Governance measures covering both Visibility (who can see what data) and Usability (who can use what data).
Linked to this is the question of who should own the data. Here again the Data Mesh Principles give sound guidance, and you can review them to decide on Data Owners and Stewards. Read the section "Domain oriented data decomposition and ownership".
Data Security is a vast topic in itself but should form the last stronghold in completing your data strategy. You will need to go back to your Data Map and think of the interventions required at each “Node” to ensure Data Security. From access protocols for your network, cloud systems and on-prem systems, to encryption of data at rest, to compliance with privacy laws, to basic security practices around usernames and password sharing, thorough planning and mapping must be completed and documented.
The key here is not to make these measures overly complex, but to build systems such that security is inherent in the design rather than mandated via manual rules. Easier said than done, but effective if done right.
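The distinction between Visibility and Usability can be shown with a minimal policy check. The roles, dataset names and the `POLICY` table are hypothetical, and a real deployment would rely on the access controls of its catalog or data platform rather than a hand-written dictionary.

```python
# Hypothetical policy table: role -> datasets it may see and datasets it may use.
POLICY = {
    "analyst": {"visible": {"orders_daily", "customers"}, "usable": {"orders_daily"}},
    "engineer": {"visible": {"orders_daily", "customers"}, "usable": {"orders_daily", "customers"}},
}

def can_see(role, dataset):
    """Visibility: may this role discover the dataset at all?"""
    return dataset in POLICY.get(role, {}).get("visible", set())

def can_use(role, dataset):
    """Usability: may this role consume the dataset? Requires visibility as well."""
    return can_see(role, dataset) and dataset in POLICY.get(role, {}).get("usable", set())

checks = {
    "analyst sees customers": can_see("analyst", "customers"),
    "analyst uses customers": can_use("analyst", "customers"),
    "unknown role sees orders": can_see("guest", "orders_daily"),
}
```

Unknown roles fall through to empty sets, so the default in this sketch is to deny both visibility and use.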
Conclusion
Capturing the essence of Data Strategy, let alone a practical guide to implementing one, in a single article may seem a gross oversimplification. That would be entirely true. However, an attempt must be made to distill and simplify the complex topic of Data Strategy. More importantly, at the very least, we must strive to make it tangible, so that it does not come across as overwhelming as it often does. I hope this article can be a small part of your study into the world of Data and gives you a much needed practical breakdown of the must-think elements of a Data Strategy.
If you have deep expertise and would like to point out some corrections, or add an important missed component, please share below in comments and I will make necessary updates with credit to you. Happy learning, strategising and implementing.
See below, especially the last link.
--
Helpful Resources & Further Reading:
Company to Help 'Implement' All of Above - personal plug
Chat with our Custom GPT about this article & data strategy - requires GPT+
Thank you for reading. Discuss, add comments, ask questions below.