Data Stack as a Service (2/3)
This series of articles is a pedagogical experiment to help non-data experts apprehend the data world as it is today and how it has evolved.
In the first article, we introduced the concept of a Data Stack and its evolution from the Traditional Data Stack (TDS) to the Modern Data Stack (MDS, a controversial term that essentially means "the data stack in 2022").
Introduction
In this second article, we'll dive deeper into the Data Management and Delivery pillars of a data platform in 2022.
As usual, I will take a short step back to put things in perspective.
We’ll focus on:
These perspectives will help us understand structural challenges, our company's position, and how other comparable companies face them.
Then we'll explain how to upgrade our data-driven company to the next level by implementing Data Management and Data Delivery plans.
Perspectives
Data evolution from 2000 to 2020
Let’s make a quick qualitative comparison of data evolution:
(don’t hesitate to suggest other attributes for this comparison in the comments below).
Expertise-Maturity classification
Data-driven(-ish) organizations don't all start from the same context. We can classify them along two dimensions: data expertise (people) and data maturity (data culture, the organization's age, achievements).
LL case: In the Low data expertise and Low data maturity case, there is no data stack and no data team. Individuals use Excel/Google Sheets without real collaboration or any actual governance policy on how data is created, updated, shared, and deleted. Tactical and strategic alignment on clear metric definitions, and on how to make them actionable, is complicated. The lack of automation and observability makes processes error-prone and extremely slow.
Typical stack:
LH case: In the Low data expertise and High data maturity case, there is a Traditional Data Stack with data analysts and data engineers. Some processes and tools quickly become obsolete, and it's a real pain when organization leaders require new metrics or change the way existing ones are calculated. Those metrics either take a long time to produce or are unreliable, because data preparation (dedupe, clean, curate, aggregate) and metrics computation are not centralized. For example, we usually see two to five ways to calculate the organization's profit in this kind of environment, none of them relying on the same source of truth, and in the end nobody knows which one is correct (a sketch of a single, shared profit definition follows the analogy below).
Typical stack:
"Imagine a TDS like a Christmas light decoration in a box. The lights are needed at the right time for the Christmas party, but you discover that some of the bulbs need to be changed. You have to untangle the whole thing to find the broken bulb. After a while, you finally replace the bulb, but by the time you are done, the party is over."
HL case: In the High data expertise and Low data maturity case, organizations mostly use new-generation BI tools without a complete data platform. They have business and data analysts but no data engineers. They master the analytics part (metrics/KPIs, some processes), but most of the time they lack pipeline automation/orchestration and data governance as a whole, even though (unfortunately) BI tools increasingly try to provide their own governance features. We say "unfortunately" because, as we'll see later, many SaaS providers create their own dashboarding or governance features, which makes integrating them redundant. Using Reverse ETL tools can help serve SaaS apps without integrating with each one directly.
Typical stack:
HH case: In the High data expertise and High data maturity case, they already have it all: a modern data platform and experienced data teams. They "only" have to manage the beast, like any other product in the organization.
Typical stack:
Upgrade to the next level
What are the good questions?
First things first: what questions should we ask before creating or upgrading anything in a company?
Management
Data Analytics
Data Engineering
Data Management Pillar
Now that we have asked the right questions and gotten some answers, successfully adopting a data-driven approach requires efficient data management, whatever the organization and its expertise-maturity situation. That management should be based on four major parts:
Let's take an example to clarify all this. An organization XYZ in the humanitarian domain must comply with the GDPR (RGPD in French) in the EU. Concretely, when they ask for donations or have people fill out any form, they collect PII (Personally Identifiable Information). Not everybody in the organization should have access to this data; at the very least, it should be anonymized.
To comply with these regulations, XYZ must define data governance:
Given that they have the following process:
How can we comply with the GDPR from the beginning to the end of these pipelines?
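One common technique is to pseudonymize PII as early as possible in the pipeline. Below is a minimal Python sketch under that assumption; the field names (email, full_name, donation_amount) are hypothetical, and real compliance also involves consent tracking, retention policies, and the right to erasure.

```python
# A minimal pseudonymization sketch for GDPR-style compliance.
# Field names are hypothetical, not from a real donation system.
import hashlib

SALT = "load-from-a-secret-store"  # never hard-code a salt in production

def pseudonymize(record: dict) -> dict:
    """Replace direct identifiers with a salted hash before the data
    reaches analysts; only the donation amount stays readable."""
    hashed = hashlib.sha256((SALT + record["email"]).encode()).hexdigest()
    return {
        "donor_id": hashed,                        # stable, non-reversible key
        "donation_amount": record["donation_amount"],
        # full_name and email are simply dropped downstream
    }

raw = {"email": "jane@example.org", "full_name": "Jane Doe", "donation_amount": 50}
print(pseudonymize(raw))
```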
In addition, different tools use different naming conventions or data structures for the same concept, and so do people in your company (credits to Castor's blog):
Whatever the data project, the first step is reducing noise and emphasizing the information. It requires Standards, Integration, and Quality to succeed (or at least to avoid failure).
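To illustrate what a naming standard can look like in practice, here is a minimal Python sketch that maps each tool's vocabulary onto one canonical schema; the source names and fields below are invented, not actual tool schemas.

```python
# A minimal sketch of a naming standard: each source tool calls the
# same concept differently, so we map everything to one canonical name.
CANONICAL_FIELDS = {
    "crm":     {"company_name": "account", "annual_revenue": "arr"},
    "billing": {"customer":     "account", "yearly_amount":  "arr"},
}

def standardize(source: str, record: dict) -> dict:
    """Rename a record's fields according to the canonical schema."""
    mapping = CANONICAL_FIELDS[source]
    return {mapping.get(key, key): value for key, value in record.items()}

print(standardize("billing", {"customer": "ACME", "yearly_amount": 120_000}))
# {'account': 'ACME', 'arr': 120000}
```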
We have illustrated so far that we can't build a data platform by just stitching together different apps and tools. There are numerous traps we could fall into without first thinking about a clear data management strategy.
Now let's assume we have a nice and clean Data Management Pillar. Data engineers use it to prepare/transform data. They write sophisticated SQL queries that can take hours or days to process. Then, business users and data analysts can access this newly computed data via the Delivery Pillar.
Delivery Pillar
The Delivery Pillar defines "where" and "how" the data platform distributes its outputs.
It can be:
The Delivery Pillar can distribute outputs as dashboards, stories (groups of dashboards), embedded UIs, or API endpoints for better interoperability. Or simply by plugging in your BI tool and running SQL queries for a last aggregation before visualization, and so on.
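To illustrate the API-endpoint option, here is a minimal sketch using FastAPI; the route, metric, and values are invented, and a real endpoint would read the precomputed metric from the warehouse rather than return a constant.

```python
# A minimal sketch of the "API endpoint" delivery mode: exposing a
# precomputed metric over HTTP. Route, metric, and values are made up.
# Run with (assuming this file is named delivery_api.py):
#   uvicorn delivery_api:app --reload
from fastapi import FastAPI

app = FastAPI(title="Delivery Pillar API")

@app.get("/metrics/monthly-profit")
def monthly_profit(month: str = "2022-01"):
    # In a real platform this would query the warehouse; here we
    # return a hard-coded value to keep the sketch self-contained.
    return {"month": month, "profit": 170.0, "currency": "EUR"}
```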
What kind of data are we processing at the Delivery level?
Remember the "good questions" above when defining metrics and KPIs? Now it's time to take those metrics from the Data Management storage (usually a Data Warehouse) and use them for further calculations or visualizations.
Metrics can be very different from one company to another, in quality and quantity. Here are some examples:
Always start with a minimal set of them. One way to filter them is to remove the "vanity metrics" and keep the "actionable metrics."
"Think about a marketing landing page for an ebook download. Measuring pageviews doesn’t allow you to make a business decision, but measuring download rate might inspire you to test out different on-page wording, call to action buttons, or styles of form submission."
What’s next?
At this stage, our managers and tech teams (BI, AI, Ops) have their data, but what about everyone else? Not all employees are data analysts or data scientists. Business employees, from marketing, finance, or sales, need the insights produced by BI and AI experts. Those insights are much more powerful if they drive the everyday operations of your teams across sales, marketing, finance, etc., in tools like Hubspot, Salesforce, Netsuite, SAP, Workday, Gainsight, Zendesk, etc.
These operational teams require very pragmatic insights in order to perform their daily duties. Some examples:
Reverse ETL has therefore emerged as a key part of the modern data stack to close the loop between analysis and action (or activation). These tools automatically redistribute "insight as data" into SaaS apps or internal applications.
Here is a list of providers in this field: Hevo Activate, Hightouch, Census, Polytomic, Omnata.
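To make "insight as data" concrete, here is a minimal sketch of the kind of sync a Reverse ETL tool performs; the CRM endpoint, token, and fields are hypothetical, and the dedicated tools listed above add scheduling, retries, and change detection on top.

```python
# A minimal Reverse ETL sketch: push a computed "churn risk" score from
# the warehouse back into a CRM. Endpoint, token, and fields are hypothetical.
import requests

scores = [
    {"account_id": "ACME",   "churn_risk": 0.82},
    {"account_id": "Globex", "churn_risk": 0.12},
]

for row in scores:
    requests.post(
        "https://crm.example.com/api/accounts/update",  # hypothetical endpoint
        headers={"Authorization": "Bearer <token>"},
        json={"id": row["account_id"], "churn_risk": row["churn_risk"]},
        timeout=10,
    )
```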
Conclusion
We closed the loop, and it’s time to conclude.
In this second article, we've seen how the data world has changed structurally in all dimensions over the last 20 years and how challenging it is to design a data platform today. These perspectives helped us understand structural challenges, how other comparable companies face them, and where our company stands relative to them.
Whatever the data strategy, the first step is reducing noise and emphasizing the information, not deciding which applications and tools could help or how to integrate them.
We also learned how to upgrade our data-driven company to the next level by preparing Data Management and Data Delivery plans.
Asking the good questions first is key: force yourself to do this before any complex task, and ask mentors for help if needed; everything becomes easier afterward.
We hope this can be actionable in your learning process. Don't hesitate to share your suggestions/questions in the comments below so that we can make this more accessible to future readers.
Links to other articles of this series: