A serious word about Data Democratization
Gregor Zeiler
CEO daitastack | B2B Startup Advisor for Product Market Fit-GTM-Growth | Fractional, Interim, On-Demand Exec. | 30 years Data & AI | Advice on data strategies, architectures & tools | 3x D&A Entrepreneur
As a child of the super-centralized Enterprise Data Warehouse (E-DWH) era, I know the pain points of decentralized data silos very well. The reflexes to centralize data are literally burned into the brain stem, like the flight reflex away from the saber-toothed tiger.
Currently, posts are full of Data Products merged into a wonderful Data Mesh. The entire data world seems to be literally absorbed by Data Democratization. On closer look, Data Democratization did not just begin with Zhamak Dehghani's ingenious Data Mesh approach. It all started much earlier.
If you want to benefit from Data Democratization, you need to understand the whole democratization journey. Then you will be able to make the appropriate plans for a successful implementation.
It all started with data centralization
Strange, but that's how it is. The data democratization journey begins with the centralization of data silos into a central Enterprise Data Warehouse (E-DWH). Let's start with the drivers of data centralization in a Data Warehouse:
The Data Warehouse (DWH) solves the problem. Data is neatly integrated, provided with standardized reporting structures, on a very high-performance data platform.
At that time, a centrally organized team of highly qualified and experienced data engineers was responsible for setting up the DWH and the daily operation of the loads. In addition, this team was usually also responsible for setting up the reports.
The catch is that integrating data from many data sources takes a lot of time. The even bigger catch is that all the business areas involved must agree on standardized reporting structures. A never-ending story due to different requirements. And business requirements change every second anyway.
A huge workload that caused the backlogs to grow rapidly and fueled the frustration about the poor delivery service from IT.
The rise of Self-Service BI and the Big Data hype
Somewhere around here, the story of Data Democratization began when software vendors realized that Self-Service BI (in other words, low-code reporting) could help business users get the results they wanted faster in a do-it-yourself mode.
This was true until the threshold was crossed where business users needed more, different, and additional data for their analysis needs. Pressure mounted on data engineers to implement enhancements and changes to the DWH in the shortest possible time. And they failed. Frustration was still high on both sides.
The software industry jumped into the breach again and started the Big Data - Data Lake hype. The motto: Don't waste time curating data, throw everything into one pot and give all business users (often called data scientists back then) access to the big sea of data. We no longer have to worry about what data we really need, because storage is becoming so cheap.
Data Lakehouse makes it smart again
The advantages of Big Data technologies for storing data of different types efficiently were already great. But there was still no way around a curated DWH. The result was a very high level of data redundancy.
Why not combine both concepts in a Data Lakehouse? However, this solution was not primarily a data democratization step, but only an optimization step of centralized data storage.
The final data democratization of this central data storage began afterwards with the Data Mesh approach and Data Product thinking.
Dissolving the last bastion – Data Engineering gets decentralized
Handling data as a product and assuming data ownership, in other words the assumption of responsibility for the preparation, quality, availability and provision of data by decentralized data product teams, is a major paradigm shift.
The last bastion of centralized data engineering has thus fallen. And now, Data Mesh brings us back to the Data Silos in the form of Data Products, all of which are then available together in a Data Marketplace.
End of the pig cycle loop! Back to the roots of siloed data pots!
Well, not quite. The silos are now called data domains and produce data products that at least consist of data sets or even have the data prepared in reports. The result of such data products is offered for consumption in Data Marketplaces. Users or buyers conclude contracts with the data product producers/owners in which not only the content but also the quality and terms of use are regulated.
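To make the idea of such a contract a bit more tangible, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a standard or a specific marketplace API: the names DataContract, violations, max_null_ratio and the example product "customer_revenue" are all hypothetical. The sketch records the three things a contract regulates (content, quality, terms of use) and checks a delivered batch against them.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract between a data product owner and a consumer."""
    product: str                 # name of the data product
    owner_domain: str            # data domain responsible for the product
    schema: Dict[str, type]      # content: column name -> expected Python type
    max_null_ratio: float        # quality: tolerated share of missing values per column
    terms_of_use: str            # terms of use agreed with the consumer


def violations(contract: DataContract, rows: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable contract violations for one delivery."""
    problems = []
    for column, expected_type in contract.schema.items():
        values = [row.get(column) for row in rows]
        nulls = sum(value is None for value in values)
        # Quality check: too many missing values in this column?
        if rows and nulls / len(rows) > contract.max_null_ratio:
            problems.append(f"{column}: too many missing values")
        # Content check: every non-missing value must have the agreed type.
        if any(v is not None and not isinstance(v, expected_type) for v in values):
            problems.append(f"{column}: unexpected type")
    return problems


# Illustrative contract and delivery (made-up example data).
contract = DataContract(
    product="customer_revenue",
    owner_domain="sales",
    schema={"customer_id": int, "revenue": float},
    max_null_ratio=0.25,
    terms_of_use="internal reporting only",
)
delivery = [
    {"customer_id": 1, "revenue": 10.0},
    {"customer_id": 2, "revenue": None},
]
print(violations(contract, delivery))
```

In a real marketplace the contract would of course cover far more (availability, freshness, pricing, access rights), but the principle is the same: the producer commits to something that the consumer can check.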
So things are a little different than they were in the era of data silos many years ago.
What are the main implications of this massive step in Data Democratization?
In summary, this massive step toward data democratization is certainly not an easy one.
Are there alternatives?
No! To be economically successful, we need the speed and flexibility to produce data solutions, and we need to buy them with the caution that a democratized data world demands.
Given the implications described above, a plan must be developed for successful implementation. Stay tuned as we go through each point in detail!