Data Mesh: The Dark Side Of The New Data Hype
Zoltan Horkay
Cloud Migration Enabler | App & Data Modernization Expert | Oracle Elimination Wizard
Data mesh and microservices architecture are two concepts that share many similarities. One could argue that data mesh is the microservices concept replicated in the data domain. In this article, we'll explore the similarities between these concepts, examine how Data Mesh answers the long-standing problems of the data domain, and compare it with the traditional Data Warehouse approach.
What is Microservices architecture?
Microservices architecture is a software development approach that emphasizes breaking down complex applications into smaller, independent services that can be developed, deployed and scaled independently. Each microservice is responsible for a specific set of functions and communicates with other microservices through APIs (Application Programming Interfaces).
It allows for more flexibility, scalability, and faster development cycles because smaller teams can work on individual microservices without impacting other parts of the system.
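To make the idea concrete, here is a minimal sketch of such a service in Python using Flask; the "orders" service, its data, and the port number are all illustrative assumptions, not a prescription.

```python
# Hypothetical "orders" microservice: one bounded responsibility, exposed via an API.
# Other services (billing, shipping, ...) would run as separate processes with their own stores.
from flask import Flask, jsonify, abort

app = Flask(__name__)

# In a real service this would be the service's own database, not shared with other teams.
ORDERS = {
    1: {"order_id": 1, "customer_id": 42, "total": 99.90, "status": "SHIPPED"},
}

@app.route("/orders/<int:order_id>")
def get_order(order_id: int):
    order = ORDERS.get(order_id)
    if order is None:
        abort(404)
    return jsonify(order)

if __name__ == "__main__":
    # Deployed and scaled independently of every other microservice.
    app.run(port=5001)
```

Each team can release and scale this one service on its own cadence, which is exactly the property Data Mesh tries to transplant into the data world.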
What is Data Mesh?
The tremendous success of microservice-based architecture revolutionized traditional application development and sparked many debates in the data domain about whether this concept could be applied and replicated to data products such as data warehouses. And so Data Mesh was born. Data mesh is an approach to data management that emphasizes breaking down data into smaller, domain-specific data sets that can be managed and governed by individual domain teams. Each data set is responsible for a specific set of functions and communicates with other data sets through APIs. This approach allows for more flexibility, scalability, and faster development cycles in the data domain.
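As a rough illustration, the sketch below models what a domain-owned data product might look like in Python; the class names, the "sales" domain, and the output-port structure are assumptions invented for this example, not part of any Data Mesh specification.

```python
# Minimal sketch of a domain-owned "data product" (all names are illustrative).
# Each domain team owns its product end to end: schema, data, quality, and the serving interface.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class OutputPort:
    name: str
    schema: dict[str, str]          # column name -> type, published as part of the contract
    rows: list[dict[str, Any]] = field(default_factory=list)

    def read(self) -> list[dict[str, Any]]:
        # In practice this would be an API, table, or stream exposed to consumers.
        return self.rows

@dataclass
class DataProduct:
    domain: str                     # e.g. "sales", "marketing"
    owner_team: str
    ports: dict[str, OutputPort] = field(default_factory=dict)

# The sales domain publishes its own "orders" data set; no central warehouse team is involved.
sales_orders = DataProduct(
    domain="sales",
    owner_team="sales-analytics",
    ports={"orders": OutputPort("orders", {"order_id": "int", "customer_id": "int", "total": "float"})},
)
```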
Data Domain core problems
So let's dive deep into the most significant problems of the Data Domain.
Single Source of Truth
The "single source of truth" (SSOT) is a concept in data management that refers to the idea that there should be a single, authoritative source for a particular piece of data within an organization. The problem with relying on a SSOT is that it can be difficult to ensure that the data is always accurate, up-to-date, and complete.
One challenge is that different departments or teams within an organization may have their own data sources and definitions, which can lead to inconsistencies and discrepancies. For example, one team may define a customer as anyone who has made a purchase, while another team may define a customer as anyone who has created an account. These differing definitions can result in conflicting data and terms.
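The following toy Python snippet shows how the two definitions above, applied to the same raw records, produce two different answers to the same question; the data is invented purely for illustration.

```python
# Same raw data, two team-specific "customer" definitions, two conflicting answers
# to "how many customers do we have?" (all records here are made up).
accounts = [
    {"account_id": 1, "has_purchase": True},
    {"account_id": 2, "has_purchase": False},   # registered, never bought anything
    {"account_id": 3, "has_purchase": True},
]

# Sales team: a customer is anyone who has made a purchase.
customers_sales = [a for a in accounts if a["has_purchase"]]

# Marketing team: a customer is anyone who has created an account.
customers_marketing = accounts

print(len(customers_sales))      # 2
print(len(customers_marketing))  # 3  -> two conflicting "truths" for the same question
```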
Another challenge is that the process of maintaining a SSOT can be time-consuming and resource-intensive. It requires careful data governance, including data quality checks, data cleansing, and ongoing data stewardship. Data can become outdated or obsolete without appropriate governance, making the entire solution unreliable.
Traditional data warehouses advocate the need for a centralized repository for storing and managing data that can be accessed by different departments or teams within an organization.
However, this approach works only with solid, well-established data governance in place that spans the entire organization and is shared across different business units and divisions.
In contrast, Data Mesh endorses independent teams and data products that collaborate in a loosely coupled manner, jointly implementing the enterprise-wide data dictionary and the single source of truth.
DWH is slow and bureaucratic; Data Mesh creates Data silos.
Let's examine the different concerns.
The DWH tries to consolidate and standardize all inputs, requirements, and business terms into one central repository, which requires a lot of debate, agreement, and clarification. This process is inherently slow and cumbersome, which amplifies the critics' complaints about very long delivery times.
On the other hand, Data Mesh offers faster delivery times due to the independence of product teams; however, it ignores the need for intense and exhausting collaboration among different teams to avoid duplicating data and business terms. For example, a product team may expose its data catalog to consumers, yet this approach inevitably leads to the good old data silos, because its domain- or product-specific focus and responsibility do not allow for broader collaboration; otherwise, it would run into the same delivery problem as the traditional DWH.
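A small, hypothetical sketch of what that duplication can look like: two domains each publish their own catalog entry for "customer", and the consumer is left to reconcile them.

```python
# Sketch of how independently published domain catalogs can drift apart (entries are hypothetical):
# both teams describe "customer", but with different keys and semantics.
sales_catalog = {
    "customer": {"key": "customer_id", "definition": "account with at least one paid order"},
}

marketing_catalog = {
    "customer": {"key": "contact_id", "definition": "any registered account or newsletter subscriber"},
}

# A consumer joining the two products must reconcile the definitions on their own,
# which is exactly the reconciliation work a central warehouse team would otherwise do once.
shared_terms = sales_catalog.keys() & marketing_catalog.keys()
for term in shared_terms:
    if sales_catalog[term] != marketing_catalog[term]:
        print(f"term '{term}' is defined differently in two domains")
```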
So, all in all, Data Mesh sacrifices quality for a faster time-to-market.
Development Lifecycle
DWH is slow, Data Mesh is fast
In a traditional Data Warehouse architecture, release cycles can be slow and infrequent, as changes to the data warehouse require significant coordination and testing. In contrast, a Data Mesh architecture emphasizes decentralized development and deployment, which can enable faster and more frequent releases. Data products can be developed and released independently, with domain experts responsible for testing and validating their own changes.
In a traditional Data Warehouse architecture, deploying changes to the data warehouse can be disruptive, as the entire system needs to be taken offline to implement changes. In contrast, a Data Mesh architecture enables individual data products to be deployed independently, which can minimize disruption and enable more flexible deployment options. In addition, changes to individual data products can be rolled out gradually, with testing and validation occurring in parallel.
Fault tolerance
In a traditional Data Warehouse architecture, a failure in one part of the system can impact the entire system, leading to significant downtime and loss of productivity. In contrast, a Data Mesh architecture emphasizes decentralized development and deployment, which can enable more robust failure isolation. Individual data products can fail without impacting the entire system, as they are designed to operate independently.
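As a sketch of what that isolation means for a consumer, the following Python snippet builds a report from several data products and simply degrades when one of them is unreachable; the endpoints and product names are hypothetical.

```python
# Failure isolation on the consumer side (all endpoints are hypothetical):
# if one data product is unavailable, the report degrades gracefully instead of failing entirely.
import json
import urllib.request

PRODUCT_ENDPOINTS = {
    "sales_orders": "http://sales.internal/api/orders",        # hypothetical
    "marketing_leads": "http://marketing.internal/api/leads",  # hypothetical
}

def fetch(url: str, timeout: float = 2.0):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except OSError:
        # Covers connection errors and timeouts: this product is down, the others are unaffected.
        return None

report = {name: fetch(url) for name, url in PRODUCT_ENDPOINTS.items()}
available = [name for name, data in report.items() if data is not None]
print(f"report built from {len(available)}/{len(report)} data products")
```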
Scalability
In a traditional Data Warehouse architecture, scaling the system can be challenging, as the entire system needs to be scaled as a single unit. In contrast, a Data Mesh architecture enables individual data products to be scaled independently, which can lead to more efficient use of resources and better performance. Individual data products can be scaled up or down based on their own needs, without impacting the performance of other data products.
Complexity
Data mesh involves a more complex technical infrastructure than centralized data products, as it requires the creation of self-service data platforms for each domain team. This can result in additional complexity, both in terms of infrastructure management and data governance.
In addition, implementing a data mesh approach requires significant organizational change, which can be challenging to manage. It may require changes to team structures, reporting lines, and governance processes, which can result in resistance from stakeholders and require significant investment in change management.
Cost
A data mesh approach can be more expensive to implement and maintain than a centralized data product like a data warehouse. This is due to the need for domain-specific data platforms, which can be costly to develop and maintain.
In addition, overall software license costs may be significantly higher with a Data Mesh architecture due to the need for dedicated infrastructure per data product, though it is theoretically possible to group a few of them together.
Data Access & Self-Service
In a traditional Data Warehouse architecture, data access is centralized, with a single team responsible for managing and providing access to the data. This can be more efficient for users, who only need to go through the central team to access the data they need. In contrast, a Data Mesh architecture emphasizes decentralized data access, with individual data products owned and managed by domain experts. As a result, users have to request access to multiple systems and learn multiple domains, including their specific tools, customs, and separate processes.
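A rough sketch of that difference from the analyst's point of view, with purely hypothetical connection strings and endpoints:

```python
# Traditional warehouse: one access request, one connection, one query interface for every question.
warehouse_dsn = "postgresql://analyst@dwh.internal/enterprise_dwh"  # hypothetical

# Data mesh: the same analyst needs access to, and familiarity with, each domain's own stack.
domain_access = {
    "sales":     {"endpoint": "http://sales.internal/api",       "auth": "sales-api-key"},   # hypothetical
    "marketing": {"endpoint": "s3://marketing-data-product/",    "auth": "cloud-role"},      # hypothetical
    "finance":   {"endpoint": "http://finance.internal/graphql", "auth": "oauth-client"},    # hypothetical
}
# Each entry typically means a separate access request, toolchain, and onboarding process.
```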
In a traditional Data Warehouse architecture, self-service can be centralized in terms of processes, tools, and frameworks. In contrast, a Data Mesh architecture emphasizes self-service, with domain experts owning and managing their own data products; however, implementing an enterprise-wide self-service solution can be challenging and may lead to a significant cost expansion.
Final thoughts
After the immense success of the latest software implementation approaches like microservices architecture and service mesh, and after the revolution and rise of the cloud, it was only a matter of time before the data domain took these new trends over, rebranded them, and offered them as a holy grail. Data Mesh is undoubtedly an exciting concept, promoting features like scalability, faster delivery, and self-service where the traditional data warehouse struggles.
However, it reminds me of a debate from the 1990s. A few decades ago, the data warehouse concept was new and uncommon. Instead, there were independent teams, and all of them extracted more or less the same data from the same source systems in similar ways, using comparable tools and techniques. These teams were the infamous data silos. No two departments could give the same answer to the same question. Their counter-argument was always the need for faster delivery, domain-specific KPIs, and a small, effective team. Sound familiar? The board members went mad when they could not get one answer to simple questions like how many active customers we have or what the average profit per customer segment is. Finally, we came to the conclusion that we needed something standardized, consolidated, and cleansed. The result of this cumbersome, tedious, and exhausting process was the data warehouse.
Data Mesh reopens the old debate about whether faster delivery is more important than quality and standardized, consolidated business data terms. At the same time, it overlooks some common problems, such as shared domains (aka Master Data Management) of product, customer, organization, location, etc., the data integration and latency issues among different data domains (connecting data products via APIs, seriously?), and the extra cost of maintaining multiple redundant teams and solutions.
If you are a leader in the data domain, I would be at least as careful with this new trend as I was with the deservedly infamous data lake & co.