Data Mesh: The Dark Side Of The New Data Hype
Zoltan Horkay
Cloud Migration Enabler | App & Data Modernization Expert | Oracle Elimination Wizard
Data mesh and microservices architecture are two concepts that share many similarities. One could argue that data mesh is the microservices concept replicated in the data domain. In this article, we'll explore the similarities between these concepts, examine how Data Mesh answers the long-standing problems of the data domain, and compare it with the traditional Data Warehouse approach.
What is Microservices architecture?
Microservices architecture is a software development approach that emphasizes breaking down complex applications into smaller, independent services that can be developed, deployed and scaled independently. Each microservice is responsible for a specific set of functions and communicates with other microservices through APIs (Application Programming Interfaces).
It allows for more flexibility, scalability, and faster development cycles because smaller teams can work on individual microservices without impacting other parts of the system.
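To make the idea concrete, here is a minimal sketch of such a service in Python using Flask; the "orders" service, its data, and the port number are all illustrative assumptions, not a prescription.

```python
# Hypothetical "orders" microservice: one bounded responsibility, exposed via an API.
# Other services (billing, shipping, ...) would run as separate processes with their own stores.
from flask import Flask, jsonify, abort

app = Flask(__name__)

# In a real service this would be the service's own database, not shared with other teams.
ORDERS = {
    1: {"order_id": 1, "customer_id": 42, "total": 99.90, "status": "SHIPPED"},
}

@app.route("/orders/<int:order_id>")
def get_order(order_id: int):
    order = ORDERS.get(order_id)
    if order is None:
        abort(404)
    return jsonify(order)

if __name__ == "__main__":
    # Deployed and scaled independently of every other microservice.
    app.run(port=5001)
```

Each team can release and scale this one service on its own cadence, which is exactly the property Data Mesh tries to transplant into the data world.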
What is Data Mesh?
The tremendous success of microservice-based architecture revolutionized traditional application development and sparked many debates in the data domain about whether this concept could be applied and replicated to data products such as data warehouses. And so Data Mesh was born. Data mesh is an approach to data management that emphasizes breaking down data into smaller, domain-specific data sets that can be managed and governed by individual domain teams. Each data set is responsible for a specific set of functions and communicates with other data sets through APIs. This approach allows for more flexibility, scalability, and faster development cycles in the data domain.
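As a rough illustration, the sketch below models what a domain-owned data product might look like in Python; the class names, the "sales" domain, and the output-port structure are assumptions invented for this example, not part of any Data Mesh specification.

```python
# Minimal sketch of a domain-owned "data product" (all names are illustrative).
# Each domain team owns its product end to end: schema, data, quality, and the serving interface.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class OutputPort:
    name: str
    schema: dict[str, str]          # column name -> type, published as part of the contract
    rows: list[dict[str, Any]] = field(default_factory=list)

    def read(self) -> list[dict[str, Any]]:
        # In practice this would be an API, table, or stream exposed to consumers.
        return self.rows

@dataclass
class DataProduct:
    domain: str                     # e.g. "sales", "marketing"
    owner_team: str
    ports: dict[str, OutputPort] = field(default_factory=dict)

# The sales domain publishes its own "orders" data set; no central warehouse team is involved.
sales_orders = DataProduct(
    domain="sales",
    owner_team="sales-analytics",
    ports={"orders": OutputPort("orders", {"order_id": "int", "customer_id": "int", "total": "float"})},
)
```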
Data Domain core problems
So let's dive deep into the most significant problems of the Data Domain.
Single Source of Truth
The "single source of truth" (SSOT) is a concept in data management that refers to the idea that there should be a single, authoritative source for a particular piece of data within an organization. The problem with relying on a SSOT is that it can be difficult to ensure that the data is always accurate, up-to-date, and complete.
One challenge is that different departments or teams within an organization may have their own data sources and definitions, which can lead to inconsistencies and discrepancies. For example, one team may define a customer as anyone who has made a purchase, while another team may define a customer as anyone who has created an account. These differing definitions can result in conflicting data and terms.
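The following toy Python snippet shows how the two definitions above, applied to the same raw records, produce two different answers to the same question; the data is invented purely for illustration.

```python
# Same raw data, two team-specific "customer" definitions, two conflicting answers
# to "how many customers do we have?" (all records here are made up).
accounts = [
    {"account_id": 1, "has_purchase": True},
    {"account_id": 2, "has_purchase": False},   # registered, never bought anything
    {"account_id": 3, "has_purchase": True},
]

# Sales team: a customer is anyone who has made a purchase.
customers_sales = [a for a in accounts if a["has_purchase"]]

# Marketing team: a customer is anyone who has created an account.
customers_marketing = accounts

print(len(customers_sales))      # 2
print(len(customers_marketing))  # 3  -> two conflicting "truths" for the same question
```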
Another challenge is that the process of maintaining a SSOT can be time-consuming and resource-intensive. It requires careful data governance, including data quality checks, data cleansing, and ongoing data stewardship. Data can become outdated or obsolete without appropriate governance, making the entire solution unreliable.
Traditional data warehouses advocate the need for a centralized repository for storing and managing data that can be accessed by different departments or teams within an organization.
However, this approach works only with solid, well-established data governance in place that spans the entire organization and is shared across different business units and divisions.
In contrast, Data Mesh endorses independent teams and data products that collaborate in a loosely coupled manner, jointly implementing the enterprise-wide data dictionary and the single source of truth.
DWH is slow and bureaucratic; Data Mesh creates Data silos.
Let's examine the different concerns.
The DWH tries to consolidate and standardize all inputs, requirements, and business terms into one central repository, which requires a lot of debate, agreement, and clarification. This process is inherently slow and cumbersome, which amplifies the critics' complaints about very long delivery times.
On the other hand, Data Mesh offers faster delivery times due to the independence of product teams; however, it ignores the need for intense and exhausting collaboration among different teams to avoid duplicating data and business terms. For example, a product team may expose its data catalog to consumers, yet this approach inevitably leads to the good old data silos, because its domain- or product-specific focus and responsibility do not allow for broader collaboration; otherwise, it would run into the same delivery problem as the traditional DWH.
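A small, hypothetical sketch of what that duplication can look like: two domains each publish their own catalog entry for "customer", and the consumer is left to reconcile them.

```python
# Sketch of how independently published domain catalogs can drift apart (entries are hypothetical):
# both teams describe "customer", but with different keys and semantics.
sales_catalog = {
    "customer": {"key": "customer_id", "definition": "account with at least one paid order"},
}

marketing_catalog = {
    "customer": {"key": "contact_id", "definition": "any registered account or newsletter subscriber"},
}

# A consumer joining the two products must reconcile the definitions on their own,
# which is exactly the reconciliation work a central warehouse team would otherwise do once.
shared_terms = sales_catalog.keys() & marketing_catalog.keys()
for term in shared_terms:
    if sales_catalog[term] != marketing_catalog[term]:
        print(f"term '{term}' is defined differently in two domains")
```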
So, all in all, Data Mesh sacrifices quality for a faster time-to-market.
Development Lifecycle
DWH is slow, Data Mesh is fast
In a traditional Data Warehouse architecture, release cycles can be slow and infrequent, as changes to the data warehouse require significant coordination and testing. In contrast, a Data Mesh architecture emphasizes decentralized development and deployment, which can enable faster and more frequent releases. Data products can be developed and released independently, with domain experts responsible for testing and validating their own changes.
In a traditional Data Warehouse architecture, deploying changes to the data warehouse can be disruptive, as the entire system needs to be taken offline to implement changes. In contrast, a Data Mesh architecture enables individual data products to be deployed independently, which can minimize disruption and enable more flexible deployment options. In addition, changes to individual data products can be rolled out gradually, with testing and validation occurring in parallel.
Fault tolerance
In a traditional Data Warehouse architecture, a failure in one part of the system can impact the entire system, leading to significant downtime and loss of productivity. In contrast, a Data Mesh architecture emphasizes decentralized development and deployment, which can enable more robust failure isolation. Individual data products can fail without impacting the entire system, as they are designed to operate independently.
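As a sketch of what that isolation means for a consumer, the following Python snippet builds a report from several data products and simply degrades when one of them is unreachable; the endpoints and product names are hypothetical.

```python
# Failure isolation on the consumer side (all endpoints are hypothetical):
# if one data product is unavailable, the report degrades gracefully instead of failing entirely.
import json
import urllib.request

PRODUCT_ENDPOINTS = {
    "sales_orders": "http://sales.internal/api/orders",        # hypothetical
    "marketing_leads": "http://marketing.internal/api/leads",  # hypothetical
}

def fetch(url: str, timeout: float = 2.0):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except OSError:
        # Covers connection errors and timeouts: this product is down, the others are unaffected.
        return None

report = {name: fetch(url) for name, url in PRODUCT_ENDPOINTS.items()}
available = [name for name, data in report.items() if data is not None]
print(f"report built from {len(available)}/{len(report)} data products")
```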
Scalability
In a traditional Data Warehouse architecture, scaling the system can be challenging, as the entire system needs to be scaled as a single unit. In contrast, a Data Mesh architecture enables individual data products to be scaled independently, which can lead to more efficient use of resources and better performance. Individual data products can be scaled up or down based on their own needs, without impacting the performance of other data products.
Complexity
Data mesh involves a more complex technical infrastructure than centralized data products, as it requires the creation of self-service data platforms for each domain team. This can result in additional complexity, both in terms of infrastructure management and data governance.
In addition, implementing a data mesh approach requires significant organizational change, which can be challenging to manage. It may require changes to team structures, reporting lines, and governance processes, which can result in resistance from stakeholders and require significant investment in change management.
Cost
A data mesh approach can be more expensive to implement and maintain than a centralized data product like a data warehouse. This is due to the need for domain-specific data platforms, which can be costly to develop and maintain.
In addition, overall software license costs may be significantly higher with a Data Mesh architecture due to the need for dedicated infrastructure per data product, though it is theoretically possible to group a few of them together.
Data Access & Self-Service
In a traditional Data Warehouse architecture, data access is centralized, with a single team responsible for managing and providing access to the data. This can be more efficient for users, who only need to go through the central team to access the data they need. In contrast, a Data Mesh architecture emphasizes decentralized data access, with individual data products owned and managed by domain experts. As a result, users have to request access to multiple systems and learn multiple domains, including their specific tools, customs, and separate processes.
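A rough sketch of that difference from the analyst's point of view, with purely hypothetical connection strings and endpoints:

```python
# Traditional warehouse: one access request, one connection, one query interface for every question.
warehouse_dsn = "postgresql://analyst@dwh.internal/enterprise_dwh"  # hypothetical

# Data mesh: the same analyst needs access to, and familiarity with, each domain's own stack.
domain_access = {
    "sales":     {"endpoint": "http://sales.internal/api",       "auth": "sales-api-key"},   # hypothetical
    "marketing": {"endpoint": "s3://marketing-data-product/",    "auth": "cloud-role"},      # hypothetical
    "finance":   {"endpoint": "http://finance.internal/graphql", "auth": "oauth-client"},    # hypothetical
}
# Each entry typically means a separate access request, toolchain, and onboarding process.
```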
In a traditional Data Warehouse architecture, self-service can be centralized in terms of processes, tools, and frameworks. In contrast, a Data Mesh architecture emphasizes self-service, with domain experts owning and managing their own data products; however, implementing an enterprise-wide self-service solution can be challenging and may lead to a significant cost expansion.
Final thoughts
After the immense success of the latest software implementation approaches like microservices architecture and service mesh, and after the revolution and rise of the cloud, it was only a matter of time before the data domain took these new trends over, rebranded them, and offered them as a holy grail. Data Mesh is undoubtedly an exciting concept, promoting features like scalability, faster delivery, and self-service where the traditional data warehouse struggles.
However, it reminds me of a debate from the 1990s. A few decades ago, the data warehouse concept was new and uncommon. Instead, there were independent teams, and all of them extracted more or less the same data from the same source systems in similar ways, using comparable tools and techniques. These teams were the infamous data silos. No two departments could give the same answer to the same question. Their counter-argument was always the need for faster delivery, domain-specific KPIs, and a small, effective team. Sound familiar? The board members went mad when they could not get one answer to simple questions like how many active customers we have or what the average profit per customer segment is. Finally, we came to the conclusion that we needed something standardized, consolidated, and cleansed. The result of this cumbersome, tedious, and exhausting process was the data warehouse.
Data Mesh reopens the old debate about whether faster delivery is more important than quality and standardized, consolidated business data terms. At the same time, it overlooks some common problems, such as shared domains (aka Master Data Management) of product, customer, organization, location, etc., the data integration and latency issues among different data domains (connecting data products via APIs, seriously?), and the extra cost of maintaining multiple redundant teams and solutions.
If you are a leader in the data domain, I would be at least as careful with this new trend as I was with the deservedly infamous data lake & co.