Come Hell or High Water: Some Lessons from Four Years of Data Mesh Implementations Learned the Hard Way: Lesson One

Andrei Zaichikov

Director, Enterprise Technology Strategy, EMEA at Pure Storage

Published: December 1, 2023

Disclaimer: This article represents my personal experience and opinion. It does not represent the official position of my current employer. All links and materials are from public sources only.

LONG READ WARNING: What initially started as a short recap of my keynote presentation at CDAO Europe in Amsterdam has turned into this long read (and this is still only Part 1). You might need some additional equipment before reading this article. May the force be with you!

For a couple of years, Data Mesh has been conquering the hearts and minds of data professionals and decision-makers around the world. It has resonated particularly well with some of the largest enterprises, prompting numerous companies over the past few years to attempt implementing the Data Mesh concept to address their unique challenges.

There have been both successes and failures along the way, and, to the best of my knowledge, very few companies have managed to fully embrace Data Mesh.

However, interest in Data Mesh remains high. At Big Data London 2023 alone, there were over 30 talks dedicated to subjects directly connected to Data Mesh.

In this article, I will share some of the lessons learned during the last four years of participating in various Data Mesh projects.

But before we begin to learn the truth, let's go through a short introduction to Data Mesh—understanding the main drivers for this methodology's establishment, its promises, and the key conflicts of the Mesh, among other aspects.

Introduction to Data Mesh

The term Data Mesh was first introduced by Zhamak Dehghani in 2019, when she was working as a Principal Consultant at the technology company ThoughtWorks [1]. Her work introduced four major principles of Data Mesh:

- Domain Ownership.
- Data as a Product.
- Self-Service Data Platform.
- Federated Computational Governance.

Spoiler alert: There are ongoing debates about the definitions for each of these topics.

Wikipedia describes Data Mesh as a sociotechnical approach to building a decentralized data architecture.

But... hold on. Why do we need "a sociotechnical approach to building a decentralized data architecture" in the first place?

In short, businesses require much faster access to analytical data.

Blaming the Data: A Brief History of the Motivation Behind Data Mesh

The rise of Data Science, Advanced Analytics, Machine Learning—and, of course, Mathematical Statistics and other areas of Mathematics—has finally allowed businesses to derive value from data by experimenting with the knowns and unknowns.

The problem, as always, comes from the data itself rather than a lack of analytics tools or methodologies. Historically, the data landscapes of large organizations have evolved by incorporating new technologies into their existing portfolios. Older systems and technologies were rarely replaced or refactored. Consequently, after decades of evolution, large enterprises typically possess four to six (or even more) layers of interconnected systems. These are often linked through a myriad of transformation technologies, applications, and techniques.

The reason systems and tools were added during evolution was not solely due to fashion (although fashion does play a significant role in the IT industry). Factors like business expansion, diversification of service portfolios, changes in regulatory frameworks, geographical growth, mergers, and acquisitions all contributed to the integration of new systems and technologies. The core reason for this evolution is that each Line of Business (LOB) perceives its business model, customer base, products, and services uniquely. Consequently, they require distinct semantics for their underlying data models and diverse types and parameters of processing. This necessitates the use of various technologies to meet operational boundaries (such as performance, reliability) and budget constraints. Thus, many enterprises that started their scale-out phase with a mainframe have ended up managing hundreds of different technologies.

A major conflict arises because LOBs want control over transactions as they directly reflect business operations. Managing their own transactions also reduces dependence on other company units. In contrast, Analytics aims to leverage as much enterprise data as possible. Over time, data for analytics has become almost a currency within organizations. LOBs are reluctant to share their data with centralized teams, even if these teams assume responsibility for the most complex and budget-heavy tasks, as it would mean fewer resources for the LOBs. This reluctance is one of the main reasons why the data warehouse concept was never fully implemented on a global enterprise scale, at least in my experience (although some of the data warehouses out there are quite enormous).

Compounding the issue, many companies lacked a reliable source of information about their data storage, processing, and usage. Manual data cataloging and documentation, which is prone to human error due to its repetitive nature, was the norm.

After the failure of the data warehouse concept, which had gained popularity in the early 2000s, we are now in the next iteration, and the need for advanced analytics is higher than ever.

The bottom line is that businesses need data, and they need it FAST.

Come Hell or High Water: An Introduction to the Promise of Data Mesh

In reality, we can trace the early roots of an approach that would later be described and named Data Mesh back to 2018, and possibly even earlier. Several companies, including Microsoft [2] and ABN Amro [3], embarked on journeys to make data more accessible to a broader user base within their organizations.

These approaches combined technology aspects, such as Data Self-Service and, to a certain extent, Federated Governance, with organizational processes and cultural changes. It quickly became evident that a structured approach to controlling data ownership was needed. The concept of Data Domains was not clearly defined until Data Mesh articulated it.

Similarly, in a distributed Self-Service ecosystem (ideally), Data Governance required a novel approach to describe metadata, or data about data. The challenge was that such metadata, when applied to Federated Governance and Self-Service, necessitated processes for managing metadata sharing, change control and change management, and access management, among other considerations. To address this issue, the concept of Data Products was introduced.

Early implementations of Data Mesh often resemble this framework.

- Lines of Business handle transactions within Operational Data Stores (ODS).
- A Data Producer owns a Data Product within a Data Domain. This organization is responsible for extracting data from the ODS (either directly or via an intermediary process) and ingesting it into the Data Product. It also serves this data in accordance with the Data Product Contract and adheres to the rules and processes of the Federated Governance system.
- A Data Consumer uses one of the methods provided in the Data Product Contract to utilize the data for their own purposes.
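
To make these roles a little more tangible, here is a minimal Python sketch of a Data Product governed by a contract. It is an illustration only, not a reference implementation: the class names (DataProductContract, DataProduct), the fields, and the example "orders" product are all assumptions made for this sketch and do not come from any specific Data Mesh platform.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

Row = Dict[str, object]


@dataclass(frozen=True)
class DataProductContract:
    """Hypothetical contract: what the producer promises to consumers."""
    product_name: str
    domain: str
    owner: str
    schema: Dict[str, type]      # column name -> expected Python type
    access_methods: List[str]    # consumption methods offered, e.g. "sql"


@dataclass
class DataProduct:
    contract: DataProductContract
    rows: List[Row] = field(default_factory=list, init=False)

    def ingest(self, extracted: Iterable[Row]) -> None:
        """Producer side: load rows extracted from the ODS.

        The whole batch is validated first, so a bad row rejects the batch.
        """
        batch = [dict(row) for row in extracted]
        for row in batch:
            self._validate(row)
        self.rows.extend(batch)

    def _validate(self, row: Row) -> None:
        if set(row) != set(self.contract.schema):
            raise ValueError(f"columns {sorted(row)} do not match the contract")
        for column, expected in self.contract.schema.items():
            if not isinstance(row[column], expected):
                raise ValueError(f"{column!r} is not of type {expected.__name__}")

    def serve(self, method: str) -> List[Row]:
        """Consumer side: read data through a method listed in the contract."""
        if method not in self.contract.access_methods:
            raise ValueError(f"{method!r} is not offered by the contract")
        return [dict(row) for row in self.rows]


# Usage: a hypothetical "orders" product owned by a sales domain.
contract = DataProductContract(
    product_name="orders",
    domain="sales",
    owner="sales-data-team",
    schema={"order_id": int, "amount": float},
    access_methods=["sql"],
)
orders = DataProduct(contract)
orders.ingest([{"order_id": 1, "amount": 9.5}])
served = orders.serve("sql")
```

The point of the sketch is the split of duties: the producer is the only party that calls ingest, and the consumer only ever sees what serve exposes under the contract.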

Sounds simple enough, right? It also seems that we already have almost everything we need to build such a platform within organizations using the existing technology stack and advanced analytics capabilities.

Given the perceived technical simplicity of its implementation, the promise of Data Mesh was highly appealing:

- To solve Data Ownership issues using a combination of Data Domains and Data Products.
- To decrease Time-To-Data through simplified processes (like Self-Service) and unified methods for data consumption encapsulated within Data Products.
- To enhance Data Discoverability and centralize and unify (to a certain extent) Data Governance and controls through a Federated Governance model.

Reading between the lines, a few other promises become apparent:

- Utilizing Data Products could gradually eliminate redundancies in data and processing.
- Knowing the owner of a Data Product could simplify change management and make it easier to resolve dependencies.
- A significant cleanup, though unavoidable, could lead to improved data quality as a byproduct of transitioning to Mesh.

These are bold promises, especially considering that our industry has struggled with these issues for decades. However, the demand from businesses was clear, and many technological aspects were nearly ready for Data Mesh. Consequently, numerous companies embarked on this journey.

Lesson One: Responsibility of Product Owners

A fundamental conflict within Data Mesh lies in balancing centralization—necessary for governance and addressing redundancies and semantic conflicts—with decentralization, which emphasizes domain/product ownership and self-service.

The first instance where this conflict emerges and requires resolution is in the roles and responsibilities of data producers and Data Product Owners. Considering the entire setup, around 90% of data engineering responsibilities fall to the Product Owner or Data Producer (assuming they represent different entities). On the other hand, the Data Consumer has minimal responsibilities, almost negligible in comparison. However, the problem arises from the Data Consumer's critical dependence on the Data Product, including its service level agreements (SLAs), quality, and other aspects. Therefore, the Data Consumer needs firm commitments from the Data Product Owner, who is tasked with not only building but also maintaining, evolving, and resolving issues with the Data Product.
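
As a rough illustration of what a "firm commitment" might look like in practice, the sketch below checks a Data Product against two service levels: freshness and completeness. The ServiceLevel fields, the check_sla function, and the thresholds are hypothetical and chosen only for this example; real contracts usually cover many more dimensions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class ServiceLevel:
    """Hypothetical commitments a Data Product Owner makes to consumers."""
    max_staleness: timedelta     # longest acceptable time since the last refresh
    min_completeness: float      # required share of non-null values, 0.0 to 1.0


def check_sla(sla: ServiceLevel, last_refresh: datetime, now: datetime,
              non_null_values: int, total_values: int) -> List[str]:
    """Return the list of breached commitments (empty when the SLA is met)."""
    breaches = []
    if now - last_refresh > sla.max_staleness:
        breaches.append("stale")
    completeness = non_null_values / total_values if total_values else 0.0
    if completeness < sla.min_completeness:
        breaches.append("incomplete")
    return breaches


# Usage: data refreshed 30 hours ago with 95% completeness,
# checked against a 24-hour / 99% commitment.
sla = ServiceLevel(max_staleness=timedelta(hours=24), min_completeness=0.99)
breaches = check_sla(
    sla,
    last_refresh=datetime(2023, 11, 30, 6, 0),
    now=datetime(2023, 12, 1, 12, 0),
    non_null_values=95,
    total_values=100,
)
```

Every breach reported by a check like this lands on the Data Product Owner's desk, which is exactly why ownership is such a hard sell.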

Potential owners of Data Products, recognizing these challenges, are often reluctant to commit to maintaining them. Moreover, within the current organizational structures, they might negotiate concessions from Data Consumers or the broader enterprise to ensure the provision of information with the required quality and SLAs.

There are a few workarounds for this problem.

Top-Down Enforcement

Data Producers can be compelled to create and maintain Data Products through top-down enforcement. This directive might originate from upper management, especially if the Data Mesh concept has been endorsed by the board or executives, or it could be implemented as an organization-wide policy. However, this approach often proves ineffective due to the high complexity of the problem and the myriad opportunities for internal sabotage. Furthermore, it demands considerable effort in terms of management and control and tends to foster an unhealthy environment for conflict resolution and other critical tasks, such as dealing with redundancies and establishing governance.

One notable downside of this method is the inevitable political cost for those enforcing it. Implementing such measures on a large scale invariably leads to significant consequences, consuming the "political capital" of those in charge. Inevitably, these individuals either exhaust their influence, as few problems get effectively resolved in this manner, or they are compelled to lessen the enforcement pressure, resulting in a transformation process that fails to reach a logical and satisfactory conclusion.

Culture Shift

At the core of enterprises and other businesses are people, with most activities executed by and for them. In certain organizations, the prevailing culture fosters the sharing of data based on a general consensus that it should be openly available within the company. In such environments, treating data contracts similarly to open-source software can create a moral obligation for the owners of Data Products to maintain data quality and adhere to SLAs.

This approach is often effective when the owners of Data Products are also users of these products, or when the organization critically relies on cross-LOB (Line of Business) exchange, leaving no room to withhold or obscure data from other exchange participants.

Open and equal data sharing within a company also necessitates addressing conflicts in the semantics of data entities among different competing parties. It's important to note that resolving these conflicts relies on the goodwill of Data Product owners.

In some instances, it is feasible to gradually shift the existing culture to enable such exchanges within the organization (or at least within certain segments of it). This process requires consistent effort over time and is typically challenging to implement. A combination of champions for enablement, communities, automation, and example-led scenarios can yield positive results.

This culture of open data exchange is more commonly found in smaller organizations where LOBs depend critically on one another, or in specific segments of larger enterprises. In such cases, some LOBs might freely exchange information using centralized or unified platforms, while others may operate in isolation.

Reselling Technical Debt

Many organizations exhibit a reluctance to openly exchange data and assume extensive responsibilities for their Data Products. In such environments, Data Consumers or occasionally a centralized authority might opt to allocate a portion of their budget to build and maintain certain Data Products. Consequently, this can incentivize Data Product owners to develop and share their data as Data Products, seeking additional funding to address existing technical debt or resource shortages for system or product maintenance.

However, this approach has several major downsides:

- It triggers a redistribution of budgets within organizations, potentially leading to intense competition and conflicts. This reallocation may also create imbalances and disproportionately focus resources on managing technical debt.
- Some Data Product owners might gain excessive power, evolving into centralized data platforms. This centralization can lead to a situation where most dependencies are concentrated in a few platforms, ironically reverting back to the initial centralized state Data Mesh aims to move away from.
- Not all Data Product Owners are willing to adopt this approach, deterred by the complexity of such projects or other factors.

Despite these challenges, reselling technical debt is an approach several companies are exploring as part of their transition to Data Mesh.

Data Economics / Incentivizing Data Producers

The most direct and seemingly straightforward approach to motivating Data Product Owners and Data Producers is to incentivize them for the usage of Data Products. Essentially, this involves creating a system of Data Economics within an enterprise. Considering that many organizations already purchase data from external providers, why not implement a similar model internally?

However, while the concept might sound simple, it actually necessitates significant organizational transformation and a reevaluation of how different Lines of Business (LOBs) collaborate. A major challenge is that each organization must navigate this transformation uniquely, as there is limited theoretical guidance and practical precedent.

Despite the complexity, some enterprises are attempting to define and pilot such a framework on a limited scale. This approach appears promising, as it not only addresses questions of motivation but also fosters innovation and encourages optimization among Data Product Owners.
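
For illustration only, one very simple form such a framework could take is a usage-based split of an internal budget. The sketch below is a toy model under my own assumptions: the allocate_incentives function, the event format, and the team names are hypothetical, and a real scheme would also have to weigh data quality, SLA compliance, and the cost of producing the data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allocate_incentives(usage_events: Iterable[Tuple[str, str]],
                        product_owners: Dict[str, str],
                        budget: float) -> Dict[str, float]:
    """Split a budget among owners in proportion to the usage of their products.

    usage_events   -- (consumer, product) pairs, one per recorded use
    product_owners -- product name -> owning team
    """
    uses_per_owner = Counter(
        product_owners[product] for _consumer, product in usage_events
    )
    total_uses = sum(uses_per_owner.values())
    if total_uses == 0:
        return {}
    return {
        owner: budget * uses / total_uses
        for owner, uses in uses_per_owner.items()
    }


# Usage: three recorded uses of "orders" and one of "customers".
events = [
    ("risk", "orders"),
    ("marketing", "orders"),
    ("risk", "customers"),
    ("finance", "orders"),
]
owners = {"orders": "sales", "customers": "crm"}
payout = allocate_incentives(events, owners, budget=1000.0)
```

Even a model this small raises the organizational questions discussed above: who funds the budget, who counts the usage, and how to stop owners from gaming the numbers.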

Lesson One: Conclusion

Addressing the motivation of Data Product Owners to create, clean, and maintain Data Products with specified contractual and operational parameters is a paramount priority in Data Mesh implementation projects. Currently, there are four primary strategies to achieve this, with Culture Shift and Reselling Technical Debt being the most prevalent.

To effectively implement Data Mesh at any scale and in organizations of varying complexity, a novel approach to Data Economics is required. This approach should enable Data Product Owners to receive appropriate incentives for the Data Products they provide within the organization.

Thank you for reading until the end. Hope you have enjoyed it and please feel free to like or reach out directly, so I know there is a need to continue this series. Thank you!

Lianne Hartley

Independent Consultant | Data Enthusiast | CDO Summer School 2023

11 months ago

I love it when I can mentally shout out "Yes, Yes, Yes" as I'm reading an article! Really fun to read and well explained, thank you for taking the time to write this. I look forward to your next in the series!

Oleg Tikhonov

Head of Data @ Axellect KZ | Driving Data Strategy and Governance

11 months ago

Are there any examples of data economy implementation? Very interested

Scott Mckinnon

Cloud Solution Architect (CSA) - CSU Data and AI Team | Microsoft

11 months ago

"..while the concept might sound simple, it actually necessitates significant organizational transformation and a reevaluation of how different Lines of Business (LOBs) collaborate" #truthbomb - great share Andrei Zaichikov

Jurijs Fjodorovs

Enterprise Data Strategy Lead | Chief Data Office

11 months ago

Agree, and company size can dictate the need for a combination of approaches, not just one or another. All approaches may be necessary for success. Looks like we share the same experience and understanding of the problem. Eagerly awaiting the next part of the blog to check my beliefs.
