Come Hell or High Water: Some Lessons from Four Years of Data Mesh Implementations Learned the Hard Way: Lesson One
Andrei Zaichikov
Director, Enterprise Technology Strategy, EMEA at Pure Storage
Disclaimer: This article represents my personal experience and opinion. It does not represent the official position of my current employer. All links and materials are from public sources only.
LONG READ WARNING: What initially started as a short recap of my keynote presentation at CDAO Europe in Amsterdam has turned into this long read (and this is still only Part 1). You might need some additional equipment before reading this article. May the force be with you!
For a couple of years, Data Mesh has been conquering the hearts and minds of data professionals and decision-makers around the world. It has resonated particularly well with some of the largest enterprises, prompting numerous companies over the past few years to attempt implementing the Data Mesh concept to address their unique challenges.
There have been both successes and failures along the way, and, to the best of my knowledge, very few companies have managed to fully embrace Data Mesh.
However, interest in Data Mesh remains high. At Big Data London 2023 alone, there were over 30 talks dedicated to subjects directly connected to Data Mesh.
In this short article, I will share some of the lessons learned during the last four years of participating in various Data Mesh projects.
But before we begin to learn the truth, let's go through a short introduction to Data Mesh—understanding the main drivers for this methodology's establishment, its promises, and the key conflicts of the Mesh, among other aspects.
Introduction to Data Mesh
As we know, the term Data Mesh was first introduced by Zhamak Dehghani in 2019 when she was working as a Principal Consultant at the technology company ThoughtWorks [1]. In her original paper, four major principles of Data Mesh were introduced. They are:
- Domain Ownership.
- Data as a Product.
- Self-Service Data Platform.
- Federated Computational Governance.
Spoiler alert: There are ongoing debates about the definitions for each of these topics.
Wikipedia describes Data Mesh as a sociotechnical approach to building a decentralized data architecture.
But... hold on. Why do we need "a sociotechnical approach to building a decentralized data architecture" in the first place?
In short, businesses require much faster access to analytical data.
Blaming the Data: A Brief History of the Motivation Behind Data Mesh
The rise of Data Science, Advanced Analytics, Machine Learning—and, of course, Mathematical Statistics and other areas of Mathematics—has finally allowed businesses to derive value from data by experimenting with the knowns and unknowns.
The problem, as always, comes from the data itself rather than a lack of analytics tools or methodologies. Historically, the data landscapes of large organizations have evolved by incorporating new technologies into their existing portfolios. Older systems and technologies were rarely replaced or refactored. Consequently, after decades of evolution, large enterprises typically possess four to six (or even more) layers of interconnected systems. These are often linked through a myriad of transformation technologies, applications, and techniques.
Systems and tools were not added over the years solely because of fashion (although fashion does play a significant role in the IT industry). Factors like business expansion, diversification of service portfolios, changes in regulatory frameworks, geographical growth, mergers, and acquisitions all contributed to the integration of new systems and technologies. The core reason for this evolution is that each Line of Business (LOB) perceives its business model, customer base, products, and services uniquely. Consequently, each LOB requires distinct semantics for its underlying data models and diverse types and parameters of processing. This necessitates the use of various technologies to meet operational requirements (such as performance and reliability) and budget constraints. Thus, many enterprises that started their scale-out phase with a mainframe have ended up managing hundreds of different technologies.
A major conflict arises because LOBs want control over transactions as they directly reflect business operations. Managing their own transactions also reduces dependence on other company units. In contrast, Analytics aims to leverage as much enterprise data as possible. Over time, data for analytics has become almost a currency within organizations. LOBs are reluctant to share their data with centralized teams, even if these teams assume responsibility for the most complex and budget-heavy tasks, as it would mean fewer resources for the LOBs. This reluctance is one of the main reasons why the Data Warehouse concept was never fully implemented on a global enterprise scale, at least in my experience (although some of the Data Warehouses out there are quite enormous).
Compounding the issue, many companies lacked a reliable source of information about their data storage, processing, and usage. Manual data cataloging and documentation, which is prone to human error due to its repetitive nature, was the norm.
After the failure of the data warehouse concept, which had gained popularity in the early 2000s, we are now in the next iteration, and the need for advanced analytics is higher than ever.
The bottom line is that businesses need data, and they need it FAST.
Come Hell or High Water: An Introduction to the Promise of Data Mesh
In reality, we can trace the early roots of an approach that would later be described and named Data Mesh back to 2018, and possibly even earlier. Several companies, including Microsoft [2] and ABN Amro [3], embarked on journeys to make data more accessible to a broader user base within their organizations.
These approaches combined technology aspects, such as Data Self-Service and, to a certain extent, Federated Governance, with organizational processes and cultural changes. It quickly became evident that a structured approach was needed to control data ownership. The concept of Data Domains was not clearly defined until Data Mesh articulated it.
Similarly, in a distributed Self-Service ecosystem (ideally), Data Governance required a novel approach to describe metadata, or data about data. The challenge was that such metadata, when applied to Federated Governance and Self-Service, necessitated processes for managing metadata sharing, control over changes and change management, access management, among other considerations. To address this issue, the concept of Data Products was introduced.
Early implementations of Data Mesh often resemble this framework.
Sounds simple enough, right? It also seems that we already have almost everything we need to build such a platform within organizations using the existing technology stack and advanced analytics capabilities.
Given the perceived technical simplicity of its implementation, the promise of Data Mesh was highly appealing:
Reading between the lines, a few other promises become apparent:
A significant cleanup, though unavoidable, could lead to improved data quality as a byproduct of transitioning to Mesh.
These are bold promises, especially considering that our industry has struggled with these issues for decades. However, the demand from businesses was clear, and many technological aspects were nearly ready for Data Mesh. Consequently, numerous companies embarked on this journey.
Lesson One: Responsibility of Product Owners
A fundamental conflict within Data Mesh lies in balancing centralization—necessary for governance and addressing redundancies and semantic conflicts—with decentralization, which emphasizes domain/product ownership and self-service.
The first instance where this conflict emerges and requires resolution is in the roles and responsibilities of data producers and Data Product Owners. Considering the entire setup, around 90% of data engineering responsibilities fall to the Product Owner or Data Producer (assuming they represent different entities). On the other hand, the Data Consumer has minimal responsibilities, almost negligible in comparison. However, the problem arises from the Data Consumer's critical dependence on the Data Product, including its service level agreements (SLAs), quality, and other aspects. Therefore, the Data Consumer needs firm commitments from the Data Product Owner, who is tasked with not only building but also maintaining, evolving, and resolving issues with the Data Product.
Potential owners of Data Products, recognizing these challenges, are often reluctant to commit to maintaining them. Moreover, within the current organizational structures, they might negotiate concessions from Data Consumers or the broader enterprise to ensure the provision of information with the required quality and SLAs.
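To make the commitments discussed above more concrete, here is a minimal sketch of what a data contract between a Data Product Owner and a Data Consumer might look like. Everything in it is an assumption made for illustration: the field names (`max_staleness`, `min_completeness`), the product and owner names, and the thresholds are hypothetical and do not come from any specific Data Mesh implementation.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataContract:
    """Hypothetical commitments a Data Product Owner makes to consumers."""
    product: str
    owner: str
    max_staleness: timedelta   # freshness SLA: data may be at most this old
    min_completeness: float    # required share of populated fields, 0..1


@dataclass(frozen=True)
class Observation:
    """What a Data Consumer actually measured for one delivery."""
    staleness: timedelta
    completeness: float


def violations(contract: DataContract, seen: Observation) -> list[str]:
    """Return the names of the commitments that this delivery breached."""
    breached = []
    if seen.staleness > contract.max_staleness:
        breached.append("freshness")
    if seen.completeness < contract.min_completeness:
        breached.append("completeness")
    return breached


# Illustrative values only.
contract = DataContract(
    product="customer_orders",
    owner="sales-lob",
    max_staleness=timedelta(hours=24),
    min_completeness=0.99,
)
late_delivery = Observation(staleness=timedelta(hours=30), completeness=0.995)
print(violations(contract, late_delivery))
```

Even this toy version shows where the burden sits: the owner has to keep every commitment on every delivery, while the consumer only has to check them.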
There are a few workarounds for this problem.
Top-Down Enforcement
Data Producers can be compelled to create and maintain Data Products through top-down enforcement. This directive might originate from upper management, especially if the Data Mesh concept has been endorsed by the board or executives, or it could be implemented as an organization-wide policy. However, this approach often proves ineffective due to the high complexity of the problem and the myriad opportunities for internal sabotage. Furthermore, it demands considerable effort in terms of management and control and tends to foster an unhealthy environment for conflict resolution and other critical tasks, such as dealing with redundancies and establishing governance.
One notable downside of this method is the inevitable political cost for those enforcing it. Implementing such measures on a large scale invariably leads to significant consequences, consuming the "political capital" of those in charge. Inevitably, these individuals either exhaust their influence, as few problems get effectively resolved in this manner, or they are compelled to lessen the enforcement pressure, resulting in a transformation process that fails to reach a logical and satisfactory conclusion.
Culture Shift
At the core of enterprises and other businesses are people, with most activities executed by and for them. In certain organizations, the prevailing culture fosters the sharing of data based on a general consensus that it should be openly available within the company. In such environments, treating data contracts similarly to open-source software can create a sense of moral obligation among the owners of Data Products to maintain data quality and adhere to SLAs.
This approach is often effective when the owners of Data Products are also users of these products, or when the organization critically relies on cross-LOB (Line of Business) exchange, leaving no room to withhold or obscure data from other exchange participants.
Open and equal data sharing within a company also necessitates addressing conflicts in the semantics of data entities among different competing parties. It's important to note that resolving these conflicts relies on the goodwill of Data Product owners.
In some instances, it is feasible to gradually shift the existing culture to enable such exchanges within the organization (or at least within certain segments of it). This process requires consistent effort over time and is typically challenging to implement. A combination of champions for enablement, communities, automation, and example-led scenarios can yield positive results.
This culture of open data exchange is more commonly found in smaller organizations where LOBs critically depend on one another, or in specific segments of larger enterprises. In such cases, some LOBs might freely exchange information using centralized or unified platforms, while others may operate in isolation.
Reselling Technical Debt
Many organizations exhibit a reluctance to openly exchange data and assume extensive responsibilities for their Data Products. In such environments, Data Consumers or occasionally a centralized authority might opt to allocate a portion of their budget to build and maintain certain Data Products. Consequently, this can incentivize Data Product owners to develop and share their data as Data Products, seeking additional funding to address existing technical debt or resource shortages for system or product maintenance.
However, this approach has several major downsides:
Despite these challenges, reselling technical debt is an approach several companies are exploring as part of their transition to Data Mesh.
Data Economics / Incentivizing Data Producers
The most direct and seemingly straightforward approach to motivating Data Product Owners and Data Producers is to incentivize them for the usage of Data Products. Essentially, this involves creating a system of Data Economics within an enterprise. Considering that many organizations already purchase data from external providers, why not implement a similar model internally?
However, while the concept might sound simple, it actually necessitates significant organizational transformation and a reevaluation of how different Lines of Business (LOBs) collaborate. A major challenge is that each organization must navigate this transformation uniquely, as there is limited theoretical guidance and practical precedent.
Despite the complexity, some enterprises are attempting to define and pilot such a framework on a limited scale. This approach appears promising, as it not only addresses questions of motivation but also fosters innovation and encourages optimization among Data Product Owners.
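As a purely illustrative sketch of the simplest possible internal Data Economics model, the snippet below splits a fixed incentive budget across Data Products in proportion to how much they are used. The article does not prescribe any particular model; measuring usage as a query count, the function name `allocate_incentives`, and all figures are assumptions made for the example.

```python
def allocate_incentives(usage: dict[str, int], budget: float) -> dict[str, float]:
    """Split `budget` across Data Products in proportion to their usage."""
    total = sum(usage.values())
    if total == 0:
        # Nothing was consumed, so nothing is paid out.
        return {product: 0.0 for product in usage}
    return {product: budget * count / total for product, count in usage.items()}


# Hypothetical monthly query counts per Data Product.
monthly_queries = {"customer_orders": 600, "inventory": 300, "hr_headcount": 100}
payouts = allocate_incentives(monthly_queries, budget=10_000.0)
print(payouts)
```

A real framework would have to settle much harder questions, such as what counts as usage, how quality and SLA breaches affect payouts, and who funds the budget, which is where the organizational transformation comes in.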
Lesson One: Conclusion
Addressing the motivation of Data Product Owners to create, clean, and maintain Data Products with specified contractual and operational parameters is a paramount priority in Data Mesh implementation projects. Currently, there are four primary strategies to achieve this, with Culture Shift and Reselling Technical Debt being the most prevalent.
To effectively implement Data Mesh at any scale and in organizations of varying complexity, a novel approach to Data Economics is required. This approach should enable Data Product Owners to receive appropriate incentives for the Data Products they provide within the organization.
Thank you for reading until the end. Hope you have enjoyed it and please feel free to like or reach out directly, so I know there is a need to continue this series. Thank you!
Independent Consultant | Data Enthusiast | CDO Summer School 2023
11 months ago: I love it when I can mentally shout out "Yes, Yes, Yes" as I'm reading an article! Really fun to read and well explained, thank you for taking the time to write this. I look forward to your next in the series!
Head of Data @ Axellect KZ | Driving Data Strategy and Governance
11 months ago: Are there any examples of data economy implementation? Very interested
Cloud Solution Architect (CSA) - CSU Data and AI Team | Microsoft
11 months ago: "..while the concept might sound simple, it actually necessitates significant organizational transformation and a reevaluation of how different Lines of Business (LOBs) collaborate" #truthbomb - great share Andrei Zaichikov
Enterprise Data Strategy Lead | Chief Data Office
11 months ago: Agree, and company size can dictate the need for a combination of approaches, not just one or another. All approaches may be necessary for success. Looks like we share the same experience and understanding of the problem. Eagerly awaiting the next part of the blog to check my beliefs.