Rethinking the Data Mesh: Apply It Piecemeal
ABSTRACT: The data mesh is gaining popularity as a distributed architecture that mirrors reality. But its practical limitations mean it should be applied piecemeal. (Eckerson Group.)
Our clients now ask us about the “data mesh,” an emerging distributed architecture that addresses the shortcomings of the traditional centralized data architectures that power most enterprises today.
For those of us who have been around this field for a while (almost 30 years for me), the term “data mesh” conjures up a déjà vu of past distributed technologies whose monikers long ago entered the dustbin of analytics history: virtual data warehouse, enterprise information integration, logical data warehouse, data virtualization, and others. Whatever the name, these technologies all provided a global semantic view of distributed data and a mechanism to federate queries across systems in real time.
To be fair, the data mesh as conceived by Zhamak Dehghani is broader than a traditional data virtualization tool. It embeds the technology within a decentralized ownership model, distributed data development, and federated governance. It also requires a common data platform that abstracts the complexity of building and managing data products and semantic models so ordinary business users can publish data to the mesh.
For companies exploring the architectural possibilities of a data mesh, it’s noteworthy that hardly anyone seems to fully understand it; even people who have read deeply on the subject remain confused[1]. And it doesn’t help that Zhamak’s writing style is a bit opaque, which prompts skeptics to wonder if the concept is simply “old wine in new wineskins” and gives vendors license to proclaim that their technology powers it. If nothing else, the data mesh is now a huge marketing bonanza for vendors who want to hitch themselves to a hot industry buzzword.
There are two issues here: 1) How seriously should we take the data mesh, and is there any redeeming value in the concept? 2) What is the nature of the data mesh technology that vendors are hawking today? We’ll tackle the first question in this article and the second next month.
Data Mesh Perspectives
Centralization versus decentralization. The data mesh is a reaction to centralized data organizations, architectures, and governance models that are prone to rigidity, inflexibility, and backlogs. To address these shortcomings, the data mesh pivots 180 degrees and embraces a decentralized approach. The data mesh gives domain owners (i.e., department heads) complete jurisdiction over data in their domains. With a data mesh, data never moves; anyone who wants the data goes to the department to get it. That puts the onus on departments to model and publish data in a form that others can use.
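To make the ownership idea concrete, here is a minimal sketch of what a domain-published “data product” might look like. The descriptor fields, names, and URL are hypothetical illustrations for this article, not part of Dehghani’s specification.

```python
from dataclasses import dataclass

@dataclass
class DataProduct:
    """Hypothetical descriptor a domain might publish to the mesh."""
    name: str              # e.g., "sales.orders"
    owner: str             # the domain (department) accountable for the data
    schema: dict           # column name -> type: the contract consumers rely on
    refresh_sla: str       # how fresh consumers can expect the data to be
    access_endpoint: str   # where consumers fetch it; the data stays in the domain

# The sales domain, not a central team, models and publishes this product.
orders = DataProduct(
    name="sales.orders",
    owner="sales",
    schema={"order_id": "string", "customer_id": "string",
            "amount": "decimal(12,2)", "ordered_at": "timestamp"},
    refresh_sla="hourly",
    access_endpoint="https://data.example.com/sales/orders",  # illustrative URL
)
```

The point of the sketch is that the schema contract, the freshness guarantee, and the endpoint are all owned by the domain; consumers come to the data rather than the data coming to them.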
Unfortunately, decentralized organizations and architectures foster a hornet’s nest of problems: they proliferate data silos and fragment data, making it impossible for executives to get quick answers to simple questions, like “How many customers do we have?” To me, the data mesh reinforces bad organizational habits and renegade behavior. If you want to modernize your data environment, do you double down on data dysfunction and reinforce data silos? This is akin to paving the cow paths. But that is what the data mesh approach advocates.
Importance of federation. The only route to enduring value is to balance the imperatives of centralization and decentralization. Central teams deliver standards, scale, and technical expertise, while distributed teams deliver agility, adaptability, and domain knowledge. One without the other creates huge problems. The best way to design data organizations, architectures, and governance approaches is to federate them: align central and local resources and activity so you get the best of both worlds with few of the drawbacks.
Bottom-up federation. Fortunately, Zhamak recognizes this, at least partially, and builds federation into the data mesh concept. For example, each domain needs a product owner who must understand the needs of enterprise users; the self-service platform is built centrally (presumably) to help domain owners manage and publish their own data; a central team (presumably) needs to build cross-functional data products that no single domain can or will build; and perhaps most importantly, cross-functional teams of product owners and developers must convene to hammer out global models, governance standards, and application interfaces.
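Here is a hedged sketch of what that federation might look like in practice: a central board publishes a small set of global standards, and every domain-owned product is checked against them before it joins the mesh. The rules, field names, and example product below are illustrative assumptions, not a standard data mesh API.

```python
# Hypothetical global standards a federated governance board might publish.
GLOBAL_KEYS = {"customer_id": "string"}       # conformed join keys all domains share
REQUIRED_METADATA = {"owner", "refresh_sla"}  # metadata every product must declare

def validate_product(product: dict) -> list:
    """Check a domain-published product (a plain dict) against the global rules."""
    violations = []
    schema = product.get("schema", {})
    for key, expected in GLOBAL_KEYS.items():
        if key in schema and schema[key] != expected:
            violations.append(f"{product['name']}: '{key}' must be typed {expected}")
    for missing in sorted(REQUIRED_METADATA - product.keys()):
        violations.append(f"{product['name']}: missing required metadata '{missing}'")
    return violations

# A product that types customer_id inconsistently fails the central check.
print(validate_product({
    "name": "marketing.campaign_touches",
    "schema": {"customer_id": "int", "campaign_id": "string"},
    "owner": "marketing",
}))
```

Without some central check like this, the cross-domain joins that make a mesh useful quietly break, which is exactly why the federation described above matters.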
Top-down federation. This sounds a lot like what organizations with centralized data architectures do. The best ones align central and local resources by assigning technical specialists to data domains, establishing cross-functional governance boards, and scheduling regular strategic planning meetings with department leaders. In addition, central data teams build platforms and tools that empower business units to meet their own data and analytics needs, and they deliver cross-functional applications that no department has the time, money, or interest in building.
In the end, it doesn’t matter whether you federate from below or from the top, as long as you do! While many centralized data teams have learned to federate best practices, standards, and development, the reality is that some departments (e.g., finance, sales, marketing) have their own data engineers and data analysts and don’t want to give them up. But with the right incentives, these departments can be coaxed to outsource costly IT-related tasks, like maintaining brittle data systems managed by part-time data engineers that provide a parochial view of data, to a central team without forfeiting local control.
Data mesh drawbacks. Smarter people than I have evaluated the data mesh in depth and see shortcomings. Many of the drawbacks center on the lack of expertise and interest within departments to publish and manage data on behalf of themselves and others. Here is a sampling of the 30+ areas of concern mentioned by James Serra and followers of his blog. Serra is a well-known data architect who was a data evangelist for Microsoft and is now a data platform architecture lead at EY.
[Image: Some Challenges Facing Data Mesh Implementations]
Our Recommendation: Apply the Data Mesh Piecemeal
The data mesh is an interesting concept, and I’m glad it’s being vigorously debated. Since many companies have an abundance of data silos, I can see how the approach has widespread appeal. But there are drawbacks.
First, most domains don’t have the time or resources to manage data for themselves, let alone for others. Second, the technology to implement the data mesh—namely the self-service data platform—doesn’t exist yet, although there are promising developments, such as no-code data pipelining tools and data exchanges that make it easy to share data. Finally, the coordination costs of implementing global semantics, governance, and interfaces are daunting.
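To illustrate the second drawback, here is a sketch of the kind of declarative pipeline spec a self-service platform would need to accept from non-engineers; every name in it is a hypothetical illustration. Expressing the spec is the easy part; a platform that executes specs like this reliably for ordinary business users is the part that doesn’t exist yet.

```python
# A domain declares *what* it wants published; the (still largely hypothetical)
# self-service platform would supply the engineering to make it happen.
pipeline_spec = {
    "product": "finance.monthly_revenue",
    "source": "erp.invoices",          # a system the finance domain already owns
    "transform": [
        {"filter": "status == 'posted'"},
        {"aggregate": {"group_by": ["month", "region"], "sum": "amount"}},
    ],
    "publish": {"refresh": "daily", "format": "parquet"},
}
```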
In the final analysis, the data mesh works if you implement it piecemeal, not as an enterprise architecture. The data mesh works for departments with their own data engineers who can manage and publish their own data, especially if the central team develops a mesh-like self-service data platform that makes it easy for domains to model and manage data sets. But most domains will continue to rely on corporate IT to service their data requests and build complex analytic applications. It is much more realistic to support these groups using a traditional centralized ownership model and architecture.
[1] “Data Mesh: Centralized Ownership vs Decentralized Ownership,” James Serra, July 23, 2021.
Comments

Author, Advisor, Mathematician; Thinkers360 Global Thought Leader/Influencer in AI, Analytics, Predictive Analytics, National Security, GenAI, International Relations, Design Thinking, InsurTech, Quantum, and Health Tech:
“The data mesh gives domain owners (i.e., department heads) complete jurisdiction over data in their domains.” I think this is the central fallacy of the data mesh. Every domain must have access to shared data across domains. How do you account for that?
Architecture Consultant at Teradata Corporation:
Thanks for this article, Wayne. You’ve articulated concisely some of the vague impressions that have been sloshing around in the back of my head for some time regarding #datamesh, especially pertaining to the organizational challenges involved.
Entrepreneur / Serial Disruptor / Champion of an ever-evolving #TruerSelf, #HuSynergy, and an emergent #HumanSingularity:
“...distributed architecture that mirrors reality.” Hmm, not quite: an architecture that mirrors the brain/mind, in order to generate finer and finer-grained synergy within and between individuals and ecosystems. The human mind, as a collective, has been directing technology since the beginning of time; we have simply been unaware of this unconscious motivation. IMHO, it’s time to get on board with the innate motivations of the mind and human connectivity (human-to-human and M2M technology) as it relates to everything, including making money in the future. See Pribram and Bohm on the holographic hypothesis of perception and memory.
Author, Advisor, Mathematician:
First of all, I think it’s an impossible problem to solve. Everything starts with so-called “source” data. What do we really know about it? What was the developer thinking when naming things? What data was left out as unimportant (though we know it becomes important when we try to integrate it)? What is the provenance of source data? How did it exist in its primordial form? At a certain level, data is an oxymoron. The context of data - why, how, and when it was recorded, and by what method it was collected and then transformed - is always relevant. We can’t triangulate data to see if it’s consistent with other instances of the same phenomenon or event. Data isn’t something that’s abstract, out there, and value-neutral. Data only exists when it’s collected, and collecting data is a human activity. And in turn, the act of collecting and analyzing data changes (one could even say ‘interprets’) us. As Nick Barrowman said in “Why Data Is Never Raw”: “There is, then, no such thing as context-free data, and thus data cannot manifest the kind of perfect objectivity that is sometimes imagined.” I was astounded when Bruno dismissed semantics. What else is there? If we get 99% right, selling shoes to fashionable ladies would be OK, but flotilla drones with nuclear-tipped tactical missiles could start a global thermonuclear war. So, bottom line, the mesh is just another way to organize things we don’t understand.
Thank you, Wayne, for a thoughtful article - and also for the link to the James Serra piece. I think the central question is this: are we federating architecture and infrastructure, or governance - or both? The laws of physics mean that we should think carefully before we distribute analytic data too widely - taking data to processing, rather than the other way around, is a well-understood anti-pattern that more or less guarantees performance and scalability challenges - even as we acknowledge the numerous technical, regulatory, and commercial pressures that make centralising all of the data all of the time problematical. Federating development and governance can work, to your point - but it does demand at least some level of central co-ordination and governance; if domains A, B, C, and D don’t all align, as an absolute minimum, on a minimum set of join keys and standards for key measures, then not only are we back to costly, redundant data silos, we can also wave goodbye to optimising end-to-end business processes. That’s why at #teradata we often talk about “connected and co-located” data products. “Co-located” means a common platform wherever that’s practical (which isn’t always), so that we’re not shipping data all of the time. And “connected” refers to ensuring that we have (at least) “just enough governance to perform” (with apologies to the Stereophonics!).