When it comes to "The Data Mesh", two things are for sure:
- The term is HOT: it surprised many in 2021 and many more expect to debate it in 2022... at least if you believe the results of the two surveys I published on LinkedIn and Tony Baer's reference in his latest ZDNet post: "data mesh was one of the topics that broke the internet in 2021".
- There is a lot of contention around the term: What is it?! Is it REALLY new?! Why does it matter?! All of these are great questions. BUT, watch out for “VendorSpeak”, e.g., vendor blogs, papers and webinars that attempt to define it but that could, in the process, distort its original definition and intent.
So, here are some resources that you might find helpful as you assess the ‘existential’ nature of the "Data Mesh" for your company. Let me know in the comments if I've forgotten anything or anyone that can provide an unbiased set of principles to help the community mature this important concept.
Tony Baer’s 2-Part Series on the "Data Outlook 2022"
- Data 2022 outlook, part one: Will data clouds get easier? Will streaming get off its own island? Lots covered in this piece. I’ll keep a tab on Tony’s wishlist for 2022: “Our wish list for 2022 includes embedding some data fabric, cataloging, and federated query capabilities into analytic tools for end users and data scientists, so they don’t have to integrate a toolchain to get a coherent view of data. There is excellent opportunity to embed ML capabilities that learn and optimize into an end user’s or organization’s querying patterns - based on SLA and cost requirements”.
- Data 2022 outlook, part two: Reality bytes the data mesh. Lots to take away from this piece. Three concepts in particular strike me as important: 1) "Data meshes address a number of valid concerns about the limitations of top-down management or ownership of data", 2) "bottom-up approach to data management and governance that should theoretically improve accountability" and 3) "downside is that, not properly managed, data meshes could amplify or proliferate data silos, leading to waste, duplication, and inconsistent management and governance" (these are direct quotes, so all credit goes to Tony Baer).
Francois Nguyen's 2-Part Series on "Toward a Data Mesh"
Francois is Data Analytics CIO at L'Oréal, and his blog "YaBoD" (or "Yet Another Blog on Data") is one to bookmark.
Towards a Data Mesh (part 1) : Data Domains and Teams Topologies. In this piece, Francois does a great job pointing to groups and resources to help the community. Some of his observations are pure gems, for instance:
- "The right problem is to solve is this one ; to have different teams working on different subjects (data domains) and still be able to share and cooperate together"
- Juan Sequeda's definition of Data Mesh: "It is paradigm shift towards a distributed architecture that attempts to find an ideal balance between centralization and decentralization of metadata and data management."
I chuckled when he referred to the concept of "commitment" through the business fable of "The Chicken and the Pig" (you'll have to read his blog! :)
Finally, great words of wisdom conclude his post, as he connects Zhamak Dehghani's work to Matthew Skelton and Manuel Pais' book "Team Topologies":
"The heart of the Data Mesh is about the architecture but everything starts with Data Teams and how they are organized. It is very far from the one central team that will save the world"
Toward a Data Mesh (part 2) : Architecture & Technologies. If you don't have time to read the full article, Francois has got your back: he starts his piece with a TL;DR. However, you should really take the time to get his perspective on:
- Serverless and why "life is too short to do provisioning" (see graphic below for more).
- The 6 rules of data domains: 1) Discoverable, 2) Addressable, 3) Trustworthy, 4) Self-Describing, 5) Interoperable and 6) Secure.
- A focus on Observability, DataOps and Automation, plus a great reference to Prukalpa's piece on Catalog 3.0.
Disclosure: Thinh Ha is employed by Google Cloud Professional Services; however, the opinions expressed in his blog are his alone. Also, I have not talked to him about his blog before, during or after publication. I thought that his piece raised some interesting points that could help advance the topic, namely these warning signs:
- You are not operating at a scale where decentralization makes sense
- You do not have a strong business-case for how adopting Data Mesh will deliver business value for individual business units
- You treat Data Mesh as a technical solution with a fixed target rather than an operating model that continuously evolves over time
- Your organizational culture does not empower bottom-up decision-making
- You do not have clearly established roles & responsibilities and incentive structure for distributed data teams
- You do not have a critical mass of data talent
- Your data teams have low engineering maturity
- You expect to find off-the-shelf software to help you adopt Data Mesh
- You do not have buy-in to “shift-left” security, privacy, and compliance
- You do not consider Data Governance to be a core activity to be prioritized against other activities in every data team’s backlog
If the above doesn't work for you, go back to the source and read Zhamak Dehghani's original paper at MartinFowler.com: Data Mesh Principles and Logical Architecture.
Please do let me know in the comments if I've forgotten anything or anyone that can provide an unbiased set of principles to help the community mature this important concept.
In the meantime, here are additional community resources and folks to follow: