Essential Ingredients for a Data Mesh Architecture (Part 2 – Enterprise Aspects)

This is the second article in my series on Data Mesh Architecture. For an introduction to the concepts of a Data Mesh Architecture, please visit this link

A key Data Mesh principle is the concept of Domains, each working within a bounded context, thereby creating a de-centralized data architecture. At the same time, it is essential:

  1. To have some Data Management aspects to be Enterprise driven (to avoid a wild west situation) 
  2. To have some Data Management aspects to be reusable (to avoid reinventing the wheel) 

This article delves specifically into the areas which I believe need to be managed at an Enterprise level. In a subsequent article I will drill deeper into the re-usability aspects and paint the overall picture in a reference architecture for Data Management in a Data Mesh.

Data Governance  

Imagine if the internet allowed each browser to communicate using its own choice of protocols. Users would need completely different addresses for the same website when switching from one browser to another. Google Search would need to come to the rescue, but navigating to Google would itself require different protocols in different browsers. Ouch...

A Data Mesh, although Domain driven, requires an Enterprise-wide understanding of Data and of how Data relates to Business meaning. For a given business process there may be slightly different representations depending on the Domain context, but the process and the business terms should be consistent across the Enterprise and across Domains.

Avoiding a maze in which Data users are continuously trying to derive business meaning out of the Datasets exposed by a Domain is essential in a Data Mesh Architecture.
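
To make this concrete, here is a minimal sketch of such an enterprise-level business glossary. Everything in it (the term, the domains, the dataset names) is a hypothetical example, not a reference to any specific product; the point is simply that business meaning is defined once, centrally, and mapped to each Domain's physical representation.

```python
# A minimal sketch of an enterprise business glossary: one centrally defined
# business term, mapped to the domain-specific datasets that represent it.
# All names below are illustrative assumptions.

GLOSSARY = {
    "Customer": {
        "definition": "A party that has purchased at least one product or service.",
        "owner": "Enterprise Data Governance",
        "domain_representations": {
            "sales":     {"dataset": "sales.dim_customer",    "key": "customer_id"},
            "marketing": {"dataset": "marketing.contacts",    "key": "contact_id"},
            "finance":   {"dataset": "finance.account_party", "key": "party_id"},
        },
    },
}

def describe(term: str) -> None:
    """Print the enterprise definition of a term and where each Domain exposes it."""
    entry = GLOSSARY[term]
    print(f"{term}: {entry['definition']}")
    for domain, rep in entry["domain_representations"].items():
        print(f"  {domain}: {rep['dataset']} (key: {rep['key']})")

describe("Customer")
```

Consumers look meaning up once, centrally, instead of reverse-engineering each Domain's schema.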

The same applies to global compliance policies such as GDPR or Privacy Policies, and to how Data should be integrated, stored, measured, and accessed for use. As these are essential for the transparency and the integrity of an organization, they need to be centrally governed.

Equally essential is a business-friendly, governed, self-service Marketplace that allows users to easily 'shop' for data and have their Data provisioned in a governed yet automated fashion. This encourages collaboration and a product-thinking mindset for the Data Assets offered for consumption.
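
As a rough sketch of how central governance and self-service provisioning can meet, the following example evaluates a globally defined policy (a GDPR-style PII classification) while fulfilling a marketplace request. The column names, purposes, and masking rule are all assumptions for illustration, not a reference implementation.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user: str
    dataset: str
    purpose: str
    columns: list

# Central, enterprise-owned classification of sensitive columns (hypothetical).
PII_COLUMNS = {"customers.email", "customers.phone", "customers.birth_date"}

def provision(request: AccessRequest) -> dict:
    """Grant access automatically, masking columns that global policy
    classifies as PII unless the stated purpose explicitly allows them."""
    granted, masked = [], []
    for col in request.columns:
        if f"{request.dataset}.{col}" in PII_COLUMNS and request.purpose != "regulatory-reporting":
            masked.append(col)   # still delivered, but masked/tokenized
        else:
            granted.append(col)
    return {"user": request.user, "dataset": request.dataset,
            "clear_columns": granted, "masked_columns": masked}

print(provision(AccessRequest("analyst_1", "customers", "churn-analysis",
                              ["customer_id", "email", "segment"])))
# -> clear: ['customer_id', 'segment'], masked: ['email']
```

The design point is that the policy is defined once at the Enterprise level, while the provisioning itself stays fully automated and self-service.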

Data Registry/Discovery (a.k.a. an Enterprise Catalog with Data Lineage)

The concept of a Data Catalog is not a new one; multiple solutions already exist and are leveraged by customers in their Data Architectures. A Data Mesh and its de-centralized approach, however, require a more global and powerful solution.

  • Enterprise over Domain – Since each Domain will have different needs and different consumers, the underlying technology and implementation strategies may differ. The points in the mesh (viz. the Data Assets) therefore require a Catalog solution with both breadth and depth: it must handle different data latencies and exposure types (Views, APIs, Streams, etc.) and span technologies and deployment locations (on-prem/multi-cloud). A sketch follows at the end of this section.
Who knows what exists in between?

(Image Source - Getty Images)

  • End to End Lineage over Documentation – The other key capability is end to end lineage. 'End to end' is a broad term here, but the solution should cover both Vertical and Horizontal Lineage across Domains in a Data Mesh. Vertical Lineage refers to the ability to tie Physical Data Assets to the Logical and Business-related Entities defined for them. Horizontal Lineage refers to the ability to traverse left to right, from the Source through multiple hops in the Data Mesh, to trace the route along which consumption happens. This again needs to be technically diverse, spanning Operational Applications, Data Integration components, Data Storage Ecosystems, and Reporting Tools. Ideally the lineage should also be drill-able, to understand impact down to the individual attribute level. Finally, the lineage should be self-updating as the components in the path change, much like a modern digital map.
Tying back to the Data Mesh, lineage is a critical capability to avoid a Domino effect, where an issue with a Dataset that is exposed and utilized in subsequent consumption chains creates a non-traceable ripple effect across the organization.
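
To make both points concrete, the sketch below combines them: catalog entries record exposure type, location, and the Business Term each asset implements (Vertical Lineage), while downstream edges (Horizontal Lineage) let us walk the blast radius of a broken source. All asset names and fields are hypothetical.

```python
from collections import deque

# Hypothetical registry entries: exposure type, deployment location, and the
# business term each physical asset implements (vertical lineage).
CATALOG = {
    "crm.orders_table":    {"exposure": "table",  "location": "on-prem", "term": "Order"},
    "sales.orders_view":   {"exposure": "view",   "location": "cloud-a", "term": "Order"},
    "sales.orders_api":    {"exposure": "API",    "location": "cloud-a", "term": "Order"},
    "finance.revenue_rpt": {"exposure": "report", "location": "cloud-b", "term": "Revenue"},
}

# Horizontal lineage: asset -> assets derived from it.
DOWNSTREAM = {
    "crm.orders_table": ["sales.orders_view"],
    "sales.orders_view": ["sales.orders_api", "finance.revenue_rpt"],
}

def impacted_by(asset: str) -> list:
    """Breadth-first walk of all downstream consumers of `asset` —
    the 'Domino effect' blast radius when a source dataset breaks."""
    hit, queue = [], deque(DOWNSTREAM.get(asset, []))
    while queue:
        node = queue.popleft()
        if node not in hit:
            hit.append(node)
            queue.extend(DOWNSTREAM.get(node, []))
    return hit

print(impacted_by("crm.orders_table"))
# -> ['sales.orders_view', 'sales.orders_api', 'finance.revenue_rpt']
```

In a real mesh these edges would be harvested and kept self-updating by the lineage tooling rather than hand-maintained, but the traversal idea is the same.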

Master and Reference Data Management 

Master Data and Reference Data are often looked at as foundational pillars of an Enterprise. Master Data Management is seldom done at an application level or at the Data Warehousing and Analytics level, which brings challenges of its own, and extending it to a Domain-level Master Data Management creates only a partial view of a master data domain. One of the principles of a Data Mesh Architecture is to enable data collaboration, but a global and consistent view of Master Data is essential to maintain a single consistent view of the Enterprise. Similarly, a consistent view of Reference Data, both internal and external to the organization, is essential to drive consistency and simplified maintenance across Domains and Data Assets.

It is critical to avoid multiple versions of the truth when it comes to Master Data, and to avoid the rework of each Domain reconfiguring Master Data against other Domains.

On a lighter note, we need to avoid such a confusing surprise.

(Image Source - Getty Images)
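
As a tiny illustration of the 'single version of the truth' point, the sketch below merges two Domains' partial, conflicting customer records into one golden record using a deliberately naive survivorship rule (newest non-empty value wins). Real MDM matching and merging is far more sophisticated; every field name and rule here is an assumption.

```python
def golden_record(records: list) -> dict:
    """Merge domain records: per attribute, keep the newest non-empty value."""
    merged = {}
    for rec in sorted(records, key=lambda r: r["updated_at"]):  # oldest first
        for field, value in rec.items():
            if field != "updated_at" and value:
                merged[field] = value  # later (newer) records overwrite
    return merged

# Two partial, conflicting domain views of the same customer (hypothetical).
sales_view     = {"name": "A. Smith",   "email": "",             "updated_at": "2021-03-01"}
marketing_view = {"name": "Anna Smith", "email": "a.smith@x.io", "updated_at": "2021-06-15"}

print(golden_record([sales_view, marketing_view]))
# -> {'name': 'Anna Smith', 'email': 'a.smith@x.io'}
```

With a central golden record, no Domain has to re-derive this consolidation for itself.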

Data Quality (Hybrid Approach) 

Last but not least in terms of global management comes Data Quality. I believe Data Quality needs to be managed both at an Enterprise level and at a Domain level. At the Enterprise level it is essential to define Data Quality rules which are measured close to the Business Terms and Processes. Additionally, it adds value to define Data Quality standards and metrics which ensure an Enterprise-wide level of Data Maturity. Simply put, these act as an overall set of standards a user can expect when consuming Data Assets, irrespective of the Domains they are part of.

At the same time, there will be a need for Data Quality rules at a Domain level, specific to the business and data requirements within the Domain.

It is important to have a Data Quality solution which is design-driven rather than code-driven, so that the focus stays on the actual Data Quality measurements that need to be implemented, captured, and interfaced back to the Data Catalog and Governance areas, rather than on the underlying technology and implementation.
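
To illustrate what 'design-driven' can mean in practice, here is a minimal sketch in which rules are declared as data, each with an explicit scope (enterprise-wide or Domain-specific), and a generic engine produces catalog-ready metrics. Rule names, scopes, and sample rows are all hypothetical.

```python
# Design-driven data quality: rules are declared as data, not buried in
# bespoke pipeline code, so results can be published back to the catalog.

RULES = [
    {"name": "email_present",   "scope": "enterprise",     "column": "email",
     "check": lambda v: bool(v)},
    {"name": "margin_in_range", "scope": "domain:finance", "column": "margin",
     "check": lambda v: 0.0 <= v <= 1.0},
]

def measure(rows: list) -> list:
    """Evaluate every declared rule and emit catalog-ready pass-rate metrics."""
    results = []
    for rule in RULES:
        values = [r[rule["column"]] for r in rows if rule["column"] in r]
        if not values:
            continue
        passed = sum(1 for v in values if rule["check"](v))
        results.append({"rule": rule["name"], "scope": rule["scope"],
                        "pass_rate": passed / len(values)})
    return results

rows = [{"email": "a@x.io", "margin": 0.4}, {"email": "", "margin": 1.7}]
print(measure(rows))
# -> [{'rule': 'email_present', 'scope': 'enterprise', 'pass_rate': 0.5},
#     {'rule': 'margin_in_range', 'scope': 'domain:finance', 'pass_rate': 0.5}]
```

Because the rules are plain data, the same engine serves both the Enterprise-level baseline and each Domain's additional rules, and the output can flow straight into the Catalog.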

To conclude this article, I would like to emphasize that Enterprise Data Management should still be a collaborative effort, rather than one run by a completely independent set of members who do not understand the day-to-day aspects of operational implementation, or the specific requirements and decisions made at a Domain level.

I appreciate your thoughts and comments; I would love to collaborate and understand how these ideas have been realized in practice.

