Design a Data Mesh Architecture in Practice

Data Mesh vs Centralized Data Model

Long-standing relational databases and transactional architectures have served, and still serve, a wide variety of use cases well. Once organizations understood that there is a lot of value in the data itself, however, analytical use cases brought different requirements and, consequently, different architectures.

Architectures evolved from batch processing and Lambda architectures to Kappa and microservices architectures, each emerging to address the big data challenges facing the business.

The "single source of truth" approach also emerged to make sure everyone sees one unified view of the data by centralizing it in a data lake: one team produces the data, and everyone else consumes it.

In reality, however, fully centralized data lakes leave clear gaps between business areas and IT. A single IT or data engineering team builds data pipelines in the hope that lines of business (LOBs) and executives will get the full benefit of the data. Because the gap is so big, in most cases this does not happen, and the reason is simple: the people who produce the data and make it ready are not the ones who actually use it.

The Data Mesh architecture concept aims to close this gap: organizations treat data as a product, not merely as an asset. This is where we believe we get closer to a democratized, data-driven business.

One of the most important capabilities Data Mesh enables is "data autonomy": building a self-service data infrastructure goes a long way toward democratizing data in practice.


Data Mesh Architecture in Practice

I have seen plenty of scenarios where building an architecture with a domain-based, data-product approach looks easy in theory. For many organizations, however, turning those theoretical concepts into reality has been a real challenge.

The cloud has reduced the complexity of building flexible data architectures. That flexibility, together with the governance and security options, has been critical for building new approaches that turn these ideas into reality.


Some points to consider:

  • Data mesh is a pattern for defining how organizations can organize around data domains with a focus on delivering data as a product. However, it may not be the right pattern for every customer.
  • The Lake House approach with a foundational data lake serves as a repeatable blueprint for implementing data domains and products in a scalable way.
  • The way we look at data here is different, and the way each LOB works is also different; LOBs should own their data product end to end, from build to release (a minimal layout sketch follows this list).
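
As a rough illustration of the points above, here is a minimal sketch of what a domain/data-product layout on a foundational data lake could look like. All names (domains, teams, storage paths) are hypothetical assumptions, not a prescribed blueprint.

```python
# A minimal, hypothetical sketch of a domain/data-product layout on a
# foundational data lake. Domain names, team names, and storage paths are
# illustrative only.

DATA_MESH_BLUEPRINT = {
    "claims": {  # an example LOB / data domain
        "owner": "claims-data-team",
        "products": {
            "approved_claims": {
                "storage": "s3://lake-claims-domain/products/approved_claims/",
                "format": "parquet",
                "published": True,   # visible to other domains
            },
        },
    },
    "marketing": {
        "owner": "marketing-analytics-team",
        "products": {
            "campaign_performance": {
                "storage": "s3://lake-marketing-domain/products/campaign_performance/",
                "format": "parquet",
                "published": False,  # still internal to the domain
            },
        },
    },
}


def published_products(blueprint: dict) -> list[tuple[str, str]]:
    """Return (domain, product) pairs that are published to consumers."""
    return [
        (domain, name)
        for domain, spec in blueprint.items()
        for name, product in spec["products"].items()
        if product["published"]
    ]


if __name__ == "__main__":
    print(published_products(DATA_MESH_BLUEPRINT))
    # [('claims', 'approved_claims')]
```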

The following are user experience considerations:

  • Data teams own their information lifecycle, from the application that creates the original data, through to the analytics systems that extract and create business reports and predictions. Through this lifecycle, they own the data model, and determine which datasets are suitable for publication to consumers.
  • Data domain consumers or individual users should be given access to data through a supported interface, like a data API, that can ensure consistent performance, tracking, and access controls.
  • All data assets are easily discoverable from a single central data catalog. The data catalog contains the datasets registered by data domain producers, including supporting metadata such as lineage, data quality metrics, ownership information, and business context (a registration sketch follows this list).
  • All actions taken with data, usage patterns, data transformation, and data classifications should be accessible through a single, central place. Data owners, administrators, and auditors should be able to inspect a company’s data compliance posture in a single place.
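
To make the producer side of these considerations concrete, here is a minimal in-memory sketch of a domain team registering a data product in a central catalog together with its supporting metadata. The `CatalogEntry` fields and the `Catalog` class are illustrative assumptions, not any specific catalog product's API.

```python
# A minimal in-memory sketch: a producing domain registers a dataset in a
# central catalog with owner, lineage, quality metrics, and business context.
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str                      # e.g. "claims.approved_claims"
    owner: str                     # producing domain team
    storage_uri: str               # where the product physically lives
    business_context: str          # what the dataset means to the business
    lineage: list[str] = field(default_factory=list)        # upstream sources
    quality_metrics: dict[str, float] = field(default_factory=dict)


class Catalog:
    """Single central place where all data products are discoverable."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[CatalogEntry]:
        keyword = keyword.lower()
        return [
            e for e in self._entries.values()
            if keyword in e.name.lower() or keyword in e.business_context.lower()
        ]


catalog = Catalog()
catalog.register(
    CatalogEntry(
        name="claims.approved_claims",
        owner="claims-data-team",
        storage_uri="s3://lake-claims-domain/products/approved_claims/",
        business_context="Claims approved in the last 24 months, one row per claim.",
        lineage=["claims.raw_claims", "claims.adjudication_events"],
        quality_metrics={"completeness": 0.998, "freshness_hours": 24.0},
    )
)

print([e.name for e in catalog.search("claims")])
# ['claims.approved_claims']
```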

Data Consumer 1 and Data Builder 1 (the producers) belong to a single domain/department. The idea here is to remove the gap between those two entities: in practice, it means giving domain areas more autonomy and flexibility so they can make the most of their data by creating their own "product".
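
As a small illustration of that idea, the sketch below shows a consumer within the same domain reading a published product through a governed access function that checks grants and records usage centrally. Consumer names, grants, and the logging approach are hypothetical assumptions, not a specific platform's mechanism.

```python
# A small, self-contained sketch of the consumer side within one domain:
# "Data Consumer 1" reads the product that "Data Builder 1" published, via a
# governed access function that checks grants and records usage.
from datetime import datetime, timezone

# Which consumers have been granted access to which data products (assumed).
ACCESS_GRANTS = {
    "data_consumer_1": {"claims.approved_claims"},
}

USAGE_LOG: list[dict] = []  # stand-in for centralized audit logging


def read_product(consumer: str, product: str) -> str:
    """Return a handle (here just the storage URI) if access is granted."""
    if product not in ACCESS_GRANTS.get(consumer, set()):
        raise PermissionError(f"{consumer} is not granted access to {product}")
    USAGE_LOG.append({
        "consumer": consumer,
        "product": product,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    # In a real platform this would return a table or dataframe via a data API.
    return "s3://lake-claims-domain/products/approved_claims/"


print(read_product("data_consumer_1", "claims.approved_claims"))
print(len(USAGE_LOG), "access event(s) recorded")
```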


Many thanks!

Rosane Ricciardi

CDAO at Amil Group | 2024 Global Top100 Innovators in Data & Analytics by #Corinium | 2022 Global Top 100 Leading Enterprise Data Leaders by #CDOMagazine

2y

Excellent, Arvin. I still see the challenge of keeping the catalog up to date versus the speed at which we ingest into the data lake. I also still see companies with a lot of demand for a centralized semantic layer to have a single version of the numbers… but I see demand for self-service growing… and once again, the semantic layer and the catalog are the key actors in making this work. Kisses, Ro

Ed Carter

Data Management, Faithlife, LLC Founder/CEO, CartersFarm.Software - a small software company with small ideas. Aspiring Cartoon Mime Voice Actor.

2y

I think it is important to note that the reference to "business context" needs itself to be managed within an overarching ontology so semantic meaning can be established consistently for all of the self-service actors. In practical terms, lack of collaborative attention in this area has been the weak spot in many of the data architecture projects I've seen. Serious work using Knowledge Graphs or other semantic disciplines is essential to actually implementing data lakes, lake houses, or other approaches to data as a product. #ontology #knowledgegraph #semantic

Tiago Gorjon

Coordenador de Sistemas @ Vivo | DEVOPS TEAM

2y
Alberto Cardoso

Solutions Architect at Databricks

2y

Can we start a "pagodata" band called Data Mesh e Remesh??
