What is Data Mesh?

Data mesh is not a technology; it is a conceptual framework that determines what types of applications we can build on it.

Interview answer: What is data mesh?

Data Mesh is an intentionally designed distributed data architecture, under centralized governance and standardization for interoperability, enabled by shared and harmonized self-serve data infrastructure.

To articulate it further, four principles make up a data mesh:

  1. Domain-oriented, decentralized data ownership and architecture.
  2. Data as a product.
  3. Self-serve data infrastructure as a platform.
  4. Federated computational governance.
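The four principles are easiest to see together on a single data product. The Python sketch below is a hypothetical illustration, not part of any data mesh standard or tool: a `DataProduct` record owned by a domain (principles 1 and 2), plus a few governance rules written as code (principle 4) that a self-serve platform (principle 3) could run automatically against every product. The field names and the rules themselves are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical descriptor for a data product (principle 2: data as a product).
# Field names are illustrative assumptions, not a standard schema.
@dataclass
class DataProduct:
    name: str
    domain: str        # owning domain (principle 1)
    owner: str         # accountable team inside that domain
    schema: dict       # column name -> type
    description: str = ""
    tags: set = field(default_factory=set)

# Principle 4: governance rules expressed as code, so the platform
# (principle 3) can apply them the same way to every domain's products.
def has_owner(p):
    return bool(p.owner)

def has_description(p):
    return len(p.description.strip()) > 0

def pii_is_tagged(p):
    # Hypothetical rule: any column with "email" in its name must carry a "pii" tag.
    has_email = any("email" in col.lower() for col in p.schema)
    return not has_email or "pii" in p.tags

POLICIES = {
    "has_owner": has_owner,
    "has_description": has_description,
    "pii_is_tagged": pii_is_tagged,
}

def check(product):
    """Return the names of the policies the product violates, in policy order."""
    return [name for name, rule in POLICIES.items() if not rule(product)]

orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    schema={"order_id": "string", "customer_email": "string", "amount": "decimal"},
    description="Daily snapshot of confirmed orders",
)
print(check(orders))  # ['pii_is_tagged']
```

The domain team still owns and publishes the product; the central part is only the shared, automated policy set, which is the balance the definition above describes.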

Data Mesh is a concept, not a product

  • From centralized ownership to decentralized ownership
  • From pipelines as a first-class concern to domain data as a first-class concern
  • From data as a by-product to data as a product
  • From a siloed data engineering team to cross-functional domain data teams
  • From a centralized data lake/warehouse to an ecosystem of data products
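The last shift, from a central lake to an ecosystem of data products, implies that consumers need a way to find products across domains. Here is a minimal sketch of such a catalog in Python, assuming an in-memory registry keyed as `<domain>.<name>`; `ProductEntry` and `DataProductCatalog` are hypothetical names, and a real mesh would normally rely on a dedicated catalog service rather than a dictionary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductEntry:
    """Hypothetical catalog entry for one published data product."""
    name: str
    domain: str
    owner: str
    tags: frozenset = frozenset()

class DataProductCatalog:
    """Hypothetical in-memory catalog: domains publish, consumers discover."""

    def __init__(self):
        self._entries = {}

    def publish(self, entry):
        # Products are addressed as "<domain>.<name>", so two domains can
        # reuse the same product name without colliding.
        key = f"{entry.domain}.{entry.name}"
        if key in self._entries:
            raise ValueError(f"{key} is already published")
        self._entries[key] = entry
        return key

    def find(self, domain=None, tag=None):
        """Return sorted keys matching the optional domain and tag filters."""
        return sorted(
            key
            for key, e in self._entries.items()
            if (domain is None or e.domain == domain)
            and (tag is None or tag in e.tags)
        )

catalog = DataProductCatalog()
catalog.publish(ProductEntry("orders", "sales", "sales-team", frozenset({"finance"})))
catalog.publish(ProductEntry("orders", "logistics", "logistics-team"))
catalog.publish(ProductEntry("invoices", "finance", "finance-team", frozenset({"finance"})))

print(catalog.find(tag="finance"))       # ['finance.invoices', 'sales.orders']
print(catalog.find(domain="logistics"))  # ['logistics.orders']
```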

Depending on the application's requirements, data mesh domains can be set up in different ways, and each type has its own governance policy.

Type 1: Easy to build, but far from the mesh model.

All domains use the same technologies and a centralized data lake, split into folders and containers.

Example: A company uses Azure for cloud storage and Databricks for processing, and many teams share the centralized data lake.
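A minimal sketch of the Type 1 layout, assuming a single Azure Data Lake Storage Gen2 account (hypothetically named `companylake`) with one container per domain and one folder per data product. The `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI form is the standard ADLS Gen2 addressing scheme; the domain names, the `curated` zone, and the `can_write` rule are assumptions. It illustrates why this type sits away from the mesh model: domain ownership is only a naming convention on shared storage.

```python
# Hypothetical layout: one shared ADLS Gen2 account, one container per domain.
STORAGE_ACCOUNT = "companylake"  # assumed account name

DOMAIN_CONTAINERS = {
    "sales": "sales",
    "marketing": "marketing",
    "finance": "finance",
}

def lake_path(domain, product, zone="curated"):
    """Build the abfss:// URI of a domain's data product folder."""
    container = DOMAIN_CONTAINERS[domain]
    return (
        f"abfss://{container}@{STORAGE_ACCOUNT}.dfs.core.windows.net/"
        f"{zone}/{product}"
    )

def can_write(domain, path):
    """Hypothetical rule: a domain may write only inside its own container.

    Nothing in the shared lake enforces this by itself; it is a convention
    that access control has to back up.
    """
    prefix = f"abfss://{DOMAIN_CONTAINERS[domain]}@"
    return path.startswith(prefix)

path = lake_path("sales", "orders")
print(path)                         # abfss://sales@companylake.dfs.core.windows.net/curated/orders
print(can_write("finance", path))   # False
```

In a Databricks notebook, a path like this would typically be read with `spark.read.format("delta").load(path)`, assuming the products are stored as Delta tables.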


Type 2: Domains use the same technology, but each domain has its own storage.

Example: Snowflake is used across all domains, while each domain's storage sits on a different cloud (AWS, GCP, etc.).
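For Type 2, the shared Snowflake layer can reach each domain's storage through an external stage. The Python sketch below only renders `CREATE STAGE` statements and does not execute them. The bucket names and storage integration names are hypothetical, and it assumes an administrator has already created a storage integration for each domain's bucket.

```python
# Hypothetical per-domain storage: every domain uses Snowflake,
# but each keeps its files on a different cloud.
DOMAIN_STORAGE = {
    "sales": {
        "url": "s3://sales-domain-bucket/products/",
        "integration": "sales_s3_int",
    },
    "marketing": {
        "url": "gcs://marketing-domain-bucket/products/",
        "integration": "marketing_gcs_int",
    },
}

# URL prefixes for Snowflake external stages on Amazon S3, Google Cloud Storage and Azure.
SUPPORTED_PREFIXES = ("s3://", "gcs://", "azure://")

def create_stage_sql(domain):
    """Render a CREATE STAGE statement pointing at the domain's own storage."""
    cfg = DOMAIN_STORAGE[domain]
    if not cfg["url"].startswith(SUPPORTED_PREFIXES):
        raise ValueError(f"unsupported storage URL for {domain}: {cfg['url']}")
    return (
        f"CREATE STAGE {domain}_stage "
        f"URL = '{cfg['url']}' "
        f"STORAGE_INTEGRATION = {cfg['integration']};"
    )

for domain in DOMAIN_STORAGE:
    print(create_stage_sql(domain))
```

The design choice here is that the processing layer stays common while storage location stays with the domain; Snowflake can load from external stages on any of the three clouds, though cross-cloud reads may add data egress costs.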


Type 3: Each domain is free to choose its own technology and storage solution (a nightmare to build).

Example: One domain uses Oracle with S3 storage, while another uses Databricks on Azure.
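Type 3 is hard because nothing is shared except the contract between domains. One way to sketch that contract in Python is an abstract interface that every domain implements over its own stack, so governance checks run the same way everywhere. The classes below are hypothetical stand-ins: they describe hard-coded column lists instead of connecting to Oracle or Databricks, and the three standard types are an assumed mesh-wide convention.

```python
from abc import ABC, abstractmethod

# Types every domain agrees to publish (hypothetical mesh-wide standard).
STANDARD_TYPES = {"string", "decimal", "timestamp"}

class DataProductPort(ABC):
    """Hypothetical contract every domain implements, whatever its stack."""

    def __init__(self, table, owner, columns):
        self.table = table
        self.owner = owner
        self.columns = columns  # list of (column_name, native_type) pairs

    @abstractmethod
    def describe(self) -> dict:
        """Return {'name', 'owner', 'schema'} using the standard type names."""

class OracleS3Product(DataProductPort):
    # Stand-in for a domain on Oracle + S3; native types are Oracle types.
    TYPE_MAP = {"VARCHAR2": "string", "NUMBER": "decimal", "DATE": "timestamp"}

    def describe(self):
        return {
            "name": self.table.lower(),
            "owner": self.owner,
            # Unmapped native types pass through so governance can flag them.
            "schema": {c.lower(): self.TYPE_MAP.get(t, t.lower()) for c, t in self.columns},
        }

class DatabricksAzureProduct(DataProductPort):
    # Stand-in for a domain on Databricks + Azure; native types are Spark SQL types.
    TYPE_MAP = {"string": "string", "decimal(18,2)": "decimal", "timestamp": "timestamp"}

    def describe(self):
        return {
            "name": self.table,
            "owner": self.owner,
            "schema": {c: self.TYPE_MAP.get(t, t) for c, t in self.columns},
        }

def nonstandard_columns(port):
    """Governance check: columns whose published type is outside the standard."""
    schema = port.describe()["schema"]
    return sorted(c for c, t in schema.items() if t not in STANDARD_TYPES)

oracle = OracleS3Product(
    "INVOICES", "finance-team",
    [("INVOICE_ID", "VARCHAR2"), ("AMOUNT", "NUMBER"), ("NOTES", "CLOB")],
)
dbx = DatabricksAzureProduct(
    "campaigns", "marketing-team",
    [("campaign_id", "string"), ("spend", "decimal(18,2)")],
)

for port in (oracle, dbx):
    print(port.describe()["name"], nonstandard_columns(port))
# invoices ['notes']
# campaigns []
```

Every new technology in a Type 3 mesh needs its own adapter and type mapping like these, which is one concrete reason this type is so costly to build and maintain.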


Conclusion:

Centralized data lakes often result in disconnected data producers, impatient data consumers, and, most problematic of all, a backlogged data team that cannot keep up with business demands. Domain-oriented data architectures such as data mesh give teams the best of both worlds: a centrally managed database (or a distributed data lake), with domains (business areas) in charge of their own pipelines. Data architectures scale most easily when they are broken down into smaller, domain-oriented components.

If this document needs any corrections or edits, please comment.

Thank you.
