Rise of Data Mesh Architecture [7 out of 10]

Mahmoud Yassin

Senior Data Manager | Customer Data, Insights and AI | Helping Booking.com to modernize their data architecture on top of public cloud infrastructure

Published: March 4, 2023

This is a series of articles about the importance of having a solid data architecture in your business. The series includes the following articles:

1- Introduction to Data Architecture 

2- OLAP vs OLTP 

3- Data Warehouse Architecture deep dive 

4- Data Lake Architecture deep dive 

5- Data Lake vs Data Warehouse Architecture 

6- Cloud computing effect on Data Architecture 

7- Rise of Data Mesh Architecture [current article]

8- Data Mesh Vs the rest

9- Rise of DeltaLake Architecture

10- Which Data Architecture shall I choose?

What is Data Mesh

To maintain data quality and governance as the number of data sources and the need for agility grow, Zhamak Dehghani introduced the concept of a decentralized data architecture known as Data Mesh. Data Mesh does this by delegating responsibility for data to the domain level and by providing high-quality, transformed data as a product.

Essentially, a Data Mesh is an approach to decentralize data architecture by organising data according to specific business domains such as marketing, sales, customer service, etc. This provides greater ownership to the producers of the datasets, who have a better understanding of the domain data and can set data governance policies focused on documentation, quality, and access. This enables self-service use across an organisation. Although this approach eliminates operational bottlenecks associated with centralised, monolithic systems, it does not exclude the use of traditional storage systems like data lakes or data warehouses. It simply means that these systems are used as multiple decentralised data repositories instead of a single, centralised data platform.

Data Mesh characteristics:

OLTP and OLAP hand in hand:

Data Mesh changes how OLTP and OLAP relate to each other: the same domain that is responsible for the operational (OLTP) systems can now take charge of the analytical (OLAP) data as well. This approach has a lot of merit, since the data producers within a domain have an unmatched level of expertise and knowledge about their data. They can clarify all necessary definitions related to the produced data, ensure that data consumers are using their data correctly, and be responsible for the interface exposed to the consumer.

Aims for democratising data:

Data Mesh allows a quicker exchange of data from one domain to another by embracing domain ownership. It emphasises the use of data platforms that are designed to enable self-service data access, data discovery, and data sharing, which further supports data democratization. These platforms provide tools and frameworks that allow data producers and consumers to collaborate more effectively, and make it easier for them to share data across organisational boundaries.

Born decentralised:

Compared to other data architectures, Data Mesh stands out for its strong emphasis on decentralisation, in which data is distributed rather than centralised. In a decentralised approach, Data Mesh assigns ownership of data to groups that are specific to each domain. These groups are responsible for serving, owning, and managing data as a product.

These groups, often referred to as "data domains," are empowered to make decisions about how their data should be collected, processed, and shared, based on their specific needs and requirements. This ensures that data is more aligned with the needs of the business, and that it is more readily available to those who need it.

Four data mesh principles

Each principle addresses a challenge of earlier data architectures such as the data warehouse or the data lake.

1) Data Domain Ownership

The principle of domain-oriented data ownership requires that the domain teams take full ownership and responsibility for the data generated and consumed by their business capabilities. This means that data should be organised and managed around domains, which are aligned with the team boundaries and the system's bounded context. By adopting a domain-driven distributed architecture, the responsibility for managing both analytical and operational data is shifted from a central data team (IT) to the domain teams themselves. This approach enables greater autonomy, agility, and scalability for data management within each domain, while also fostering collaboration and standardisation across domains.

Empower data ownership to the people who are closest to data:

The first and most important principle of data meshes is the idea of domain-oriented decentralised data ownership and architecture. It aims at resolving a specific problem: moving the analysis of data to the same domain where the data itself was born.

You own it, you are responsible for it:

The experts of the domain in which the data originates are the same people who are responsible for its quality and interpretation. Because of their experience in the domain, problems with the interpretation and quality of the data tend to decrease.

Inspired by domain-driven distributed software architecture (Microservices):

Analytical data should be composed around domains, with team boundaries aligned with the system’s bounded context. Following the domain-driven distributed architecture, analytical and operational data ownership is moved to the domain teams, away from the central data team.
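
To make the ownership idea concrete, here is a minimal sketch of a domain that owns both its operational and its analytical datasets. The class names, the domain names and the team names are all hypothetical and chosen only for illustration; a real implementation would depend on your own platform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    name: str
    kind: str  # "operational" (OLTP) or "analytical" (OLAP)


@dataclass
class Domain:
    """A business domain that owns both its operational and analytical data."""
    name: str
    owner_team: str
    datasets: List[Dataset] = field(default_factory=list)

    def add_dataset(self, name: str, kind: str) -> Dataset:
        if kind not in ("operational", "analytical"):
            raise ValueError(f"unknown dataset kind: {kind}")
        dataset = Dataset(name=name, kind=kind)
        self.datasets.append(dataset)
        return dataset


def find_owner(domains: List[Domain], dataset_name: str) -> Optional[str]:
    """Return the team that owns a dataset, or None if no domain owns it."""
    for domain in domains:
        if any(d.name == dataset_name for d in domain.datasets):
            return domain.owner_team
    return None


# Hypothetical domains: the sales team owns its OLTP and OLAP data alike.
sales = Domain(name="sales", owner_team="sales-engineering")
sales.add_dataset("orders", "operational")
sales.add_dataset("daily_revenue", "analytical")

marketing = Domain(name="marketing", owner_team="marketing-engineering")
marketing.add_dataset("campaign_performance", "analytical")

domains = [sales, marketing]
print(find_owner(domains, "daily_revenue"))  # sales-engineering
```

The point of the sketch is that a question such as "who do I ask about daily_revenue?" always resolves to a domain team, never to a central data team.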

2) Data as a product

The principle of "Data as a Product" in Data Mesh involves applying a product-oriented mindset to analytical data. This principle acknowledges that there are consumers of data beyond the domain and emphasizes that the domain team is responsible for meeting the needs of other domains by providing high-quality data. In essence, the domain data should be treated like any other public API, with a focus on making it discoverable, reusable, and consumable by other domains. This approach promotes a culture of data sharing and collaboration across the organisation, resulting in more effective use of data for business purposes.

Responsibility:

Each dataset MUST have clear ownership and stewardship. Data owners or stewards are the key people who understand the data and are responsible for it.

First class product:

Product thinking is applied to analytical data: the domain team is responsible for satisfying the needs of other domains by providing high-quality data.

Avoiding “dark” data:

“Dark” data is the information that organisations collect, process and store during regular business activities, but generally fail to use for other purposes.

I own it, I fix it:

Data quality is fixed in the domain itself where the data is generated. Data quality at source is a key success factor.

I have a usable, valuable and feasible product:

“I want to have the best product ever that people are interested in and love the most”

This is a mindset shift in which data is at the heart of the domain. Data must be discoverable and understandable.
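
As an illustration of treating data like a public API, the sketch below describes a data product with an owner, a description and a schema, and registers it in a searchable catalog. The `DataProduct` and `Catalog` classes and their fields are assumptions made for this example, not the API of any particular tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str
    owner: str
    description: str
    schema: Dict[str, str]  # column name -> type


class Catalog:
    """Hypothetical catalog that makes published data products discoverable."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Every data product must have a clear owner and be documented.
        if not product.owner.strip():
            raise ValueError(f"{product.name}: every data product needs an owner")
        if not product.description.strip():
            raise ValueError(f"{product.name}: a description is required")
        self._products[product.name] = product

    def search(self, keyword: str) -> List[str]:
        """Return the names of products whose name or description match."""
        keyword = keyword.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if keyword in p.name.lower() or keyword in p.description.lower()
        )


catalog = Catalog()
catalog.publish(
    DataProduct(
        name="daily_revenue",
        domain="sales",
        owner="sales-engineering",
        description="Revenue per day, aggregated from confirmed orders",
        schema={"day": "date", "revenue_eur": "decimal"},
    )
)
print(catalog.search("revenue"))  # ['daily_revenue']
```

Refusing to publish a product without an owner or a description is one simple way to keep data from going "dark": if nobody can find or understand a dataset, nobody will use it.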

3) Self service data platform

The concept of a self-serve data infrastructure platform in Data Mesh involves applying platform-oriented thinking to data infrastructure. This approach involves a dedicated data platform team that provides domain-agnostic functionality, tools, and systems to facilitate the development, execution, and maintenance of interoperable data products for all domains. By offering this platform, the data platform team enables domain teams to consume and create data products with ease and without the need for extensive technical expertise. This promotes greater agility and efficiency in data management across the organization, while also fostering a culture of collaboration and knowledge sharing.

One data platform team:

The data platform team enables domain teams to seamlessly consume and create data products.

Enabling the domains:

Provide domain-agnostic functionality, tools, and systems to build, execute, and maintain interoperable data products for all domains.

New level of abstraction of infrastructure:

High-level abstraction of infrastructure that removes complexity and friction of provisioning and managing the lifecycle of data products.

New category of tools:

To make analytical data product development accessible to generalist developers, that is, to the developer profiles the domains already have, the self-serve platform needs to provide a new category of tools and interfaces in addition to simplifying provisioning.
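
One way to picture this level of abstraction is a declarative interface: a domain team states what it needs, and the platform decides how to provision it. The sketch below is hypothetical from top to bottom (the `ProductSpec` fields, the `SelfServePlatform` class and the naming scheme for locations and roles are all assumptions), but it shows the shape of the idea.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProductSpec:
    """What a domain team declares; no infrastructure details required."""
    name: str
    domain: str
    storage: str = "warehouse"  # or "lake"
    retention_days: int = 365


class SelfServePlatform:
    """Hypothetical platform API owned by the single data platform team."""

    SUPPORTED_STORAGE = {"warehouse", "lake"}

    def __init__(self) -> None:
        self.provisioned: Dict[str, Dict[str, str]] = {}

    def provision(self, spec: ProductSpec) -> Dict[str, str]:
        if spec.storage not in self.SUPPORTED_STORAGE:
            raise ValueError(f"unsupported storage: {spec.storage}")
        # The platform, not the domain team, decides naming and access roles.
        resources = {
            "location": f"{spec.storage}://{spec.domain}/{spec.name}",
            "reader_role": f"{spec.domain}.{spec.name}.reader",
            "retention": f"{spec.retention_days}d",
        }
        self.provisioned[spec.name] = resources
        return resources


platform = SelfServePlatform()
resources = platform.provision(
    ProductSpec(name="campaign_performance", domain="marketing",
                storage="lake", retention_days=90)
)
print(resources["location"])  # lake://marketing/campaign_performance
```

Because every domain goes through the same `provision` call, the resulting data products follow the same conventions, which is what makes them interoperable.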

4) Federated governance

The principle of federated governance in Data Mesh is aimed at promoting interoperability of all data products through standardization, which is overseen by the governance group throughout the data mesh. The primary objective of federated governance is to establish a data ecosystem that adheres to organizational policies and industry regulations.

In a similar vein, the self-serve data infrastructure platform in Data Mesh is designed to apply platform-oriented thinking to data infrastructure. This is achieved through a dedicated data platform team that offers domain-agnostic functionality, tools, and systems to facilitate the development, execution, and maintenance of interoperable data products across all domains. By providing this platform, the data platform team enables domain teams to seamlessly consume and create data products, promoting greater efficiency and collaboration within the organisation.

Standardisation:

Achieving interoperability of all data products through standardisation.

Interoperability:

Data generated by each domain can be exchanged with and used by any other domain in the organisation, and vice versa.

New mesh governance model:

A governance model that embraces decentralisation and domain self-sovereignty, interoperability through global standardisation, a dynamic topology and, most importantly, automated execution of decisions by the platform.

Automation:

Decisions and policies are executed automatically by the platform, which creates a data ecosystem that adheres to organizational rules and industry regulations.
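
Automated execution of policies can be sketched as a set of global rules that the platform runs against every data product, regardless of which domain owns it. The two rules below (an owner is mandatory, PII columns must be masked) and the dictionary layout of a product are assumptions chosen for illustration; your governance group would define its own.

```python
from typing import Callable, Dict, List

Product = Dict[str, object]
Policy = Callable[[Product], List[str]]


def must_have_owner(product: Product) -> List[str]:
    return [] if product.get("owner") else ["missing owner"]


def pii_must_be_masked(product: Product) -> List[str]:
    columns = product.get("columns", {})
    return [
        f"PII column '{name}' is not masked"
        for name, meta in columns.items()
        if meta.get("pii") and not meta.get("masked")
    ]


# Global standards agreed by the federated governance group (hypothetical).
GLOBAL_POLICIES: List[Policy] = [must_have_owner, pii_must_be_masked]


def check(product: Product) -> List[str]:
    """Run every global policy and collect the violations."""
    violations: List[str] = []
    for policy in GLOBAL_POLICIES:
        violations.extend(policy(product))
    return violations


compliant = {
    "name": "daily_revenue",
    "owner": "sales-engineering",
    "columns": {"day": {"pii": False}},
}
non_compliant = {
    "name": "customers",
    "owner": "",
    "columns": {"email": {"pii": True, "masked": False}},
}

print(check(compliant))      # []
print(check(non_compliant))  # ['missing owner', "PII column 'email' is not masked"]
```

The domains stay free to decide how they build their products, while the rules that apply to everyone are written once and enforced by the platform rather than by manual review.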

Data mesh challenges

Specialisation:

Data mesh implementation requires specialists to create domain-specific ETL, data lake implementations with complex data systems, and so on.

Data Redundancy:

Data Mesh, with its multi-cloud and hybrid infrastructure, can make data governance more challenging to manage. The decentralised nature of Data Mesh means that redundancy can occur when the data of one domain is repurposed to serve the business needs of another domain. This can impact resource usage and data management costs. However, with effective governance tools and processes in place, these challenges can be mitigated.

For example, clear ownership and stewardship of data can help ensure that the right individuals or teams are responsible for managing and monitoring data usage. Similarly, standardization and interoperability across domains can help reduce redundancy and promote more efficient resource utilization.

Overall, while there are challenges associated with managing data governance in a Data Mesh environment, they can be addressed through thoughtful planning and implementation.
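
As a small example of what such tooling might look for, the sketch below flags source data products that have been copied into several consuming domains. The input format (pairs of consuming domain and source product), the sample values and the threshold are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (consuming domain, source data product) pairs; illustrative values only.
copies: List[Tuple[str, str]] = [
    ("marketing", "sales.orders"),
    ("finance", "sales.orders"),
    ("finance", "marketing.campaign_performance"),
]


def redundant_sources(
    copies: List[Tuple[str, str]], threshold: int = 2
) -> Dict[str, List[str]]:
    """Return sources copied by at least `threshold` distinct domains."""
    by_source = defaultdict(set)
    for domain, source in copies:
        by_source[source].add(domain)
    return {
        source: sorted(consumers)
        for source, consumers in by_source.items()
        if len(consumers) >= threshold
    }


print(redundant_sources(copies))  # {'sales.orders': ['finance', 'marketing']}
```

A report like this does not say that the copies are wrong, only where shared, standardised data products might save storage and maintenance effort.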

Adoption Costs:

The transition to a Data Mesh implementation, which involves decentralizing data management, requires significant changes for organizations that are used to a highly centralized data architecture. To establish a good quality Data Mesh solution, ecosystem governance tools and data infrastructure platform tools are necessary. However, implementing and maintaining these tools comes at a cost of bootstrapping and ongoing maintenance. Nonetheless, the benefits of adopting a Data Mesh approach can outweigh the initial costs, as it promotes a more collaborative and agile data management culture, which can result in more effective use of data for business purposes.

Complexity:

Building a decentralised architecture that is controlled through a centralised governance model can be a complex endeavour. It requires careful planning and implementation to ensure that the governance model effectively manages the decentralised architecture without stifling its benefits. One of the main challenges is ensuring that the governance model is flexible enough to accommodate the diverse needs of different domains while maintaining overall coherence and standardisation. Additionally, effective communication and collaboration between domain teams and the central governance group are essential to ensure that the decentralised architecture is aligned with the organisation's goals and objectives. However, despite its complexities, a well-designed decentralised architecture with effective centralised governance can promote greater agility, collaboration, and innovation within an organisation, which can ultimately lead to better business outcomes.

My forthcoming article will provide a comparison between Data Mesh architecture and the more traditional data warehouse and data lake architectures. The article will explore the following topics:

How do Data Warehouse, Data Lake, and Data Mesh architectures differ from one another?
Is it possible for both Data Mesh and traditional architectures to coexist?
What factors should be considered when deciding which architecture to use in a particular situation?

Join a growing community of 1650+ Data Enthusiasts by subscribing to my Newsletter: Data Architecture History
