Rise of Data Mesh Architecture [7 out of 10]
Mahmoud Yassin
Senior Data Manager | Customer Data, Insights and AI | Helping Booking.com to modernize their data architecture on top of public cloud infrastructure
This is a series of articles about the importance of having a solid data architecture in your business. The series includes the following articles:
1- Introduction to Data Architecture
2- OLAP vs OLTP
3- Data Warehouse Architecture deep dive
4- Data Lake Architecture deep dive
5- Data Lake vs Data Warehouse Architecture
6- Cloud computing effect on Data Architecture
7- Rise of Data Mesh Architecture [current article]
8- Data Mesh vs the rest
9- Rise of DeltaLake Architecture
10- Which Data Architecture shall I choose?
What is Data Mesh
To keep data quality and governance under control as the number of data sources grows and the need for agility increases, Zhamak Dehghani introduced the concept of a decentralised data architecture known as Data Mesh. Data Mesh achieves this by delegating responsibility for data to the domain level and providing high-quality, transformed data as a product.
Essentially, Data Mesh is an approach that decentralises data architecture by organising data around specific business domains such as marketing, sales and customer service. This gives greater ownership to the producers of the datasets, who understand their domain data best and can set data governance policies focused on documentation, quality and access. This, in turn, enables self-service use of data across the organisation. Although this approach removes the operational bottlenecks associated with centralised, monolithic systems, it does not exclude traditional storage systems such as data lakes or data warehouses. It simply means that these systems are used as multiple decentralised data repositories instead of a single, centralised data platform.
Data Mesh characteristics:
OLTP and OLAP hand in hand:
Data Mesh marks a significant shift in data thinking: OLTP and OLAP no longer have to live apart. The same domain that is responsible for OLTP can now take charge of OLAP as well. This approach has a lot of merit, since the data producers within a domain know their data better than anyone else. They can clarify all the definitions related to the data they produce, make sure that data consumers use it correctly, and take responsibility for the interface exposed to consumers.
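To make this more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical "orders" domain whose team owns both the operational orders table and an analytical view, orders_daily_revenue, which it publishes as the interface for consumers. A real domain would usually serve analytical data from a separate analytical store; a single in-memory SQLite database is used here only to keep the example self-contained.

```python
import sqlite3

# Hypothetical "orders" domain: one team owns both the operational (OLTP)
# table and the analytical (OLAP-style) interface exposed to consumers.
conn = sqlite3.connect(":memory:")

# Operational table: one row per order, written by the domain's application.
conn.execute(
    """
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT NOT NULL,  -- ISO date, e.g. '2023-05-01'
        amount_eur REAL NOT NULL
    )
    """
)

# Analytical interface published by the same team. Consumers query this view
# rather than the raw table, so the producers control definitions such as
# "daily revenue".
conn.execute(
    """
    CREATE VIEW orders_daily_revenue AS
    SELECT order_date,
           COUNT(*)        AS order_count,
           SUM(amount_eur) AS revenue_eur
    FROM orders
    GROUP BY order_date
    """
)

# Transactional writes (OLTP side); the context manager commits them.
with conn:
    conn.executemany(
        "INSERT INTO orders (order_date, amount_eur) VALUES (?, ?)",
        [("2023-05-01", 20.0), ("2023-05-01", 30.0), ("2023-05-02", 15.5)],
    )

# Analytical read (OLAP side) through the domain's published interface.
daily = conn.execute(
    "SELECT order_date, order_count, revenue_eur "
    "FROM orders_daily_revenue ORDER BY order_date"
).fetchall()
for row in daily:
    print(row)
```

The key design choice is that consumers depend on the view, not on the operational table, so the domain team can evolve its OLTP schema without breaking analytical consumers as long as the view keeps its contract.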
Aims to democratise data:
Data Mesh enables faster exchange of data from one domain to another while embracing domain ownership. It emphasises data platforms designed for self-service data access, data discovery and data sharing, which further supports data democratisation. These platforms provide tools and frameworks that help data producers and consumers collaborate more effectively and make it easier to share data across organisational boundaries.
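As a rough illustration of self-service discovery, the sketch below models a toy in-memory catalog: each domain registers its own datasets with a description and tags, and consumers from any domain can search it without going through a central team. DataCatalog, CatalogEntry and the dataset names are hypothetical; real setups typically rely on a dedicated data catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata a domain publishes so that other domains can find its data."""
    name: str
    domain: str
    description: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """Toy, in-memory stand-in for a self-service data discovery tool."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Producers publish their datasets themselves; no central team involved.
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        # Consumers search descriptions and tags, case-insensitively.
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.description.lower() or kw in {t.lower() for t in e.tags}
        )


catalog = DataCatalog()
catalog.register(CatalogEntry("sales.orders_daily", "sales",
                              "Daily order counts and revenue", {"revenue", "orders"}))
catalog.register(CatalogEntry("marketing.campaign_clicks", "marketing",
                              "Clicks per campaign and channel", {"campaigns"}))
catalog.register(CatalogEntry("customer_service.tickets", "customer_service",
                              "Support tickets with resolution time", {"support"}))

print(catalog.search("revenue"))
```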
Born decentralised:
Compared with other data architectures, Data Mesh stands out for its strong emphasis on decentralisation: data is distributed rather than centralised, and ownership of data is assigned to groups specific to each domain. These groups are responsible for owning, managing and serving their data as a product.
These groups, often referred to as "data domains," are empowered to make decisions about how their data should be collected, processed, and shared, based on their specific needs and requirements. This ensures that data is more aligned with the needs of the business, and that it is more readily available to those who need it.
Four data mesh principles
Each principle addresses challenges left by earlier data architectures such as the data warehouse and the data lake.
1) Data Domain Ownership
The principle of domain-oriented data ownership requires that the domain teams take full ownership and responsibility for the data generated and consumed by their business capabilities. This means that data should be organised and managed around domains, which are aligned with the team boundaries and the system's bounded context. By adopting a domain-driven distributed architecture, the responsibility for managing both analytical and operational data is shifted from a central data team (IT) to the domain teams themselves. This approach enables greater autonomy, agility, and scalability for data management within each domain, while also fostering collaboration and standardisation across domains.
The first and most important principle of Data Mesh is domain-oriented, decentralised data ownership and architecture. It addresses a specific problem by moving the analysis of data into the same domain where the data originates.
The experts of the domain where the data originates are also responsible for its quality and interpretation. Because of their experience in the domain, problems with the interpretation and quality of the data tend to decrease.
In short, analytical data should be organised around domains, with boundaries that align with team boundaries and the system's bounded context. Following a domain-driven distributed architecture, ownership of both analytical and operational data moves from the central data team to the domain teams.
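Here is a minimal sketch of how domain ownership might show up in tooling. It assumes a hypothetical ownership map in which every dataset belongs to exactly one domain team: only the owning team may publish that dataset, and quality issues are routed back to that team instead of to a central data team. OWNERS, publish and route_quality_issue are illustrative names, not part of any standard.

```python
class OwnershipError(PermissionError):
    """Raised when a team tries to publish a dataset owned by another domain."""


# Hypothetical ownership map: every dataset belongs to exactly one domain team.
OWNERS = {
    "sales.orders": "sales",
    "marketing.campaigns": "marketing",
}


def publish(dataset: str, team: str, rows: list) -> int:
    """Accept a publish request only from the dataset's owning domain team."""
    owner = OWNERS.get(dataset)
    if owner is None:
        raise KeyError(f"unknown dataset: {dataset}")
    if team != owner:
        raise OwnershipError(f"{team!r} cannot publish {dataset!r}; owner is {owner!r}")
    # A real implementation would write to the owning domain's own storage here.
    return len(rows)


def route_quality_issue(dataset: str) -> str:
    """Quality issues go to the owning domain, not to a central data team."""
    return OWNERS[dataset]


print(publish("sales.orders", "sales", [{"order_id": 1}]))  # accepted
try:
    publish("sales.orders", "marketing", [{"order_id": 2}])
except OwnershipError as err:
    print("rejected:", err)
print(route_quality_issue("marketing.campaigns"))
```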
2) Data as a product
The principle of "Data as a Product" in Data Mesh involves applying a product-oriented mindset to analytical data. This principle acknowledges that there are consumers of data beyond the domain and emphasizes that the domain team is responsible for meeting the needs of other domains by providing high-quality data. In essence, the domain data should be treated like any other public API, with a focus on making it discoverable, reusable, and consumable by other domains. This approach promotes a culture of data sharing and collaboration across the organisation, resulting in more effective use of data for business purposes.
Each dataset MUST have clear ownership and stewardship. Data owners or stewards are the key people who understand the data and are responsible for it.
Apply product thinking to analytical data: the domain team is responsible for satisfying the needs of other domains by providing high-quality data.
Treating data as a product puts to use the information assets that organisations collect, process and store during regular business activities but generally fail to use for other purposes.
Data quality is ensured in the domain where the data is generated. Data quality at source is a key success factor.
“I want to have the best product ever that people are interested in and love the most”
This is a mindset shift that puts data at the heart of the domain. Data must be discoverable and understandable.
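The product-thinking ideas above (clear ownership, discoverability, understandability and quality at source) can be sketched as a small data product descriptor. This is a minimal, hypothetical example rather than a standard Data Mesh API: the DataProduct class, its fields, the sample quality rule and the placeholder owner address are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str                # discoverable identifier, e.g. "<domain>.<product>"
    owner: str               # accountable data owner or steward
    description: str         # makes the product understandable to consumers
    schema: dict[str, type]  # column name -> expected Python type
    checks: tuple[Callable[[Row], bool], ...] = ()  # domain-specific quality rules

    def is_valid(self, row: Row) -> bool:
        # Schema contract: exactly the declared columns, each of the declared type.
        if set(row) != set(self.schema):
            return False
        if not all(isinstance(row[col], typ) for col, typ in self.schema.items()):
            return False
        # Quality rules are enforced at source, by the producing domain.
        return all(check(row) for check in self.checks)

    def publish(self, rows: list[Row]) -> tuple[list[Row], list[Row]]:
        """Split rows into (accepted, rejected) before anything reaches consumers."""
        accepted = [r for r in rows if self.is_valid(r)]
        rejected = [r for r in rows if not self.is_valid(r)]
        return accepted, rejected


orders = DataProduct(
    name="sales.orders",
    owner="sales-data-steward@example.com",  # placeholder contact
    description="One row per confirmed order; amounts in EUR",
    schema={"order_id": int, "amount_eur": float},
    checks=(lambda r: r["amount_eur"] >= 0,),
)

accepted, rejected = orders.publish([
    {"order_id": 1, "amount_eur": 20.0},
    {"order_id": 2, "amount_eur": -5.0},    # violates the quality rule
    {"order_id": "3", "amount_eur": 10.0},  # violates the schema contract
])
print(len(accepted), len(rejected))
```

Rejecting bad rows inside the producing domain, rather than in a downstream central pipeline, is what "data quality at source" means in practice: the team that understands the data fixes it before anyone else consumes it.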
3) Self-serve data platform
The concept of a self-serve data infrastructure platform in Data Mesh involves applying platform-oriented thinking to data infrastructure. This approach involves a dedicated data platform team that provides domain-agnostic functionality, tools, and systems to facilitate the development, execution, and maintenance of interoperable data products for all domains. By offering this platform, the data platform team enables domain teams to consume and create data products with ease and without the need for extensive technical expertise. This promotes greater agility and efficiency in data management across the organization, while also fostering a culture of collaboration and knowledge sharing.
The data platform team enables domain teams to seamlessly consume and create data products.
Provide domain-agnostic functionality, tools, and systems to build, execute, and maintain interoperable data products for all domains.
High-level abstraction of infrastructure that removes complexity and friction of provisioning and managing the lifecycle of data products.
To make analytical data product development accessible to generalist developers (the profile of developers that domains already have), the self-serve platform needs to provide a new category of tools and interfaces in addition to simplifying provisioning.
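To illustrate the "high-level abstraction of infrastructure" idea, here is a minimal sketch in which a domain team only declares what it needs and a platform function derives the underlying resources. Everything here is hypothetical: ProductSpec, provision, the datamesh/ storage prefix and the role naming scheme are assumptions, and a real platform would call its cloud provider's APIs instead of just building names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductSpec:
    """What a domain team declares; the platform handles everything else."""
    domain: str
    product: str
    retention_days: int = 365


@dataclass(frozen=True)
class ProvisionedResources:
    storage_path: str
    reader_role: str
    writer_role: str
    catalog_id: str
    retention_days: int


def provision(spec: ProductSpec) -> ProvisionedResources:
    """Hypothetical platform entry point for creating a data product.

    A real platform would create storage, access roles and catalog entries
    through its cloud provider's APIs; this sketch only derives consistent
    names, to show that the domain team never handles those details itself.
    """
    if spec.retention_days <= 0:
        raise ValueError("retention_days must be positive")
    prefix = f"{spec.domain}-{spec.product}"
    return ProvisionedResources(
        storage_path=f"datamesh/{spec.domain}/{spec.product}/",
        reader_role=f"{prefix}-reader",
        writer_role=f"{prefix}-writer",
        catalog_id=f"{spec.domain}.{spec.product}",
        retention_days=spec.retention_days,
    )


resources = provision(ProductSpec(domain="marketing", product="campaign_clicks"))
print(resources)
```

Because every domain goes through the same entry point, naming and access conventions stay consistent across domains, which is also what makes the resulting data products interoperable.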
4) Federated governance
The principle of federated governance in Data Mesh is aimed at promoting interoperability of all data products through standardization, which is overseen by the governance group throughout the data mesh. The primary objective of federated governance is to establish a data ecosystem that adheres to organizational policies and industry regulations.
Federated governance works hand in hand with the self-serve data infrastructure platform. The dedicated data platform team offers domain-agnostic functionality, tools and systems for developing, executing and maintaining interoperable data products across all domains, and this shared platform is also where governance policies can be applied consistently, promoting greater efficiency and collaboration within the organisation.
Achieving interoperability of all data products through standardisation
Data generated in each domain can be exchanged with and used by any other domain in the organisation, and vice versa
A governance model that embraces decentralisation and domain self-sovereignty, interoperability through global standardisation, a dynamic topology and, most importantly, automated execution of decisions by the platform
Automated execution of decisions and policies by the platform, creating a data ecosystem that adheres to organisational rules and industry regulations
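Automated execution of policies by the platform is often described as "policies as code". The minimal sketch below assumes three hypothetical global policies (every data product has an owner, names follow a lowercase <domain>.<product> convention, and every column carries a classification tag): the governance group defines them once, and the platform evaluates them against every domain's data products. The policy names, tags and conventions are illustrative assumptions, not an established standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical classification levels agreed by the governance group.
CLASSIFICATIONS = {"public", "internal", "pii"}


@dataclass
class ProductMetadata:
    name: str
    owner: Optional[str]
    columns: dict[str, set[str]] = field(default_factory=dict)  # column -> tags


def has_owner(p: ProductMetadata) -> Optional[str]:
    return None if p.owner else "missing owner"


def follows_naming_standard(p: ProductMetadata) -> Optional[str]:
    # Hypothetical convention: lowercase "<domain>.<product>".
    parts = p.name.split(".")
    ok = len(parts) == 2 and all(parts) and p.name == p.name.lower()
    return None if ok else f"name {p.name!r} is not <domain>.<product>"


def columns_classified(p: ProductMetadata) -> Optional[str]:
    missing = sorted(c for c, tags in p.columns.items() if not tags & CLASSIFICATIONS)
    return f"unclassified columns: {missing}" if missing else None


# Global policies: defined once, executed automatically for every domain.
GLOBAL_POLICIES: list[Callable[[ProductMetadata], Optional[str]]] = [
    has_owner,
    follows_naming_standard,
    columns_classified,
]


def evaluate(products: list[ProductMetadata]) -> dict[str, list[str]]:
    """Run every global policy on every product; return violations by product name."""
    report: dict[str, list[str]] = {}
    for product in products:
        violations = [msg for policy in GLOBAL_POLICIES
                      if (msg := policy(product)) is not None]
        if violations:
            report[product.name] = violations
    return report


products = [
    ProductMetadata("sales.orders", "sales-team",
                    {"order_id": {"internal"}, "email": {"pii"}}),
    ProductMetadata("Marketing Clicks", None, {"campaign": set()}),
]
report = evaluate(products)
print(report)
```

The split of responsibilities mirrors the federated model: the governance group owns the list of global policies, while each domain stays free to design its own data products as long as they pass those checks.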
Data mesh challenges
Specialisation:
Implementing a Data Mesh requires specialists to build domain-specific ETL pipelines, data lake implementations on complex data systems, and so on.
Data Redundancy:
Data Mesh, with its multi-cloud and hybrid infrastructure, can make data governance more challenging to manage. The decentralised nature of Data Mesh means that redundancy can occur when the data of one domain is repurposed to serve the business needs of another domain. This can impact resource usage and data management costs. However, with effective governance tools and processes in place, these challenges can be mitigated.
For example, clear ownership and stewardship of data can help ensure that the right individuals or teams are responsible for managing and monitoring data usage. Similarly, standardization and interoperability across domains can help reduce redundancy and promote more efficient resource utilization.
Overall, while there are challenges associated with managing data governance in a Data Mesh environment, they can be addressed through thoughtful planning and implementation.
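As one example of the kind of governance tooling mentioned above, a platform could flag likely redundant copies by comparing dataset schemas across domains. The sketch below is a hypothetical, simplified heuristic: it fingerprints each schema independently of column order and groups datasets whose fingerprints match. An identical schema does not prove the data is duplicated; it is only a signal for data owners or stewards to review. The dataset names are made up for illustration.

```python
import hashlib
from collections import defaultdict


def schema_fingerprint(columns: dict[str, str]) -> str:
    """Hash a schema (column name -> type) independently of column order."""
    canonical = ";".join(f"{name}:{typ}" for name, typ in sorted(columns.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_possible_copies(datasets: dict[str, dict[str, str]]) -> list[list[str]]:
    """Group datasets with identical schemas; groups of two or more are candidates."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, columns in datasets.items():
        groups[schema_fingerprint(columns)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


datasets = {
    "sales.orders": {"order_id": "int", "amount_eur": "float", "order_date": "date"},
    "marketing.orders_copy": {"order_date": "date", "order_id": "int", "amount_eur": "float"},
    "customer_service.tickets": {"ticket_id": "int", "opened_at": "timestamp"},
}
print(find_possible_copies(datasets))
```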
Adoption Costs:
The transition to a Data Mesh implementation, which involves decentralising data management, requires significant changes for organisations that are used to a highly centralised data architecture. To establish a good-quality Data Mesh solution, ecosystem governance tools and data infrastructure platform tools are necessary. However, implementing and maintaining these tools comes with bootstrapping and ongoing maintenance costs. Nonetheless, the benefits of adopting a Data Mesh approach can outweigh the initial costs, as it promotes a more collaborative and agile data management culture, which can result in more effective use of data for business purposes.
Complexity:
Building a decentralised architecture that is controlled through a centralised governance model can be a complex endeavour. It requires careful planning and implementation to ensure that the governance model effectively manages the decentralised architecture without stifling its benefits. One of the main challenges is ensuring that the governance model is flexible enough to accommodate the diverse needs of different domains while maintaining overall coherence and standardisation. Additionally, effective communication and collaboration between domain teams and the central governance group are essential to ensure that the decentralised architecture is aligned with the organisation's goals and objectives. However, despite its complexities, a well-designed decentralised architecture with effective centralised governance can promote greater agility, collaboration, and innovation within an organisation, which can ultimately lead to better business outcomes.
My forthcoming article will provide a comparison between Data Mesh architecture and the more traditional data warehouse and data lake architectures. The article will explore the following topics:
Join a growing community of 1650+ data enthusiasts by subscribing to my newsletter: Data Architecture History