A Three-Step Framework for Implementing a Hybrid Data Mesh
Pradeep Menon
The previous blog in this series, The Data Mesh and the Hub-Spoke: A Macro Pattern for Scaling Analytics, focused on the conceptual implementation of the hybrid mesh. It began by establishing the need for these macro patterns, then discussed the fundamental concepts of the domain and the governance-flexibility spectrum. It went on to describe the conceptual architecture of the hub-spoke and data mesh patterns, and concluded with a framework for placing a domain within either pattern to form the hybrid mesh. This blog is the second part of the hybrid mesh series and focuses on a three-step process for implementing a hybrid mesh. First, let us begin with a recap.
A Recap
The previous blog of this series, The Data Mesh and the Hub-Spoke: A Macro Pattern for Scaling Analytics, introduced the concept of a hybrid mesh. The following diagram revisits the conceptual architecture of a hybrid mesh.
The hybrid mesh combines the concepts of both the hub-spoke and the classic data mesh patterns. It is a more pragmatic approach than adopting either pattern alone, because organizations are not simple entities.
Large organizations are complex and evolve organically; hence, a hybrid approach works best.
Let us now focus on the steps an organization may take to implement a hybrid data mesh.
The Three-step Framework
Implementing a hybrid mesh is a complex endeavor. A fruitful implementation requires a confluence of technical excellence and organizational discipline. The three steps discussed below provide a framework that organizations can consider for implementing the hybrid mesh.
Each of the steps in the framework strives to answer a series of questions that provides better clarity on the step's objective. Let us have a look at this framework in detail.
Step 1: Define Domain
The step of defining the domain strives to answer the following questions:
In the previous blog of this series, an organizational domain was defined. Let us recap that definition.
A domain is any logical grouping of organizational units that aims to fulfill a functional context subjected to organizational constraints.
Typical examples of domains are:
Once the domain is defined, the next step is to determine the functionality of the domain node.
Step 2: Determine Domain Node
The step of determining the domain nodes strives to answer the following questions:
The previous blog briefly touched on the idea of the domain node. Each domain requires technical capabilities that need to be addressed.
A domain node fulfills the technical capabilities of a domain.
As an example, for fulfilling the technical ability of a decision support system, a node can have components like an Operational Data Store (ODS), a Data warehouse, a Data Lake, or a Data Lakehouse, along with its peripheral components like data ingestion, data processing, machine learning, etc. The following figure depicts the potential components of a domain node.
The flexible components are those technical components that can be implemented based on the needs of the domain. The flexible components are tailored to the domain's requirements. For example, a sophisticated domain can have a Data Lakehouse to fulfill its decision support requirement. In addition, it can be armed with sophisticated AI/ML components that extract maximum value from the underlying data. Another example can include a less sophisticated domain focusing only on reporting systems to cater to its decision support requirements.
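As a minimal sketch of this idea, a domain node can be modeled as a domain plus the set of flexible components it chooses to implement. The component names below come from the text; the `DomainNode` class, its `supports` method, and the domain names `domain_a` and `domain_b` are hypothetical and only illustrate the tailoring described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class FlexibleComponent(Enum):
    """Flexible components mentioned in the text; a domain picks what it needs."""
    OPERATIONAL_DATA_STORE = "operational_data_store"
    DATA_WAREHOUSE = "data_warehouse"
    DATA_LAKE = "data_lake"
    DATA_LAKEHOUSE = "data_lakehouse"
    DATA_INGESTION = "data_ingestion"
    DATA_PROCESSING = "data_processing"
    MACHINE_LEARNING = "machine_learning"
    REPORTING = "reporting"


@dataclass
class DomainNode:
    """Hypothetical model: a domain and the flexible components it implements."""
    domain: str
    flexible: set[FlexibleComponent] = field(default_factory=set)

    def supports(self, component: FlexibleComponent) -> bool:
        """Return True if this node implements the given component."""
        return component in self.flexible


# A sophisticated domain: lakehouse plus AI/ML and its peripheral components.
sophisticated = DomainNode(
    "domain_a",
    {
        FlexibleComponent.DATA_LAKEHOUSE,
        FlexibleComponent.DATA_INGESTION,
        FlexibleComponent.DATA_PROCESSING,
        FlexibleComponent.MACHINE_LEARNING,
    },
)

# A less sophisticated domain that only needs reporting.
reporting_only = DomainNode("domain_b", {FlexibleComponent.REPORTING})

print(sophisticated.supports(FlexibleComponent.MACHINE_LEARNING))
print(reporting_only.supports(FlexibleComponent.MACHINE_LEARNING))
```

Modeling the flexible components as a set keeps each node free to differ from every other node, which is the point of calling them flexible.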
On the other hand, the must-have components, as the name suggests, are required to fulfill the essence of a hybrid data mesh. These three components ensure three key aspects of a hybrid data mesh:
Once the domain node is defined, and its components are well established, the next step is establishing key roles and responsibilities that ensure governance in the hybrid mesh.
Step 3: Establish Governance Framework
The step of establishing the governance framework strives to answer the following questions:
Data governance is a significant topic. A holistic data governance framework encompasses the governance objectives, policies, and components that materialize data governance. In the context of hybrid data mesh, the governance framework has three key aspects.
Let us discuss each one of them in some depth.
Roles and Responsibilities:
Establishing roles and responsibilities for a hybrid data mesh is an arduous task. The traditional technical roles, such as data engineers, data scientists, developers, and project managers, are a given for any data implementation. However, successful implementation of a hybrid data mesh also demands roles that ensure proper governance. The five key roles that make it happen are:
The next aspect of data governance is data cataloging.
Data Cataloging:
One of the pivotal components of a hybrid mesh implementation is its data cataloging service.
Data cataloging is organizing the inventory of available data so that it can be easily identified and used.
This service ensures that all the source data, the data in the hub and spoke domain nodes, and the outputs extracted from domain nodes are appropriately cataloged. Think of a data cataloging service as the Facebook of data. It is a place to get visual information on a domain's contents. One can get information about the data, the relationships between the data, and the lineage of transformations the data has gone through. Some of the elements that one can consider for cataloging are depicted in the following diagram:
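To make the cataloging idea concrete, here is a minimal in-memory sketch of a catalog that records which domain owns a dataset and which datasets it was derived from, so that lineage can be traced. The `CatalogEntry` and `DataCatalog` classes, their fields, and the dataset and domain names are all hypothetical; a real implementation would use a dedicated cataloging service.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one dataset."""
    name: str
    domain: str
    description: str = ""
    # Names of the datasets this one was derived from (its direct lineage).
    upstream: list[str] = field(default_factory=list)


class DataCatalog:
    """Hypothetical in-memory catalog keyed by dataset name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def find_by_domain(self, domain: str) -> list[str]:
        """Return the sorted names of all datasets owned by a domain."""
        return sorted(e.name for e in self._entries.values() if e.domain == domain)

    def lineage(self, name: str) -> list[str]:
        """Return every upstream dataset, nearest first, without duplicates."""
        seen: list[str] = []
        queue = list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return seen


catalog = DataCatalog()
catalog.register(CatalogEntry("raw_orders", "sales"))
catalog.register(CatalogEntry("clean_orders", "sales", upstream=["raw_orders"]))
catalog.register(CatalogEntry("revenue_report", "finance", upstream=["clean_orders"]))

print(catalog.lineage("revenue_report"))  # ['clean_orders', 'raw_orders']
print(catalog.find_by_domain("sales"))    # ['clean_orders', 'raw_orders']
```

Storing only the direct upstream names per entry is enough to reconstruct the full transformation chain by walking the catalog.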
The next aspect of data governance is data sharing.
Data Sharing:
Data sharing between the domains needs to be structured, governed, and secure. Recall the discussion on the governance-flexibility spectrum in the previous blog. A refresher diagram can be found below:
The degree of relative domain independence determines how independent a domain is compared to other domains.
Five parameters determine the relative domain independence:
The placement of the domain in this spectrum determines whether a domain is a candidate for a hub-spoke architecture or a data mesh architecture.
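One way to picture this placement is as a simple score. The sketch below is purely illustrative: it assumes each parameter is rated between 0 (fully dependent) and 1 (fully independent), averages the ratings, and compares the result to a cut-off. The function names, the placeholder parameter names, the equal weighting, and the 0.5 threshold are all assumptions and are not part of the original framework.

```python
def independence_score(ratings: dict[str, float]) -> float:
    """Average the per-parameter ratings, each expected in the range [0, 1]."""
    if not ratings:
        raise ValueError("at least one parameter rating is required")
    for name, value in ratings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"rating for {name!r} must be between 0 and 1")
    values = list(ratings.values())
    return sum(values) / len(values)


def suggest_pattern(ratings: dict[str, float], threshold: float = 0.5) -> str:
    """Suggest a pattern; the threshold is an assumed, tunable cut-off."""
    if independence_score(ratings) >= threshold:
        return "data mesh"
    return "hub-spoke"


# Placeholder parameter names; substitute the organization's actual parameters.
independent_domain = {
    "parameter_1": 0.9,
    "parameter_2": 0.8,
    "parameter_3": 0.7,
    "parameter_4": 0.9,
    "parameter_5": 0.8,
}
dependent_domain = {
    "parameter_1": 0.1,
    "parameter_2": 0.2,
    "parameter_3": 0.3,
    "parameter_4": 0.2,
    "parameter_5": 0.1,
}

print(suggest_pattern(independent_domain))  # data mesh
print(suggest_pattern(dependent_domain))    # hub-spoke
```

In practice the placement is a judgment call rather than a formula; the score only makes the trade-off explicit enough to discuss.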
In a hybrid mesh, data sharing can occur in two flavors:
Let us investigate each of these scenarios in detail.
Data Sharing Between Hub-Spoke Domains:
The first scenario is the data sharing between a hub domain and a spoke domain.
In this scenario, the spoke domain is dependent on the hub domain for key aspects of data. The diagram below depicts the data sharing workflow between the hub and the spoke.
Let us elaborate on the steps:
The workflow of a reverse scenario, i.e., a hub domain requesting data from a spoke domain, is depicted in the figure below:
Now that the workflow of data sharing between the hub and spoke is clarified, let us investigate a scenario where data needs to be shared between two independent domains.
Data Sharing Between Data Mesh Domains:
Data sharing between data mesh domains is slightly different, as each domain has independence and control over what data it catalogs and shares. The diagram below depicts the workflow of data sharing between the domains of a data mesh.
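The control each mesh domain keeps over its own data can be sketched as follows. This is a hypothetical illustration, not a reconstruction of the workflow in the diagram: the `MeshDomain` class, its methods, and the domain and dataset names are assumptions. It shows only the core idea that a domain decides which datasets it shares, and that a request for anything else is refused.

```python
from dataclasses import dataclass, field


@dataclass
class MeshDomain:
    """Hypothetical data mesh domain that controls what it shares."""
    name: str
    # Datasets this domain has chosen to catalog and share.
    shared: set[str] = field(default_factory=set)
    # Record of which consumer domains were granted each dataset.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def share(self, dataset: str) -> None:
        """Mark a dataset as available to other domains."""
        self.shared.add(dataset)

    def request_access(self, consumer: str, dataset: str) -> bool:
        """Grant access only if the owning domain has chosen to share the dataset."""
        if dataset not in self.shared:
            return False
        self.grants.setdefault(dataset, set()).add(consumer)
        return True


producer = MeshDomain("domain_a")
producer.share("customer_segments")

print(producer.request_access("domain_b", "customer_segments"))  # True
print(producer.request_access("domain_b", "internal_costs"))     # False
print(producer.grants)  # {'customer_segments': {'domain_b'}}
```

Keeping the grant record inside the owning domain mirrors the independence described above: no central hub decides on the domain's behalf.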
Conclusion
A hybrid data mesh is a macro architecture pattern for harnessing data across multiple domains. The first part of this blog series focused on the conceptual underpinning of a hybrid data mesh. This blog delved into its logical constructs: the logical components of a domain node, the data cataloging strategy, key roles and responsibilities, and data sharing workflows. The next part of this series will focus on the technical implementation of this concept on a cloud computing platform like Microsoft Azure.