Episode 1- A Gentle Intro to Data Mesh World

Introduction:

I am Beshoy Gamal, a big data and machine learning geek. I have worked on implementing data-driven solutions for more than 9 years, across several countries and across technologies from on-premises to cloud. I now work at Vodafone Group as a Senior Data Architect.

In my experience, many organizations have invested in a central data lake and a data team, expecting to drive their business based on data. However, after a few initial quick wins, they notice that the central data team often becomes a bottleneck, because it cannot handle all the analytical questions of management and product owners quickly enough.

So I have decided to write this series of articles about Data Mesh, Data Products, Self-Service, and Data Democratization.

Why You May Need a Data Mesh

A bottlenecked central data team is a massive problem because making timely data-driven decisions is crucial to staying competitive. For example: Is it a good idea to offer free shipping during Black Week? Do customers accept longer but more reliable shipping times? How does a product page change influence the checkout and returns rate?

The data team wants to answer all those questions quickly. In practice, however, they struggle because they spend too much time fixing data pipelines that break after operational database changes. In the little time that remains, the data team has to discover and understand the necessary domain data. For every question, they need to learn domain knowledge to give meaningful insights. Getting the required domain expertise is a daunting task.

On the other hand, organizations have also invested in domain-driven design, autonomous domain teams (also known as stream-aligned teams or product teams), and a decentralized microservice architecture. These domain teams own and know their domain, including the information needs of the business. They design, build, and run their web applications and APIs on their own. Despite knowing the domain and the relevant information needs, the domain teams have to reach out to the overloaded central data team to get the necessary data-driven insights.

As the organization grows, the situation of the domain teams and the central data team gets worse. A way out is to shift the responsibility for data from the central data team to the domain teams. This is the core idea behind the data mesh concept: domain-oriented decentralization for analytical data. A data mesh architecture enables domain teams to perform cross-domain data analysis on their own and interconnects data, similar to APIs in a microservice architecture.

What Is Data Mesh?

The term data mesh was coined by Zhamak Dehghani in 2019 and is based on four fundamental principles that bundle well-known concepts:

The domain ownership principle mandates the domain teams to take responsibility for their data. According to this principle, analytical data should be composed around domains, similar to the team boundaries aligning with the system’s bounded context. Following the domain-driven distributed architecture, analytical and operational data ownership is moved to the domain teams, away from the central data team.

The data as a product principle applies product thinking to analytical data. This principle means that there are consumers for the data beyond the domain. The domain team is responsible for satisfying the needs of other domains by providing high-quality data. Basically, domain data should be treated like any other public API.

The idea behind the self-serve data infrastructure platform is to apply platform thinking to data infrastructure. A dedicated data platform team provides domain-agnostic functionality, tools, and systems to build, execute, and maintain interoperable data products for all domains. With its platform, the data platform team enables domain teams to seamlessly consume and create data products.

The federated governance principle achieves interoperability of all data products through standardization, which the governance group promotes across the whole data mesh. The main goal of federated governance is to create a data ecosystem that adheres to the organizational rules and industry regulations.
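To make the four principles a bit more concrete, here is a minimal Python sketch of how they could fit together. It is only an illustration under my own assumptions, not a reference implementation: the names `DataProduct`, `check_policies`, and `Catalog`, the example policies, and the sample products are all hypothetical and do not come from any specific data mesh tool.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for one data product."""
    name: str
    domain: str        # owning domain (domain ownership)
    owner_team: str    # team accountable for quality (data as a product)
    schema: dict       # column name -> type, the product's public contract
    contains_pii: bool = False
    tags: tuple = ()


def check_policies(product):
    """Example global rules (federated governance); returns a list of violations."""
    violations = []
    if not product.owner_team:
        violations.append("missing owner team")
    if not product.schema:
        violations.append("missing schema")
    if product.contains_pii and "pii" not in product.tags:
        violations.append("PII data must carry the 'pii' tag")
    return violations


class Catalog:
    """Tiny stand-in for the registry of a self-serve data platform."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # Only products that pass the global policies become discoverable.
        violations = check_policies(product)
        if violations:
            raise ValueError(product.name + ": " + "; ".join(violations))
        self._products[product.name] = product

    def find_by_domain(self, domain):
        return [p for p in self._products.values() if p.domain == domain]


# Usage: the (hypothetical) checkout domain publishes its own data product.
catalog = Catalog()
orders = DataProduct(
    name="orders",
    domain="checkout",
    owner_team="checkout-team",
    schema={"order_id": "string", "total": "decimal"},
)
catalog.publish(orders)
```

In this sketch, each domain team describes and publishes its own product, the catalog plays the role of the platform, and the policy check is where the governance group's rules are enforced automatically instead of by manual review.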
