Modern Data Architecture

Emad Yowakim

Senior Manager - Big Data & AI Analytics @ Deloitte

Published: December 23, 2021

What is Data Mesh?

Over the last couple of years, the data mesh architecture has emerged as a new framework to help solve many of the challenges that have plagued organizations, especially as they’ve scaled their data and data teams and tried to deliver more value, faster. Removing these barriers to data and delivering value at scale is a lofty goal. As with any architectural pattern, succeeding with a data mesh is not simply a technology problem to solve; it’s also about having the right technology to set up your teams for success and even catalyze change throughout your organization.

The Four Principles of a Data Mesh

The idea of a data mesh was a reaction to the trade-offs organizations were being forced to make as they scaled their data into less-governed and less-structured monolithic data lakes. As the number of data sources and data consumers grew, so did the number of data pipelines needed to connect them all. This pushed more and more of the work burden onto specialized teams who had the skills to develop for these notoriously challenging technologies but were disconnected from the domain experts who needed the data to do their jobs. This led to the all-too-common scenario of downstream data consumers waiting on complex pipelines and loosely stitched-together technologies to get the data they needed, and it also led to overworked engineering teams trying to keep up with demand.

Figure 1, from Data Mesh Principles and Logical Architecture, shows the four core principles that define a data mesh architecture:

Domain-driven ownership
Data as a product
Self-service infrastructure
Federated governance

Principle 1: Domain-driven ownership and architecture

The first principle of a data mesh is shifting ownership of data into the hands of the domain teams. They own the data end to end: from ensuring they have the right sources or ingested data to work with, to building and maintaining any necessary processing pipelines, to serving the data to other domain teams as products (more on that later) with the right quality guarantees and governance controls in place. Domain teams can be defined by department, business unit, or other similarly motivated groupings. If the mesh is properly implemented, new domain teams can be added fluidly, especially when data is being correlated into new data products.

Principle 2: Data as a product

As alluded to in the first principle, domain teams aren’t just responsible for the data; they are also responsible for the resulting data products. And data products need to be treated like any other product. They need to be discoverable and usable by consumers and other domain teams, and the domain owner is responsible for maintaining and updating (or deprecating) these products to ensure quality and accuracy. What can this look like in practice? Imagine a supply chain team creating an inventory data product that a marketing team can tap into to develop new discount campaigns or that regional teams can use for placing new orders.
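
To make the inventory example concrete, here is a minimal sketch of what a data product descriptor could look like. The `DataProduct` class and every field name in it are hypothetical, chosen for illustration only; they are not part of any data mesh standard or vendor API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes alongside its data."""
    name: str
    owner_domain: str
    description: str
    schema: dict[str, str]          # column name -> type name
    freshness_hours: int            # quality guarantee: maximum staleness
    version: str = "1.0.0"
    deprecated_on: Optional[date] = None
    consumers: list[str] = field(default_factory=list)

    def is_active(self, today: date) -> bool:
        """A product is usable until its deprecation date, if any, is reached."""
        return self.deprecated_on is None or today < self.deprecated_on

    def subscribe(self, consumer_domain: str) -> None:
        """Record a consuming domain so the owner knows who depends on it."""
        if consumer_domain not in self.consumers:
            self.consumers.append(consumer_domain)


# The supply chain team owns the product; other domains subscribe to it.
inventory = DataProduct(
    name="inventory_levels",
    owner_domain="supply_chain",
    description="Current stock per SKU and warehouse",
    schema={"sku": "string", "warehouse": "string", "units_on_hand": "int"},
    freshness_hours=24,
)
inventory.subscribe("marketing")
inventory.subscribe("regional_ops")
```

Keeping the owner, the quality guarantee, and the deprecation date on the product itself is one way to make the owning domain's responsibilities visible to every consumer.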

Principle 3: Self-service infrastructure as a platform

The third principle is to make all this self-service and easy for the domain teams. Complex technologies and niche skills are simply not sustainable in a data mesh design. There needs to be a common platform and set of tools that any domain team can tap into at any time to build and serve their data products, without getting bogged down in infrastructure maintenance or resource limitations.
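
As a rough illustration of a common platform that any domain team can use, the sketch below models a shared catalog where teams publish and search for data products without involving a central engineering team. `DataCatalog` and its methods are hypothetical names; a real platform would back this with a managed service rather than an in-memory dictionary.

```python
class DataCatalog:
    """Hypothetical in-memory stand-in for a shared platform catalog."""

    def __init__(self) -> None:
        self._products: dict[str, dict] = {}

    def publish(self, domain: str, name: str, tags: set[str]) -> str:
        """Register a product under its owning domain and return its key."""
        key = f"{domain}.{name}"
        if key in self._products:
            raise ValueError(f"{key} is already published")
        self._products[key] = {"domain": domain, "name": name, "tags": set(tags)}
        return key

    def search(self, tag: str) -> list[str]:
        """Return the keys of all products carrying the tag, sorted by key."""
        return sorted(k for k, p in self._products.items() if tag in p["tags"])


# Each domain team self-serves: no ticket to a central team is needed.
catalog = DataCatalog()
catalog.publish("supply_chain", "inventory_levels", {"inventory", "sku"})
catalog.publish("marketing", "campaign_results", {"campaigns", "sku"})
found = catalog.search("sku")
```

Here `found` lists products from both domains, which is the discovery behavior the self-service principle is after.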

Principle 4: Federated governance

The final piece of a successful data mesh is governance. A data mesh architecture cannot come at the expense of access controls and data protections. There needs to be a balance between having global governance policies and controls, and ensuring each domain team maintains the ability to define and implement these policies when developing and sharing their data products. This federated governance is critical not only for ensuring data privacy and compliance but also for aiding discovery at scale.
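
One way to picture the balance between global and domain-level policies is a merge rule. The sketch below assumes a "most restrictive rule wins" convention, which is an illustrative choice and not something the data mesh principles prescribe; the `Access` levels, the tag names, and `effective_policy` are all hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    """Ordered so that a higher value is more restrictive."""
    OPEN = 0
    MASKED = 1
    DENIED = 2


# Hypothetical global baseline, keyed by sensitivity tag.
GLOBAL_POLICY = {"public": Access.OPEN, "pii": Access.MASKED}


def effective_policy(global_policy, domain_policy):
    """Merge per-tag rules, keeping the more restrictive rule for each tag."""
    merged = dict(global_policy)
    for tag, rule in domain_policy.items():
        merged[tag] = max(merged.get(tag, Access.OPEN), rule)
    return merged


# A domain may add its own tags and tighten global rules...
hr_policy = {"pii": Access.DENIED, "salary": Access.MASKED}
hr_effective = effective_policy(GLOBAL_POLICY, hr_policy)

# ...but an attempt to loosen a global rule has no effect.
lax_effective = effective_policy(GLOBAL_POLICY, {"pii": Access.OPEN})
```

Under this convention the central team sets a floor, and each domain keeps the freedom to define stricter or additional rules for its own products.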

Data Mesh Success

A successful data mesh connects organizations and data teams to the most relevant data when they need it, without silos or complexity.

Delivering self-service infrastructure as a platform

Building a self-service infrastructure is the most obvious data mesh principle where the right technology can help. It’s critical that domain teams can access the resources and tools they need on demand to support them at every stage of the data product lifecycle—from accessing the right data, to processing and preparing it, to analyzing it or creating models.

Delivering domain-driven ownership and data as a product

This last concept of scalable, dedicated resources has allowed Snowflake customers to implement a distributed domain-driven design logically, while maintaining a standard central platform backing it all. This central platform can incorporate a wide range of data types and file formats, and even support access to external data for comprehensive coverage of the data landscape. And as a fully managed service with built-in automations, the central platform makes it easy for domain teams to self-serve. IT teams don’t need to worry about provisioning, maintenance, upgrades, or downtime. And domain teams operate as distinct units that can scale to practically any number of users who can work with virtually any amount of data on demand, with no infrastructure expertise or tuning required.

However, even with this design, a data mesh still runs the risk of turning into a bunch of domain silos. And silos are the killer of any organization.

Delivering federated governance

Within the data mesh platform sit the native cross-cloud governance controls that act as the foundational building blocks for federated governance. Organizations can strike the right balance between allowing domain owners to easily define and apply their own fine-grained policies and having centrally managed governance processes. Policies can be defined at the data and role level, and they follow the data for consistent enforcement, even as data is shared between clouds, regions, or workloads. Domain teams can discover and query the same data, and their resulting views change based on their role and the data sensitivity, drastically simplifying governance at scale while still allowing teams to get value from their data. Organizations can also integrate these governance controls with their existing governance and catalog tools, such as Alation, to further enhance quality, discoverability, and data protection across their domain teams.
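
The idea that two teams can query the same data and see different views can be sketched as a small masking function. This is only an illustration: the column tagging, the role names, and `view_for_role` are hypothetical, and a real platform would enforce this in the query engine rather than in application code.

```python
SENSITIVE_COLUMNS = {"email"}           # hypothetical sensitivity tagging
ROLES_WITH_FULL_ACCESS = {"hr_admin"}   # hypothetical role name


def view_for_role(rows, role):
    """Return a copy of rows, masking sensitive columns for restricted roles."""
    if role in ROLES_WITH_FULL_ACCESS:
        return [dict(r) for r in rows]
    return [
        {col: ("***" if col in SENSITIVE_COLUMNS else val) for col, val in r.items()}
        for r in rows
    ]


employees = [{"name": "Ada", "email": "ada@example.com"}]
admin_view = view_for_role(employees, "hr_admin")    # sees the real email
analyst_view = view_for_role(employees, "analyst")   # sees a masked email
```

Because the policy is attached to the column and the role instead of to a particular copy of the data, the same rule applies wherever the data is queried.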

