Modern Data Architecture
What is Data Mesh?
Over the last couple of years, the data mesh architecture has emerged as a new framework to help solve many of the challenges that have plagued organizations, especially as they’ve scaled their data and data teams and tried to deliver more value, faster. Removing these barriers to data and delivering value at scale is a lofty goal. As with any architectural pattern, succeeding with a data mesh is not simply a technology problem to solve; it’s also about having the right technology to set up your teams for success and even catalyze change throughout your organization.
The Four Principles of a Data Mesh
The idea of a data mesh was a reaction to the trade-offs organizations were being forced to make as they scaled their data into less-governed and less-structured monolithic data lakes. As the number of data sources and data consumers grew, so did the number of data pipelines needed to connect them all. This pushed more and more of the work burden onto specialized teams who had the skills to develop for these notoriously challenging technologies but were disconnected from the domain experts who needed the data to do their jobs. This led to the all-too-common scenario of downstream data consumers waiting on complex pipelines and loosely stitched-together technologies to get the data they needed, and it also led to overworked engineering teams trying to keep up with demand.
Figure 1, from Data Mesh Principles and Logical Architecture, shows the four core principles that define a data mesh architecture:
Principle 1: Domain-driven ownership and architecture
The first principle of a data mesh is shifting ownership of the data into the hands of the domain teams. They own the data end to end: from ensuring they have the right sources or ingested data to work with, to building and maintaining any processing pipelines necessary, to serving the data out for other domain teams to tap into as products (more on that later) with the right quality guarantees and governance controls in place. Domain teams can be defined by department, business unit, or other similarly motivated groupings. If the mesh is properly implemented, new domain teams can be added fluidly, especially when data is being correlated into new data products.
Principle 2: Data as a product
As alluded to in the first principle, domain teams aren’t just responsible for the data; they are also responsible for the resulting data products. And data products need to be treated like any other product. Data products need to be discovered and usable by consumers and other domain teams, and the domain owner is responsible for maintaining and updating (or deprecating) these products to ensure quality and accuracy. What can this look like in practice? Imagine a supply chain team creating an inventory data product that a marketing team can tap into to develop new discount campaigns or that can be used by regional teams for placing new orders.
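To make the inventory example a little more concrete, here is a minimal Python sketch of what "discoverable, owned, and deprecatable" could look like. Everything in it is an assumption made for illustration: the `DataProduct` and `Catalog` classes, their fields, and the product and domain names are hypothetical and do not correspond to any particular platform's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product published by a domain team."""
    name: str
    owner_domain: str
    description: str
    tags: List[str] = field(default_factory=list)
    deprecated: bool = False


class Catalog:
    """Hypothetical in-memory catalog that other domain teams can search."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # The owning domain registers (or updates) its product.
        self._products[product.name] = product

    def deprecate(self, name: str, requesting_domain: str) -> None:
        # Only the owning domain may retire its own product.
        product = self._products[name]
        if product.owner_domain != requesting_domain:
            raise PermissionError("only the owning domain may deprecate a product")
        product.deprecated = True

    def discover(self, tag: str) -> List[DataProduct]:
        # Consumers find active products by tag; deprecated ones are hidden.
        return [
            p for p in self._products.values()
            if tag in p.tags and not p.deprecated
        ]


catalog = Catalog()
catalog.publish(DataProduct(
    name="inventory_levels",
    owner_domain="supply_chain",
    description="Current stock per SKU and region",
    tags=["inventory", "stock"],
))

# A marketing or regional team looks up products tagged "inventory".
found = catalog.discover("inventory")
```

The design choice worth noting is that ownership is checked on `deprecate`: any team can discover the product, but only the supply chain domain can retire it.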
Principle 3: Self-service infrastructure as a platform
The third principle is to make all this self-service and easy for the domain teams. Complex technologies and niche skills are simply not sustainable in a data mesh design. There needs to be a common platform and set of tools that any domain team can tap into at any time to build and serve their data products, without getting bogged down in infrastructure maintenance or resource limitations.
Principle 4: Federated governance
The final piece of a successful data mesh is governance. A data mesh architecture cannot come at the expense of access controls and data protections. There needs to be a balance between having global governance policies and controls, and ensuring each domain team maintains the ability to define and implement these policies when developing and sharing their data products. This federated governance is critical not only for ensuring data privacy and compliance but also for aiding discovery at scale.
Data Mesh Success
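One way to picture the balance between global and domain-level policies is a merge rule in which a domain can add or tighten classifications but can never loosen a global one. The sketch below is only an illustration under that assumption; the sensitivity scale, the column names, and the `effective_policy` function are all hypothetical.

```python
from typing import Dict

# Hypothetical sensitivity scale, ordered from least to most restrictive.
LEVELS = ["public", "internal", "restricted"]

# Hypothetical organization-wide baseline set by a central governance group.
GLOBAL_POLICY: Dict[str, str] = {
    "email": "restricted",
    "customer_id": "internal",
}


def effective_policy(
    global_policy: Dict[str, str], domain_policy: Dict[str, str]
) -> Dict[str, str]:
    """Merge both policies, keeping the stricter level for every column."""
    merged = dict(global_policy)
    for column, level in domain_policy.items():
        current = merged.get(column, "public")
        # The later a level appears in LEVELS, the stricter it is.
        merged[column] = max(current, level, key=LEVELS.index)
    return merged


# A domain team tries to relax "email" and adds a rule of its own.
marketing_policy = {"email": "internal", "campaign_budget": "restricted"}
policy = effective_policy(GLOBAL_POLICY, marketing_policy)
```

Here the attempt to relax `email` has no effect, while the domain's own `campaign_budget` rule is honored.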
Connecting organizations and data teams to the most relevant data when they need it, without silos or complexity,
Delivering self-service infrastructure as a platform
Building a self-service infrastructure is the most obvious data mesh principle where the right technology can help. It’s critical that domain teams can access the resources and tools they need on demand to support them at every stage of the data product lifecycle—from accessing the right data, to processing and preparing it, to analyzing it or creating models.
Delivering domain-driven ownership and data as a product
This last concept of scalable, dedicated resources has allowed Snowflake customers to implement a distributed domain-driven design logically, while maintaining a standard central platform backing it all. This central platform can incorporate a wide range of data types and file formats, and even support access to external data for comprehensive coverage of the data landscape. And as a fully managed service with built-in automations, the central platform makes it easy for domain teams to self-serve. IT teams don’t need to worry about provisioning, maintenance, upgrades, or downtime. And domain teams operate as distinct units that can scale to practically any number of users who can work with virtually any amount of data on demand, with no infrastructure expertise or tuning required.
However, even with this design, a data mesh still runs the risk of turning into a bunch of domain silos. And silos are the killer of any organization.
Delivering federated governance
Within the data mesh platform are all of the native cross-cloud governance controls that act as the foundational building blocks for enabling federated governance. Organizations can strike the right balance between allowing domain owners to easily define and apply their own fine-grained policies and having centrally managed governance processes. Policies can be defined at the data and role level, and they follow the data for consistent enforcement, even as data is shared between clouds, regions, or workloads. Domain teams can discover and query the same data, and their resulting views change based on their role and the data sensitivity, drastically simplifying governance at scale while still allowing teams to get value from their data. Organizations can also integrate these governance controls with their existing governance and catalog standards, such as Alation, to further enhance quality, discoverability, and data protection across their domain teams.
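The idea that two teams query the same data but see different views can be sketched as a small masking function. This is a simplified illustration, not how any specific platform implements it: the role names, column classifications, and the `view_for_role` helper are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical classification of columns in a shared inventory data product.
COLUMN_SENSITIVITY: Dict[str, str] = {
    "sku": "public",
    "region": "public",
    "supplier_contact": "restricted",
}

# Hypothetical mapping of roles to the sensitivity levels they may read.
ROLE_CLEARANCE: Dict[str, Set[str]] = {
    "supply_chain_analyst": {"public", "restricted"},
    "marketing_analyst": {"public"},
}

MASK = "***MASKED***"


def view_for_role(rows: List[Dict[str, str]], role: str) -> List[Dict[str, str]]:
    """Return the same rows with values masked according to the caller's role."""
    allowed = ROLE_CLEARANCE.get(role, set())
    return [
        {
            # Columns with no classification are treated as restricted.
            col: val if COLUMN_SENSITIVITY.get(col, "restricted") in allowed else MASK
            for col, val in row.items()
        }
        for row in rows
    ]


rows = [{"sku": "A-100", "region": "EMEA", "supplier_contact": "jane@example.com"}]

owner_view = view_for_role(rows, "supply_chain_analyst")
marketing_view = view_for_role(rows, "marketing_analyst")
```

Because the policy is attached to the data and the role rather than to a particular copy of the table, the same rule applies wherever the rows are queried. Unknown roles and unclassified columns default to the most restrictive outcome.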