You Can’t Have Data Mesh Without Governance
Photo Credit: Uriel SC via Unsplash

As the volume of big data keeps growing, traditional tools used to manage it can fall short. Demand for data science led many organizations to try combining their data warehouses with big data tools. The challenge is that trying to deploy data this way can cause huge backlogs, especially in large organizations. If a single team owns both the data platform and all integrations, other teams that need analytics can lose time while they wait for their results.

Even if the data team owns all infrastructure, the volume of data they work with is often larger than what many business intelligence (BI) tools can handle. The number of data sources is often large as well. This situation makes a strong case against having one team manage all data in one platform. Unless your organization is small, this option is bound to cause silos, backlogs, and lost productivity.

Using data mesh, an architectural pattern that lets cross-functional teams manage data domains as products, can ease these risks.

Data Mesh and Data Products

Not sure what a data product is? You already work with them today. If your head of Sales keeps your company’s purchase order history in a JSON file, that JSON file is a data product. Your data team can automate the file upload process so that the data refreshes daily and the latest data lives in a specific location within your cloud architecture. Data products might also be published datasets that live on someone’s laptop, or machine learning models that predict various outcomes, from shipping dates to marketing campaign costs. Data products are not new inventions that your stewards must make. They’re an improved way of managing the data you have now.
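
To make the purchase order example concrete, here is a minimal Python sketch of how a team might describe that JSON file as a data product. Everything in it is an assumption for illustration: the `DataProduct` class, its field names, the steward label, and the storage path are hypothetical and do not come from any particular data mesh tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for one data product; field names are illustrative."""
    name: str
    domain: str               # business domain that owns the data, e.g. "sales"
    steward: str              # person accountable for the product
    location: str             # where the latest copy lives (placeholder URI)
    refresh_every: timedelta  # agreed refresh interval

    def is_stale(self, last_refreshed: datetime, now: datetime) -> bool:
        """Return True when the last refresh is older than the agreed interval."""
        return now - last_refreshed > self.refresh_every


# The purchase order history from the example above, refreshed daily.
purchase_orders = DataProduct(
    name="purchase_order_history",
    domain="sales",
    steward="head_of_sales",
    location="s3://example-bucket/sales/purchase_orders.json",  # made-up path
    refresh_every=timedelta(days=1),
)
```

A scheduled job could call `purchase_orders.is_stale(last_refreshed, now)` and alert the steward when the daily upload has not run. The point of the sketch is that ownership, location, and refresh expectations are written down next to the data rather than kept in someone's head.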


Data Mesh Principles

Zhamak Dehghani first described the data mesh concept in 2019. It promotes four principles:

  • Data ownership is domain-specific. Data mesh architecture requires data stewards who own specific data domains and lead communication about distributed data. Rather than leaving all the work of receiving, aggregating, and cleaning data to one data scientist, data stewards own this process for their respective domains. Data stewards ensure that their data is clean and ready for your BI analytics tools of choice, so that your colleagues and customers can use it.
  • Data is a product. This is a mindset shift for many data practitioners. Rather than managing data as a service, data mesh takes a product attitude towards data management. When data stewards manage their data as products, they’re able to use it in more diverse ways. For instance, rather than using customer interaction data in a single business context, you can reuse that data in a range of ways that benefit your business.
  • Data is available as self-service infrastructure. Data mesh architecture keeps all domain data on a central platform that manages storage, streaming, pipelines, and more. This arrangement prevents data from living in disparate systems and reduces the need to build integrations, APIs, and so on. Each data steward is able to manage their domain data from the same source.
  • Ecosystem governance. Governance is a core tenet of successful data mesh. Governance reinforces your data framework and mission statement, ensuring that all data is formatted, standardized, and made discoverable according to the same standards. These standards give everyone assurance that the data they use is controlled for quality.
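
The governance principle is easier to picture when the shared standards are written as automated checks. The Python sketch below is one hypothetical way to do that. The required fields, the allowed formats, and the snake_case naming rule are invented for illustration; a real governance council would set its own.

```python
import re

# Hypothetical standards a governance council might agree on.
REQUIRED_FIELDS = ("name", "domain", "steward", "format")
ALLOWED_FORMATS = {"json", "csv", "parquet"}
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


def governance_violations(metadata: dict) -> list[str]:
    """Return human-readable violations; an empty list means the product passes."""
    # Every data product must declare each required field with a non-empty value.
    violations = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not metadata.get(field)
    ]
    # Names follow one convention so products are easy to find in a catalog.
    name = metadata.get("name", "")
    if name and not SNAKE_CASE.match(name):
        violations.append(f"name is not snake_case: {name}")
    # Only formats the shared platform supports are accepted.
    fmt = metadata.get("format", "")
    if fmt and fmt not in ALLOWED_FORMATS:
        violations.append(f"unsupported format: {fmt}")
    return violations
```

Each domain team could run a check like this before publishing, so every data product is held to the same rules without a central team reviewing each one by hand.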

A diagram with words explaining the data mesh concept. It emphasizes that the data governance council owns decisions about governance and standards, while data engineers own data infrastructure. (Catalog, storage, access, and pipelines.)

Moving Towards a Data-Driven Culture

This move to distributed architecture through a shared sense of ownership is one way to execute data governance. Before you can use this architectural technique, you must have clearly defined cross-functional data domains and assign each domain to stewards who own it. You also must ensure that your data platform allows domain experts to use the tools, techniques, and dashboards that serve their audiences, without depending on just one tool or team.

While data mesh is an ideal way to practice data governance, it is not the right architecture for all teams. If your organization and technical team are small, data mesh might make your work more complex. But if your organization has independent business units, autonomous teams, and data and analytics needs that span them, data mesh is worth a look.

Done well, data mesh lets teams access, develop, and manage data autonomously. It also gives your data stewardship team an easier way to keep your data secure. But there’s a catch! Data mesh works only if you’ve done the hard work to create a data-driven culture and can automate that culture’s standards.

About the Author

Lauren Maffeo is a service designer at Steampunk and a member of the Technology Advisory Council at the UK Information Commissioner’s Office (ICO). Her first book, Designing Data Governance from the Ground Up, is available in beta through The Pragmatic Programmers.

Until November 28th, 2022, you can save 40 percent on the ebook when you use promo code turkeysale2022 at checkout. Promo codes are not valid on prior purchases.
