Data Mesh – think big, start small
Source: Dataversity

We (myself and Tommie Hallin) are planning to post a series of blogs on data mesh. This first one is about how to get started with a data mesh.

We often use the phrase “think big, start small”. In many cases it is a very good approach, but sometimes it is difficult to apply, for example when establishing a Data Mesh in a large organization. Nonetheless, we believe it is a good approach in this case as well. In this article we point out some of the key aspects that need to be addressed from the “think big, start small” perspective. You could ask a Generative AI model and you would get some relevant pointers, but we will go a bit beyond that, pointing out some of the drivers and challenges we have picked up from our experience delivering many transformative data programs in the field.

Suppose you ask a GenAI model (e.g. the Llama2 model) the question: How could you "think big and start small" when establishing a data mesh for a large organization?

The summarized answer would be something like:

· Start by defining the overall vision

· Identify a small, specific use case

· Develop a minimal viable product

· Work with stakeholders to gather feedback

· Gradually expand the data mesh, building on success

· Continuously monitor and evaluate

Although we certainly do not argue against any of these suggestions, the answer leaves out some perspectives that we feel are very important.

Data vision and data strategy

A strategic view of the Data Mesh is very important. However, execution will require a lot of capability building and change management across the organization: business, data and IT. Data literacy and a data-driven business culture are cornerstones of a successful adoption of data mesh concepts. For example, designing business-focused data products will require a lot of training at all levels, from end users up to business executives. Academia and training service providers should design specific programs for this purpose, and these programs need to consider the maturity and focus of the business.

On the other hand, if you ensure that your initial data products are usable and relevant for the business, giving it what it wants and needs, adoption becomes very natural. All companies need the basic data products, such as customer, product and sales. This is where you should start.

One of the key goals of a data mesh is to decentralize and distribute the management of data to the data domains (ideally owned and driven by business); this needs to be the starting point. The challenge is then to manage this data decoupled from the “system of record”, that is, (logically or physically) separate from the application or platform where the data is created, updated and deleted. There are different types of data products, from Raw to Analytical: Raw is data in the source system format, Curated is data in a conformed, transformed layer, and Analytical is data in a consumption format (semantic layer).
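
To make the three product types concrete, here is a minimal sketch in Python. The source records, the field names (`CUST_NO`, `AMT`, `CURR`) and both functions are hypothetical, invented purely for illustration; a real implementation would run on the organization's own data platform.

```python
from collections import defaultdict

# Raw: records exactly as a (hypothetical) source system exports them.
raw_sales = [
    {"CUST_NO": "0042", "AMT": "100.50", "CURR": "eur"},
    {"CUST_NO": "0042", "AMT": "20.00", "CURR": "EUR"},
    {"CUST_NO": "0007", "AMT": "15.25", "CURR": "EUR"},
]


def curate(records):
    """Curated: conformed field names, data types and code values."""
    return [
        {
            "customer_id": int(r["CUST_NO"]),
            "amount": float(r["AMT"]),
            "currency": r["CURR"].upper(),
        }
        for r in records
    ]


def sales_per_customer(curated):
    """Analytical: a consumption-ready aggregate per customer."""
    totals = defaultdict(float)
    for row in curated:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


curated_sales = curate(raw_sales)
analytical_sales = sales_per_customer(curated_sales)
```

Each layer can be published as a data product in its own right, which is what keeps the data decoupled from the source application.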

Data Readiness

Another key enabler for “think big, start small” is data readiness. The discoverability, availability and quality of data are crucial decision drivers when prioritizing which data products to start with. If the data is in bad shape, for example difficult to find, not mastered, of bad quality or difficult to get access to, you may have to do more work upfront before you can really get started. Master and reference data need more collaboration and involve more IT and data resources to model and define from an enterprise perspective, but they should still have ownership based in business and be defined as data products to enable strong re-usability. The concept of defining data as a product helps put the focus on value, purpose, lifecycle, ownership, sharing etc. For data products to be really useful they need to be well described and defined (name, description, owner, quality, usage, status, classification, rating, lineage, model etc.). While some of this may already exist and can be captured from data sources, it typically involves a lot of manual authoring and reviewing.
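
As an illustration of what “well described and defined” could look like in practice, the sketch below models a data product descriptor with the attributes listed above and reports which ones still need authoring. The class name, the function name and the example values are assumptions made for this example, not a standard or a vendor schema.

```python
from dataclasses import dataclass, fields


@dataclass
class DataProductDescriptor:
    # Attribute list taken from the paragraph above; free text for simplicity.
    name: str = ""
    description: str = ""
    owner: str = ""
    quality: str = ""
    usage: str = ""
    status: str = ""
    classification: str = ""
    rating: str = ""
    lineage: str = ""
    model: str = ""


def missing_fields(descriptor):
    """Return the names of descriptor attributes that are still empty."""
    return [
        f.name
        for f in fields(descriptor)
        if not getattr(descriptor, f.name).strip()
    ]


# Hypothetical example: a partly described "Customer" data product.
customer = DataProductDescriptor(
    name="Customer",
    description="Mastered customer records for cross-domain use",
    owner="Sales domain",
    status="draft",
)
gaps = missing_fields(customer)
```

A gap list like this gives a simple, visible measure of data readiness per data product, without requiring every attribute to be complete before starting.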

So a lack of data readiness can prevent you from starting small, but there is also a risk that you get overambitious in the data readiness work and thereby cause unnecessary delays in starting the initiatives that drive direct business value from the data mesh.

The breakthrough of GenAI during the past year has created a significant opportunity for data mesh adoption and is a mitigating factor for a lack of data readiness. Combining GenAI with existing AI-driven classification models (available from many vendors such as IBM, Informatica and Collibra) can accelerate building the cross-domain catalogs necessary for democratizing data products, enabling the “think big, start small” approach.

Business users can rely on themselves not only to design new data products but also to build them; interactive, actionable prompts to move, enrich and transform data will empower business teams. GenAI also makes it much easier to create and manage metadata, since much of it is natural language. GenAI and LLMs can be used to find and generate a lot of this information, accelerating the path to well-described data products that the business can understand.

Holistic Data Mesh design

There is a need to focus on the holistic design of the data mesh to ensure that domain teams are not creating data products in silos. This is the “think big” part. All domain leaders need to contribute to the holistic design, especially in determining the ownership, overlaps and dependencies between data domains. A business value prioritization framework should be developed to shape the roadmap for implementing the data products and use cases that add business value quickly. But the overall “business case” for a data mesh is the possibility to connect across the enterprise and across the data domains: the cross-functional use and consumption of data and AI. A data mesh is not a “mesh” if it does not contain the design for connecting across domains. If you do not pay attention to this and establish the guidelines and rules for the overall data mesh upfront, you will have significant re-work and delays when you come to the cross-domain and cross-functional use cases. Related to the business value framework is the cultural change in how data is managed. Setting up an incentive scheme to promote the work of creating re-usable data products, and establishing change agents in the organization to promote the new data culture, are crucial for the holistic design.

The starting point is laying out the domains at a high level (similar to what you do in level 2 process design), i.e. domains and subdomains. This is sufficient to understand ownership, dependencies and relationships. Listing the data products in the different subdomains then becomes natural, but versioning the domain design is important because it will evolve along the way. It is necessary to incorporate technical and data resources into the domain teams, since many activities will be difficult for a business person to perform; data stewardship is typically performed by a cross-functional team.
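
A high-level domain design of this kind can be captured in a very small, versioned structure. The sketch below is one possible shape, assuming hypothetical domain, subdomain and owner names; it also shows a simple check for dependencies that point at subdomains nobody has defined yet.

```python
from dataclasses import dataclass, field


@dataclass
class Subdomain:
    name: str
    owner: str
    data_products: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)  # names of other subdomains


@dataclass
class DomainDesign:
    version: str   # bump whenever the design evolves
    domains: dict  # domain name -> list of Subdomain

    def subdomain_names(self):
        return {s.name for subs in self.domains.values() for s in subs}

    def unresolved_dependencies(self):
        """List (subdomain, dependency) pairs whose target is not defined."""
        known = self.subdomain_names()
        return sorted(
            (s.name, dep)
            for subs in self.domains.values()
            for s in subs
            for dep in s.depends_on
            if dep not in known
        )


# Hypothetical first version of the design.
design = DomainDesign(
    version="0.1",
    domains={
        "Sales": [
            Subdomain(
                "Orders",
                owner="Head of Sales",
                data_products=["Sales order"],
                depends_on=["Customer master", "Product catalog"],
            )
        ],
        "Customer": [
            Subdomain("Customer master", owner="Head of CRM", data_products=["Customer"])
        ],
    },
)
dangling = design.unresolved_dependencies()
```

Here `dangling` flags that "Orders" depends on a "Product catalog" subdomain that is not yet part of the design, which is exactly the kind of cross-domain gap worth surfacing early.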

A data mesh implementation also requires integrated technical capabilities that span many data management areas, typically covering data platforms, data engineering, and data catalog & governance. Implementing these capabilities might take time, and it is very important to plan that implementation in terms of timelines, effort, costs and dependencies. Although it is fundamental to enable the data mesh technically, there is a risk of the data mesh getting a primarily technical focus. On the other hand, if the technology is not ready to support the business users, the whole data mesh concept becomes impossible to implement. Part of “starting small” is validating that the data mesh can be put into use and bring value, and without tools and technology this cannot be done. Letting the first data products and use cases drive the prioritization of technical capabilities reduces the risk of getting stuck in a technically focused data mesh.

How to “Think big, start small”

Here are our 8 steps that take you beyond the generic advice you can get from GenAI, for how to “Think big, start small” when establishing a data mesh.

1. Think big about the strategic benefits of data mesh and how it will align with the organization’s business vision, objectives & initiatives, while securing the right business sponsorship.

2. Create a target data architecture that reflects the key technical capabilities required to implement the various data mesh concepts (e.g. distributed data governance, data product design and a self-service platform for business users), considering fit and technical feasibility with the existing landscape.

3. Assess the organization’s people and process maturity, and create a change management board and a network of change agents across the organization. This will help with training/enablement and with elevating the data-driven culture, and it is essential from inception and throughout the delivery of the various phases.

4. Design the high-level version of the data mesh (think L2 process description) by listing key domains, subdomains and a few data products while identifying business owners, e.g. documenting the domains’ relations, dependencies, complexity, data source systems, and key data elements with quality indices.

5. Select two or three different types of data products, in one domain, to start an MVP after considering the following dimensions: business problem & delivery impact, business sponsorship, data readiness & lineage, technical complexity & adoption KPIs.

6. Create a domain team and train it to deliver the MVP in 4-6 weeks (driven by business, mixed with technical resources).

7. After successful delivery of the MVP (meaning data products used by business and delivering value), collect lessons learned and plan for scaling up through organization re-design, an elaborated data mesh design, an enhanced technology stack etc., keeping the change management board and change agents engaged.

8. Build a 1-2 year roadmap of additional domains, data products and use cases with clear implementation timelines for the key enablers (data architecture, technology stack, data culture elevation), and start managing continuous prioritization of that roadmap.
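
The selection in step 5 can be supported by a simple weighted scoring of candidate data products. The sketch below is only one way to do it: the weights, the 1-5 ratings and the candidate names are invented for illustration, and technical complexity is expressed as `technical_simplicity` so that a higher rating is always better. In practice the dimensions and weights would be agreed with the business sponsors.

```python
# Hypothetical weights in percent; they sum to 100.
WEIGHTS = {
    "business_impact": 35,
    "sponsorship": 25,
    "data_readiness": 25,
    "technical_simplicity": 15,
}

# Hypothetical 1-5 ratings for candidate data products.
candidates = {
    "Customer": {"business_impact": 5, "sponsorship": 4, "data_readiness": 3, "technical_simplicity": 4},
    "Product": {"business_impact": 3, "sponsorship": 3, "data_readiness": 4, "technical_simplicity": 5},
    "Sales": {"business_impact": 4, "sponsorship": 5, "data_readiness": 4, "technical_simplicity": 2},
    "Supplier risk": {"business_impact": 4, "sponsorship": 2, "data_readiness": 1, "technical_simplicity": 2},
}


def score(ratings):
    """Weighted sum of the ratings across all dimensions."""
    return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)


def shortlist(products, size=3):
    """Return the names of the highest-scoring candidates."""
    ranked = sorted(products, key=lambda name: score(products[name]), reverse=True)
    return ranked[:size]


mvp_candidates = shortlist(candidates)
```

The score is a conversation starter rather than a decision: a candidate with weak data readiness, such as the hypothetical "Supplier risk" product, drops out of the first MVP but stays visible for the roadmap in step 8.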

By following this approach, you can "think big" and establish a clear vision for the data mesh, while also "starting small" and testing the concept in a controlled environment before scaling it up to the entire organization. This will help ensure that the data mesh is effective, efficient, and meets the needs of the organization.

Authors: Tommie Hallin and Ahmed Khafagy
