Efficient Data Domains Organization
Intro
In a previous article, we discussed the reasons behind Data Mesh and defined some of its key elements. One of those key elements is the decomposition of enterprise data into Data Domains.
As we discussed, a Data Domain should act as a container for related data products. The key question now is: what is the right way to build Data Domains?
Data Domains Theory
Data Mesh theory defines three kinds of Data Domains: source-aligned, aggregate, and consumer-aligned.
Since aggregate data domains are used only in very specific scenarios, they will be discussed in a future article.
Basically, when we work with source-aligned data domains, we stay close to the source of the information. When we work with consumer-aligned data domains, we stay close to the business consumer, trying to fulfil its business needs.
Typically, when we design the analytical layer of the company application landscape, we tend to aggregate data into business entities, often trying to stay as far as possible from the specifics of our operational systems. With this in mind, source-aligned data domains may seem an odd choice. Everything will become clearer when we discuss the reference technology landscape for the data mesh pattern, but, for now, just think:
So, given the goal of harmonizing the management of operational system data across the organization through the definition of data domains and data products, source-aligned data domains make sense.
Generally speaking:
Data Domains and the enterprise organization
As we know, Data Domains contain Data Products, and each Data Product is managed by a Data Owner. Since the focus is on ownership, one option is to rely on the enterprise organization to identify Data Domains. In this respect, there are two possible approaches: mapping Data Domains to the organizational structure, or mapping them to logical, cross-unit functions.
In the first scenario, Data Domains are designed around the enterprise organizational structure, and every identifiable group of people in the organization can represent a specific data domain:
[Figure: Contoso Corp organizational structure]
In the Contoso Corp example shown in the picture above, each organizational unit can become a Data Domain, so we can have, for example, “Office 1”, “Department 2”, and “Division 1” data domains, depending on the specific use case. With this approach, there is a direct and clear correspondence between Data Products, Data Owners, and the organizational structure. At the same time, we know that the data products belonging to a given Data Domain will deliver value for the organizational unit that owns them.
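The correspondence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataOwner`, `DataProduct`, and `DataDomain` classes, the `domains_from_org_units` helper, and the owner and product names are all hypothetical; only the unit names come from the Contoso example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataOwner:
    name: str
    org_unit: str  # organizational unit the owner belongs to


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner: DataOwner


@dataclass
class DataDomain:
    name: str
    products: list[DataProduct] = field(default_factory=list)


def domains_from_org_units(products: list[DataProduct]) -> dict[str, DataDomain]:
    """Group data products into one Data Domain per owning organizational unit."""
    domains: dict[str, DataDomain] = {}
    for product in products:
        unit = product.owner.org_unit
        # Create the domain the first time we meet its unit, then attach the product.
        domains.setdefault(unit, DataDomain(name=unit)).products.append(product)
    return domains


# Hypothetical owners and products (assumptions, not taken from the article).
alice = DataOwner("Alice", "Office 1")
bob = DataOwner("Bob", "Department 2")
products = [
    DataProduct("office_expenses", alice),
    DataProduct("office_headcount", alice),
    DataProduct("department_budget", bob),
]

domains = domains_from_org_units(products)
for domain in domains.values():
    print(domain.name, [p.name for p in domain.products])
```

Because the domain is derived from the owner's unit, every product lands in exactly one domain and its owner always sits inside that domain's unit, which is the property this approach is meant to guarantee.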
On the other hand, mapping Data Domains to logical functions means relying on cross-unit functions to define them. For example, you can map Data Domains to projects, which can involve people from different organizational units, since resources with cross-functional skills are often required.
With this approach, Data Domains are built around the definition of project deliverables, and they have no relation to the organizational structure or to specific business functions. On the contrary, these Data Domains can span different business functions, so their Data Owners can come from different units.
Projects are just one example of cross-functional items: Data Domains can also be built around meaningful areas of information. “Customer management”, for instance, can be a Data Domain that involves several organizational units in the enterprise (commercial, accounting, delivery, product management, etc.).
A good way to decide how to distribute Data Domains is to look at data ownership: is ownership tied to the organizational structure of the enterprise? If so, the wisest choice is probably to follow the organizational design. If data ownership is mostly tied to the outcome of projects or to cross-functional knowledge, then it would be wise to go with cross-unit Data Domains.
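The ownership question above can be expressed as a small decision helper. This is a rough sketch, not a prescribed method: the `suggest_domain_design` function, the `"org_unit"` and `"cross_unit"` labels, and the simple majority rule used to interpret "mostly" are all assumptions made for illustration.

```python
from collections import Counter


def suggest_domain_design(ownership_basis: list[str]) -> str:
    """Suggest a Data Domain design from how ownership is determined.

    Each entry describes one data product and is either "org_unit"
    (ownership follows the organizational structure) or "cross_unit"
    (ownership follows projects or cross-functional knowledge).
    Both labels are illustrative assumptions.
    """
    if not ownership_basis:
        raise ValueError("at least one data product is required")
    counts = Counter(ownership_basis)
    unknown = set(counts) - {"org_unit", "cross_unit"}
    if unknown:
        raise ValueError(f"unknown ownership basis: {sorted(unknown)}")
    if counts["org_unit"] > counts["cross_unit"]:
        return "organizational"
    if counts["cross_unit"] > counts["org_unit"]:
        return "cross-unit"
    return "mixed"  # no clear majority either way


print(suggest_domain_design(["org_unit", "org_unit", "cross_unit"]))
```

In practice the answer would come from conversations with the Data Owners rather than from a count, but writing the rule down makes the criterion explicit.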
Even if everything seems straightforward so far, reality is, as always, far more complex. Consider this scenario:
Even though it can seem straightforward to use a cross-unit approach since:
From a project point of view, it is consistent to have “project-driven Data Domains”. Nevertheless, if we look at it from a Data Owner's point of view, we have this situation:
Even though we have simplified the data production process, data consumption has become inefficient when viewed from the perspective of the data's business meaning.
This is likely to be a scenario in which having both approaches in place would be wise.
One of the main advantages of this approach is that we decouple ingestion from data transformation. Data comes into the system (and we will see in a future article exactly what the “system” is) preserving its original shape and meaning. Depending on the business scenario, data is then transformed, grouped, merged, and filtered to meet the specific need, and this transformation is always non-disruptive.
In the next article we will discuss this topic: is there any technological prerequisite for adopting the Data Mesh approach? In general, what is required from an organization to successfully implement the pattern?
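A minimal sketch of this decoupling, assuming a hypothetical order feed: the `ingest` and `revenue_by_customer` functions and the record fields are invented for illustration. Ingestion stores the records exactly as received and read-only, while the consumer-facing view filters and groups them into a new structure, leaving the ingested data untouched.

```python
from types import MappingProxyType


def ingest(records):
    """Store source records unchanged, wrapped as read-only mappings."""
    return tuple(MappingProxyType(dict(record)) for record in records)


def revenue_by_customer(raw):
    """Build a consumer-facing view: filter paid orders and group by customer.

    The result is a new dictionary; the ingested records are never modified.
    """
    totals = {}
    for record in raw:
        if record["status"] == "PAID":
            customer = record["customer"]
            totals[customer] = totals.get(customer, 0.0) + record["amount"]
    return totals


# Hypothetical records in the shape the source system emits them.
source_records = [
    {"order_id": 1, "customer": "C1", "amount": 120.0, "status": "PAID"},
    {"order_id": 2, "customer": "C2", "amount": 50.0, "status": "PAID"},
    {"order_id": 3, "customer": "C1", "amount": 80.0, "status": "PAID"},
    {"order_id": 4, "customer": "C2", "amount": 30.0, "status": "CANCELLED"},
]

raw = ingest(source_records)
view = revenue_by_customer(raw)
print(view)
```

Since the view is computed from the raw records rather than written over them, a different business need can define its own transformation on the same ingested data without affecting existing consumers.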