Modern MDM: Moving Beyond Traditional Reference Data
In the world of traditional data management, reference data and master data are treated as different categories of data. Reference data is used to classify or categorize other data, and master data is business-critical data shared by multiple systems, applications, and processes. Conventionally, examples of master data include customer data, product records, and vendor data. Reference data includes code lists, taxonomies, and hierarchies of data, amongst other things.
But times have changed, and the advent of modern Master Data Management (MDM) – an approach that does away with old-fashioned classifications and hierarchies – essentially means that there is little to no difference between reference data and master data anymore. In many ways, reference data is a relic of technology that forced us to denormalize models and treat data in an unnatural way. Everything is considered a lookup today, including what was traditionally thought of as reference data, such as colors, countries, and currencies. In reality, reference data is just master data, and master data is just…data – you get where this is going?
Modern platforms like CluedIn are leading the charge in this evolution, leveraging real-time data streaming, low-code integration, and automatic data normalization. Their approach helps to maintain data integrity and governance while simplifying the entire data management process. CluedIn’s Graph-based architecture allows non-standardized data to be incorporated and corrected automatically, enhancing data consistency and compliance. This means that what was once a complex and error-prone process can now be managed seamlessly and efficiently.
Just as country names crop up in many data sets, so do Domains in general. In the world of Graph (which is pivotal to modern MDM), Entities connect to other Entities; they do not connect to Properties, as they do with reference data.
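To make the Entity-to-Entity idea concrete, here is a minimal sketch in plain Python. It uses dictionaries and tuples rather than a real graph database, and it is not CluedIn's API; the identifiers (`customer:1`, `country:DK`), the relationship names (`LOCATED_IN`, `USES_CURRENCY`), and the helper functions are all hypothetical, chosen only for illustration.

```python
# Reference-data style: the country is just a property (a code) on the customer.
customer_with_property = {"id": "customer:1", "name": "Acme", "country_code": "DK"}

# Graph style: the country and the currency are entities in their own right.
# All identifiers and relationship names here are illustrative assumptions.
entities = {
    "customer:1": {"type": "Customer", "name": "Acme"},
    "country:DK": {"type": "Country", "name": "Denmark"},
    "currency:DKK": {"type": "Currency", "name": "Danish krone"},
}

# Edges are (source, relationship, target) triples between entities.
edges = [
    ("customer:1", "LOCATED_IN", "country:DK"),
    ("country:DK", "USES_CURRENCY", "currency:DKK"),
]


def targets(entity_id, relationship):
    """Return the ids of entities reachable from entity_id over one relationship."""
    return [t for s, r, t in edges if s == entity_id and r == relationship]


def currencies_for(customer_id):
    """Walk customer -> country -> currency and return the currency names."""
    return [
        entities[currency]["name"]
        for country in targets(customer_id, "LOCATED_IN")
        for currency in targets(country, "USES_CURRENCY")
    ]


print(currencies_for("customer:1"))  # ['Danish krone']
```

In the property version, the code "DK" is a dead end. In the entity version, the country is something you can traverse through, which is why the distinction between "reference" and "master" starts to look artificial.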
In fact, all arguments to maintain reference data can easily be quashed by the more modern approach. Reference data muddies the water and overcomplicates the MDM discussion. It could even be argued that master data does the same.
If master data is slow-moving, then reference data positively dawdles. Historically, reference data is managed differently because it is very static and rarely changes. Why does that even matter? In classic database design, you don't call tables different things just because of the data they contain; you call them tables.
Metadata that refers to reference data sets may document:
In the world of ontologies, this is no longer needed, and it always makes sense to remove unnecessary steps in a process. Wikipedia is the best example of this. Wikipedia is a web of objects that talk to each other. There is no differentiation between reference and master data: data is data, and objects are objects. A country is a country. A currency is its own thing that has relationships with other objects.
Master data is data that relates to the business entities that provide context for business transactions. Unlike reference data, master data values are not usually limited to predefined domain values. Business rules typically dictate the format and permitted ranges of master data values. Common organizational master data includes data concerning:
In the context of a classic relational database, the idea of having a Countries table denormalized so that it can be used to reference other tables sounds like a good idea. However, the future of MDM is widely acknowledged to be based on the Graph. In the Graph world, you do not denormalize to tables; you denormalize to records. With this flexibility, each record can evolve in its own way and provide its own schema. It is not tied to an expected schema that matches all other Countries, for example.
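As a rough illustration of records that carry their own schema, the sketch below models three Country records as plain Python dictionaries. This is an assumption-laden toy, not how any particular graph platform stores data; the field names (`iso_alpha2`, `states`, `official_languages`) and the helpers `shared_fields` and `extra_fields` are hypothetical.

```python
# Each Country is its own record and may carry fields the others lack.
# Field names are illustrative assumptions, not a prescribed schema.
countries = [
    {"id": "country:DK", "type": "Country", "name": "Denmark", "iso_alpha2": "DK"},
    {
        "id": "country:US",
        "type": "Country",
        "name": "United States",
        "iso_alpha2": "US",
        "states": 50,
    },
    {
        "id": "country:CH",
        "type": "Country",
        "name": "Switzerland",
        "iso_alpha2": "CH",
        "official_languages": ["German", "French", "Italian", "Romansh"],
    },
]


def shared_fields(records):
    """Fields present on every record: the de facto common schema."""
    return set.intersection(*(set(record) for record in records))


def extra_fields(records):
    """Map each record id to the fields that only some records carry."""
    common = shared_fields(records)
    return {record["id"]: sorted(set(record) - common) for record in records}


print(sorted(shared_fields(countries)))
print(extra_fields(countries))
```

A fixed Countries table would need a `states` column and an `official_languages` column on every row, mostly empty. Here the common schema is whatever the records happen to share, and each record is free to grow beyond it.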
In many ways, the sooner we stop talking about master data, the better. What should we be saying instead? We should be speaking in Domains, and that is it. Domains are consumable and understandable by all. As soon as you talk about master data, the first question that usually crops up is "What is considered master data?" Why add that extra layer of complexity? Domains are a key part of MDM, but they are in ALL data projects. MDM does not have the monopoly on Domains.
What was considered MDM has completely confused what should be explained very differently. Whether you call it Data Mesh, Data Fabric, or modern MDM, there is definitely a need for SOMETHING to translate all of the data that sits across your business in an easy, scalable, and agile manner. Unfortunately, MDM has traditionally involved extremely tight and rigid demands on data, inherently taking the approach of "nobody change a thing!" Guess what, everyone changed everything - and your upfront, schema-driven, top-down approach didn't work!
The traditional Data Warehouse also promised this but, like traditional MDM, it leans more towards rigid domain tables to rule them all.
Managing reference data properly is important to any organization since reference data carries the context of data transactions through its semantic content (code value descriptions, location data, and other contextual information). Reference data can be used to drive business logic that helps execute a business process, designate an application to perform specific actions, or provide meaningful segmentation to analyze transaction data. Also, mapping reference data often requires human judgment, so the need for intervention by business data stewards in the reference data management process cannot be overlooked.
Reference data management was traditionally thought of as important for several reasons.
Reference data:
Organizations with a high demand for data entry, including healthcare, insurance, and government entities, experience significant data quality challenges due to improper coding of reference data values. These errors can be costly in several ways. Additionally, many organizations rely on hundreds of individually developed reference files or tables, and each one requires updating and periodic quality review. This is a big reason why we still see companies managing reference data in Excel! Since most organizations do not have sufficient staff to perform these reference data tasks, the activities may not happen. The reference data then becomes outdated, causing errors in application performance and data integration.
So where do we go from here? If we look five years into the future, the modern MDM movement will make it clear that reference data is a relic of the past. Reference data is just data, master data is just data. However, just talking about data is still too abstract. The sooner we steer the data discussion towards speaking about domains, the easier it will be to generate insights with our data. Having initiatives to move the needle concerning domains such as Customers, Products, Issues, and Vendors will move companies closer to insight and further away from unnecessary complexity.
Embracing solutions like CluedIn, with its emphasis on real-time data streaming, low-code integration, and automatic data normalization, can significantly streamline this transition. By focusing on domains and leveraging modern MDM practices, organizations can ensure more accurate, consistent, and compliant data management, ultimately leading to better business outcomes.