Modern MDM: Moving Beyond Traditional Reference Data

CluedIn

Get the trusted data you need to drive the business outcomes that matter.

Published: July 4, 2024

In the world of traditional data management, reference data and master data are treated as different categories of data. Reference data is used to classify or categorize other data, and master data is business-critical data shared by multiple systems, applications, and processes. Conventionally, examples of master data include customer data, product records, and vendor data. Reference data includes code lists, taxonomies, and hierarchies of data, amongst other things.

But times have changed, and the advent of modern Master Data Management (MDM) – an approach that does away with old-fashioned classifications and hierarchies – essentially means that there is little to no difference between reference data and master data anymore. In many ways, reference data is a relic of technology that forced us to denormalize models and treat data in an unnatural way. Everything is considered a lookup today, including what was traditionally thought of as reference data, such as colors, countries, and currencies. In reality, reference data is just master data, and master data is just…data – you get where this is going?

Modern platforms like CluedIn are leading the charge in this evolution, leveraging real-time data streaming, low-code integration, and automatic data normalization. Their approach helps to maintain data integrity and governance while simplifying the entire data management process. CluedIn’s Graph-based architecture allows non-standardized data to be incorporated and corrected automatically, enhancing data consistency and compliance. This means that what was once a complex and error-prone process can now be managed seamlessly and efficiently.

Just as country names crop up in many data sets, so do Domains in general. In the world of Graph (which is pivotal to modern MDM), Entities connect to other Entities, rather than to Properties as they do with reference data.
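To make the Entity-to-Entity idea concrete, here is a minimal sketch in Python. It is not CluedIn's API or data model; the `Entity` and `Graph` classes, the relation names, and the sample records are all hypothetical and exist only for illustration. It contrasts a country stored as a property on a row with a country stored as an entity that other entities link to.

```python
from dataclasses import dataclass, field

# Reference-data style: the country is just a coded property on the record.
customer_row = {"id": "cust-1", "name": "Acme", "country_code": "DK"}


@dataclass
class Entity:
    """A node in the graph. All names here are illustrative assumptions."""
    id: str
    domain: str
    properties: dict = field(default_factory=dict)


class Graph:
    """A tiny in-memory graph: entities plus directed, labelled edges."""

    def __init__(self):
        self.entities = {}
        self.edges = []  # tuples of (source_id, relation, target_id)

    def add(self, entity):
        self.entities[entity.id] = entity
        return entity

    def connect(self, source, relation, target):
        self.edges.append((source.id, relation, target.id))

    def neighbours(self, entity_id, relation):
        """Return the entities reachable from entity_id via relation."""
        return [
            self.entities[target]
            for source, rel, target in self.edges
            if source == entity_id and rel == relation
        ]


graph = Graph()
denmark = graph.add(Entity("country-dk", "Country", {"name": "Denmark"}))
krone = graph.add(Entity("currency-dkk", "Currency", {"name": "Danish krone"}))
acme = graph.add(Entity("cust-1", "Customer", {"name": "Acme"}))

# Graph style: the customer links to the country, which links to its currency.
graph.connect(acme, "LOCATED_IN", denmark)
graph.connect(denmark, "USES", krone)

# Walk Customer -> Country -> Currency without any lookup table.
country = graph.neighbours("cust-1", "LOCATED_IN")[0]
currency = graph.neighbours(country.id, "USES")[0]
print(currency.properties["name"])  # Danish krone
```

In this sketch the country is a first-class object with its own relationships, so nothing distinguishes it structurally from the customer that points to it.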

In fact, all arguments to maintain reference data can easily be quashed by the more modern approach. Reference data muddies the water and overcomplicates the MDM discussion. It could even be argued that master data does the same.

If master data is slow-moving, then reference data positively dawdles. Historically, reference data has been managed differently because it is very static and rarely changes. Why does that even matter? In classic database design, you don't call tables different things just because of the data they contain. You call them tables.

Metadata that refers to reference data sets may document:

The meaning and purpose of each reference data value domain
The reference tables and databases where the reference data appears
The source of the data in each table
The version of the reference data that is currently available
When the reference data was last updated
Maintenance description for the reference data
Business data stewardship information for the reference data
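As a rough illustration of what that traditional metadata looks like in practice, the items above could be captured in a record like the one below. This is a sketch only: the class name, field names, the staleness check, and every sample value (table names, dates, steward address) are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ReferenceDataSetMetadata:
    """One metadata record per reference data set (hypothetical layout)."""
    domain_meaning: str        # meaning and purpose of the value domain
    appears_in: list           # tables and databases where the data appears
    source: str                # where the data comes from
    version: str               # version currently available
    last_updated: date         # when the data was last updated
    maintenance_notes: str     # maintenance description
    steward: str               # business data steward

    def is_stale(self, today, max_age_days=365):
        """True if the set has not been updated within max_age_days."""
        return (today - self.last_updated).days > max_age_days


# Sample values are invented for illustration.
country_codes = ReferenceDataSetMetadata(
    domain_meaning="Two-letter codes identifying countries",
    appears_in=["crm.customers", "erp.vendors"],
    source="ISO 3166-1 alpha-2",
    version="2024-01",
    last_updated=date(2024, 1, 15),
    maintenance_notes="Reviewed annually by the data governance team",
    steward="jane.doe@example.com",
)

print(country_codes.is_stale(today=date(2025, 6, 1)))  # True
```

Every reference data set needs its own copy of this record, kept up to date by hand, which is the overhead the graph-based approach aims to remove.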

In the world of ontologies, this separate layer of metadata is no longer needed, and it always makes sense to remove unnecessary steps in a process. Wikipedia is the best example of this. Wikipedia is a web of objects that talk to each other. There is no differentiation between reference and master data: data is data, and objects are objects. A country is a country. A currency is its own thing that has relationships with other objects.

Master data is data that relates to the business entities that provide context for business transactions. Unlike reference data, master data values are not usually limited to predefined domain values. Business rules typically dictate the format and permitted ranges of master data values. Common organizational master data includes data concerning:

Categories such as individuals, organizations, roles, customers, citizens, patients, vendors, suppliers, business partners, competitors, employees, and students.
Products, internal and external, inventory, and related concepts.
Financial structures, including general ledger accounts, cost centers, profit centers, etc.
Location concepts, for the organizations and individuals and other entities that concern the enterprise.

In the context of a classic relational database, the idea of a denormalized Countries table that other tables reference sounds like a good idea. However, the future of MDM is widely acknowledged to be based on the Graph. In the Graph world, you do not denormalize to tables; you denormalize to records. With this flexibility, each record can evolve in its own way and provide its own schema. It is not tied to an expected schema that must match all other Countries, for example.
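A small sketch of what "each record provides its own schema" could mean, using plain Python dictionaries. The record identifiers and property names are hypothetical, and this is not how any particular MDM product stores data; it only shows two Country records sharing some properties while each carries others of its own.

```python
# Each Country is its own record; there is no shared table definition.
countries = {
    "country-dk": {"name": "Denmark", "iso_code": "DK", "currency": "DKK"},
    "country-us": {"name": "United States", "iso_code": "US", "state_count": 50},
}

# One record evolves on its own, without a schema migration for the others.
countries["country-dk"]["eu_member"] = True


def schema_of(record):
    """The schema of a record is simply the set of properties it carries."""
    return set(record)


dk_schema = schema_of(countries["country-dk"])
us_schema = schema_of(countries["country-us"])

shared = dk_schema & us_schema   # properties both records happen to have
only_dk = dk_schema - us_schema  # properties unique to the Denmark record
only_us = us_schema - dk_schema  # properties unique to the United States record

print(sorted(shared))   # ['iso_code', 'name']
print(sorted(only_dk))  # ['currency', 'eu_member']
print(sorted(only_us))  # ['state_count']
```

In a relational Countries table, adding `eu_member` or `state_count` would mean a new column for every row; here each record simply carries what applies to it.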

In many ways, the sooner we stop talking about master data, the better. What should we be saying instead? We should be speaking in Domains, and that is it. Domains are consumable and understandable by all. As soon as you talk about master data, the first question that usually crops up is "What is considered master data?" Why add that extra layer of complexity? Domains are a key part of MDM, but they appear in ALL data projects. MDM does not have a monopoly on Domains.

The traditional notion of MDM has confused something that should be explained very differently. Whether you call it Data Mesh, Data Fabric, or modern MDM, there is definitely a need for SOMETHING to translate all of the data that sits across your business in an easy, scalable, and agile manner. Unfortunately, MDM has traditionally placed extremely tight and rigid demands on data, inherently taking the approach of "nobody change a thing!" Guess what, everyone changed everything - and your upfront, schema-driven, top-down approach didn't work!

The traditional Data Warehouse also promised this but, like traditional MDM, it leans towards rigid domain tables to rule them all.

Managing reference data properly is important to any organization since reference data carries the context of data transactions through its semantic content (code value descriptions, location data, and other contextual information). Reference data can be used to drive business logic that helps execute a business process, designate an application to perform specific actions, or provide meaningful segmentation to analyze transaction data. Also, mapping reference data often requires human judgment, so the need for intervention by business data stewards in the reference data management process cannot be overlooked.
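The two roles described above, segmenting transactions by reference values and routing unmappable values to a human, can be sketched as follows. The mapping table, the `map_country` helper, the review queue, and the sample transactions are all hypothetical; a real pipeline would likely be considerably more involved.

```python
from collections import defaultdict

# Hypothetical mapping from free-text country values to standard codes.
COUNTRY_MAPPING = {
    "dk": "DK",
    "denmark": "DK",
    "danmark": "DK",
    "us": "US",
    "usa": "US",
}


def map_country(raw, mapping, review_queue):
    """Return the mapped code, or None after queueing raw for a steward."""
    code = mapping.get(raw.strip().lower())
    if code is None:
        # No automatic mapping exists, so a data steward must decide.
        review_queue.append(raw)
    return code


# Invented sample transactions with inconsistently entered countries.
transactions = [
    {"amount": 100, "country": "Denmark"},
    {"amount": 250, "country": "USA"},
    {"amount": 75, "country": " dk "},
    {"amount": 40, "country": "Untied States"},  # typo left for human review
]

review_queue = []
totals = defaultdict(int)

# Segment transaction amounts by the standardized country code.
for tx in transactions:
    code = map_country(tx["country"], COUNTRY_MAPPING, review_queue)
    if code is not None:
        totals[code] += tx["amount"]

print(dict(totals))   # {'DK': 175, 'US': 250}
print(review_queue)   # ['Untied States']
```

The mistyped value is deliberately not guessed at: it is excluded from the totals and handed to a steward, which is the human-judgment step the paragraph refers to.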

Reference data management was traditionally thought of as important for several reasons.

Reference data:

Describes the structures used in the organization (internal department codes, internal product codes, employee organization codes, internal location codes, etc.)
Describes the common data used in organizations that are external but connected to the organization (e.g., geographical, currency, country, diagnosis coding structures)
Provides assistance and support to analytics and business intelligence (e.g., classification codes).

Organizations with a high demand for data entry, including healthcare, insurance, and government entities, experience significant data quality challenges due to improper coding of reference data values. These errors can be costly in several ways. Additionally, many organizations rely on hundreds of individually developed reference files or tables, and each instance requires updating and periodic quality review. This is a big reason why we still see companies managing reference data in Excel! Since most organizations do not have sufficient staff to perform these reference data tasks, the activities may not happen. As a result, the reference data becomes outdated, causing errors in application performance and data integration.

So where do we go from here? If we look five years into the future, the modern MDM movement will make it clear that reference data is a relic of the past. Reference data is just data, and master data is just data. However, just talking about data is still too abstract. The sooner we steer the data discussion towards speaking about domains, the easier it will be to generate insights from our data. Initiatives that move the needle on domains such as Customers, Products, Issues, and Vendors will move companies closer to insight and further away from unnecessary complexity.

Embracing solutions like CluedIn, with its emphasis on real-time data streaming, low-code integration, and automatic data normalization, can significantly streamline this transition. By focusing on domains and leveraging modern MDM practices, organizations can ensure more accurate, consistent, and compliant data management, ultimately leading to better business outcomes.
