What’s the Common Data Model, and why you should care

I’ve been writing and speaking about the Common Data Model (CDM) for years now, since the enterprise application integration (EAI) days of the late 1990s. The concept predates EAI, but it has recently evolved a great deal, and for good reason.

A CDM, at its essence, provides a single source of truth for most and sometimes all of the data that exists in an enterprise. It’s made up of many different databases that may leverage different database models, such as relational and object databases, and many different structures or schemas. However, it appears to all who leverage the CDM as a single unified and abstract database that, when asked a common question, provides a common and consistent answer. 

You would think that after many years of having databases around enterprises, we would have a single source of truth for our data. However, that’s not been the case. As we evolved our systems and databases, there was no concept of one consistent data model that we carried forward from application to application. 

Thus, those charged with building these systems reinvented the data model and its instances for each new system, and even replicated data and schemas. This translated into redundant database structures that basically represented the same data. 

The end result was multiple places where you could find customer data, inventory data, sales transaction data, etc., and thus limitations and confusion around which data was correct. There was no single source of truth. 

While this was an issue for transactional systems, it became a real limitation when enterprises wanted to leverage data analytics. The lack of a single source of truth meant that data was not consistent. Pulling from many different physical data sources, including different database models and structures, meant that the data being analyzed was not consistent with a larger abstract data model that did provide a good representation of the business. 

Thus, the need for a CDM. This provides a consistent representation of the data for all purposes, including analytics, transactional systems, or ad hoc queries. 

CDMs have the following attributes:

The CDM is made up of many very different databases. There is no need to force-fit specific database technology, structures, or common approaches. Most enterprises have created these databases over the years, and the likelihood that there is consistency in both technology and schema is very low. The CDM re-represents this data as a single logical, physical and consistent set of metadata. 

There is a common view of metadata as well as data. The notion that we need to provide a single source of truth means that we also have a common set of semantics that exists above the physical data storage. We can look at the abstraction layer versus the underlying data and make a determination of what data to use from that layer. 

CDMs have to leverage technology to create data abstractions. This is really about tools that allow you to paste together these disparate data sources so that they work well together to provide a CDM. Keep in mind that, in essence, we’re tricking the underlying physical data to re-represent itself as something it was not made to be. However, this is necessary since we’re looking to fix issues without gutting existing data stores, which would break the thousands of applications that are bound to those data stores. 
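To make the abstraction idea concrete, here is a minimal sketch in Python, not a definitive implementation. Everything in it is an assumption made for illustration: the `Customer` record, the two source systems (`crm_source`, a relational table, and `billing_source`, a document-style store), and the `CommonDataModel` class are hypothetical names, and a real CDM would rely on data virtualization or integration tooling rather than hand-written adapters. The point is only that the physical stores stay untouched while consumers see one consistent shape.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Customer:
    """Common (logical) record exposed by the abstraction layer (hypothetical)."""
    customer_id: str
    name: str
    email: str
    source: str  # provenance: which physical store the record came from


def crm_source() -> List[Customer]:
    """Hypothetical relational source with its own table and column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crm_customer (cust_id INTEGER, full_name TEXT, email TEXT)"
    )
    conn.execute(
        "INSERT INTO crm_customer VALUES (1, 'Ada Lovelace', 'ADA@example.com')"
    )
    rows = conn.execute(
        "SELECT cust_id, full_name, email FROM crm_customer"
    ).fetchall()
    conn.close()
    # Map the source-specific columns onto the common record.
    return [Customer(f"crm:{i}", n, e.lower(), "crm") for i, n, e in rows]


def billing_source() -> List[Customer]:
    """Hypothetical document-style source with a nested structure."""
    docs = [
        {
            "acct": "B-7",
            "holder": {"first": "Grace", "last": "Hopper"},
            "contact": "grace@example.com",
        }
    ]
    return [
        Customer(
            f"billing:{d['acct']}",
            f"{d['holder']['first']} {d['holder']['last']}",
            d["contact"].lower(),
            "billing",
        )
        for d in docs
    ]


class CommonDataModel:
    """Presents several physical sources as one logical customer set."""

    def __init__(self, sources: Iterable[Callable[[], List[Customer]]]):
        self._sources = list(sources)

    def customers(self) -> List[Customer]:
        # Read from every source; the underlying stores are never modified.
        records = [c for source in self._sources for c in source()]
        return sorted(records, key=lambda c: c.customer_id)

    def find_by_email(self, email: str) -> List[Customer]:
        # Emails are normalized to lowercase by the adapters above.
        return [c for c in self.customers() if c.email == email.lower()]


cdm = CommonDataModel([crm_source, billing_source])
```

A consumer would call `cdm.customers()` or `cdm.find_by_email("Ada@Example.com")` without knowing which store holds the record; the `source` field keeps provenance visible so that the abstraction layer can still be compared against the underlying data.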

No matter what you think of the concept of a CDM, you’re more than likely to need one within your enterprise. The database design and development of the last 20+ years has been loosely coupled and inconsistent. It’s time to bring things together and find some commonality. 


              

Dan L.

Inventor of All Things Data Vault (DV1, DV2, Methodology, Model, Architecture, Implementation and Standards)

5 years ago

This is not meant to knock David, I hold him in high regard, and appreciate his thoughts on these matters. This is an opinionated comment, based on personal experience over the last 30 years. If you have a different experience with common model implementations, I'd love to hear it, always open to learning new things. I don't believe that there is such a thing as "single source of truth" - because truth is subjective to each individual viewing the data. Their truth changes every time they learn something (hence the demands to change business rules rapidly with a BI tool). Common Models are great Logical and conceptual guides, don't get me wrong, but they leave a lot to be desired at the physical implementation layers. Why? Because in order to populate them a company MUST undertake master data, data quality, data alignment in source systems, closing of the gaps, and more... complete and total "conformity" must be achieved to leverage common models, and the semantic differences, and semantic breaks that are present across source systems lend themselves to projects that lead to self-destructive behaviors every time a company seeks to implement something like this. This was tried for years in the 1990's and through the 2000's, with the "industry logical models", aka: "logical data models" for verticals sold by vendors. Resulted in the words "Data Warehouse" becoming a bad word in the companies that it was tried. One of the other reasons that this is infeasible physically, or at the implementation level is due to the serious amount of technical debt that the companies have amassed over the years. By debt I mean the disparities across business processes, data sharing, and broken source system applications. When a "common model" is often chosen by execs, and then a technical team is told to implement it, the business fails to understand that they just signed up to pay 7x the cost (time and money) to the "data warehouse team" to close the technical debt gaps.
This doesn't even begin to address the business debt. Nols has some stories and real numbers to back up these claims. We tried this in the 1990's at Lockheed Martin, at that time they had over 250 different source systems, and wanted us (a team of 3 people) to implement an enterprise data warehouse - and this was just for Astronautics (one division), and they gave us 6 months to do it. We did it, but only for 125 source systems, and we did it in 6 months. Sadly, we tried (before those 6 months) for over a year, to implement a "common logical data model" - because we subscribed to the same fallacy as was stated here: Single Source Of Truth. We failed, until we changed our approach, methodology, and implementation strategy. To succeed: we started with: Single source of Facts, we leveraged the common model as a guide (only a guide). We leveraged an extended ontology and taxonomy combination, we drove out standards, and a common methodology (now known as #datavault) - once we separated the "information / conformity / interpretation" from the act of "data warehousing" we succeeded in the 6 month time window. 2 months after we put the Data Vault in to production, the Business Analysts had created over 5000 of their own reports. We had huge successes, allowing them to change their "definitions of what was common for their own lines of business" at their whim. As a result, they could now measure just how far apart (the gap) the businesses were, and how far apart their perception was from the realities of what was happening in their source systems. They began closing gaps, and as a result saved hundreds of millions of dollars in a 12 month time period - and it wasn't because of the common model. Anyhow, these are just my 2 cents worth - again, common models have value, if applied logically as a guide (at the right place), or as a guide to producing Master Data.
Cindi Meyersohn Bruce McCartney Nols Ebersohn Vincent McBurney Michael Olschimke Eric Axelrod Eric Kavanagh Volker Nürnberg Mark Madsen Mark R. Schultze

Dave Duggal

Founder and CEO @EnterpriseWeb

5 years ago

It's the abstraction, and its implementation, that really matter - "common" in Common Data Model can't be imposed as in MDM. The abstractions have to provide domain semantics as-a-service, a value-added "pull" not a conformance push/imposition, and then allow consumers to shape the data for their purpose. GraphQL is too simplistic; it assumes someone engineers data structures that themselves are one-off. Semantic Web and Linked Data also have limited utility as a practical application layer abstraction.

Heimo Hänninen

Business Information Expert

6 years ago

Like a good wine, it just gets better as it ages :-) Today, I'd say Linked Open Data along with semantic technology provides the best suited tools to build CDM. a) Framework explained: LDIF translates heterogeneous Linked Data from the Web into a clean, local target representation while keeping track of data provenance; https://ldif.wbsg.de/ b) Example of commercial product: https://www.poolparty.biz/unifiedviews/

Ian Rowlands

Writer (Self-employed)

6 years ago

I agree, this is an excellent discussion topic. The question is to what extent the CDM is static, and to what extent it's dynamic -- and that might argue for more than one level of abstraction -- model, metamodel, metametamodel ... Another concept that's due for a resurgence. The changes from the nineties to now include the democratization of data and the explosion of variety and velocity (it turns out the volume, though eye-catching, is much less challenging). Those changes drive a need for an openness to more rapid evolution of the CDM.

Ajay Khandelwal

Managing Director (Product and Engineering)| Vision, Strategy and Execution

6 years ago

That is an excellent conversation topic in today's complex data world. A common data model at the conceptual/logical level... I totally agree. Hence the need for business metadata at the enterprise level. You can also have your foundational master data (customer, product, services, etc.) as a source of truth in MDM, with the MDM publishing cleaned, mastered versions down to the point of consumption. With the volume of data, and fit-for-purpose choices for performance, cost, and scale, it's hard to imagine you will end up having only a single copy of data.


More articles by David Linthicum