AVOIDING THE HORRIBLE TASK OF INTEGRATING DATA
Bill Inmon
Founder, Chairman, CEO, Best-Selling Author, University of Denver & Scalefree Advisory Board Member
For many years now, vendors and consultants have avoided the practice of integrating data. Integrating data involves complexity, making mistakes, searching through undocumented systems, making assumptions, understanding decisions that were made long ago, and more. Integrating data requires a four-letter word no one wants to hear: work. And lots of it.
So how have vendors and consultants avoided integrating data? They have a long list of excuses:
- We didn't invent the technology for integrating data, so it isn't worthwhile doing.
- We do ELT, not ETL (and we conveniently forget to do the T).
- There's just too much data. We can't handle that much data.
- We are already busy. We don't have time to do all of that work.
- We can't get our users to agree on anything.
- We can just federate data and use it in place. There is no need to unify or integrate the data.
- Integrating data is complex and hard.
- We threw the documentation away (or the documentation never existed).
And the list goes on. Ho hum….
But there is a new excuse that we need to add to this list of excuses for not integrating data. The new excuse is – let’s use data in place. Copying data is expensive and time consuming. Let’s just leave our data where it sits.
This sounds like a reasonable argument, but this line of thinking is anything but reasonable.
First off, when people go to unify and integrate data, the fundamental act they are performing is TRANSFORMING data, not copying data. It is true that some amount of copying is necessary in the act of transforming data. That copying is incidental, collateral damage. But it is also absolutely necessary and absolutely unavoidable.
The net result of transforming data is the ability to unify and integrate data. Once data is transformed the data can be viewed as a single entity. But without transforming data you cannot have a unified view of data across the organization.
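To make the idea of transformation concrete, here is a minimal, hypothetical sketch in Python. The systems, field names, and codes are invented for illustration: System A stores gender as "M"/"F" while System B stores it as 1/0, and the transform maps both into one unified representation. This is an assumption-laden toy, not anyone's production ETL.

```python
# Hypothetical sketch: two source systems encode the same attribute
# differently. Transformation maps both into one unified schema so the
# combined data can be viewed as a single entity.

def transform_system_a(record):
    """Map a System A record into the unified schema (fields assumed)."""
    # System A already uses the target "M"/"F" codes.
    return {"customer_id": record["id"], "gender": record["gender"]}

def transform_system_b(record):
    """Map a System B record, translating its 1/0 code into "M"/"F"."""
    return {
        "customer_id": record["cust_no"],
        "gender": "M" if record["sex"] == 1 else "F",
    }

def integrate(system_a_rows, system_b_rows):
    """Apply each system's transform, yielding one unified list of records."""
    unified = [transform_system_a(r) for r in system_a_rows]
    unified += [transform_system_b(r) for r in system_b_rows]
    return unified
```

Note that the work lives in the transform functions, not in the copying: moving the bytes is trivial, while deciding that `sex == 1` in System B means the same thing as `"M"` in System A is the hard, human part Inmon is describing.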
The choice is simple and basic – you either transform data and have a unified view of the data or you don’t transform the data and leave it in place and have a fractured view of your data. This choice is binary. It is one way or the other. It is as simple as being alive or being dead. You can’t have it both ways.
I am aware that the transformation of data – integrating data – is difficult. Messy. Time consuming. Complex. Full of guesswork. I am aware that the copying of some data has its pitfalls. But that is the price you pay for wanting to look at data in a unified, integrated manner. The choice is binary and simple – either you transform data and look at it in a unified, integrated manner, or you don't transform data and you look at your data in a fractured manner. You can have either A or B. But you can't have both.
The good news is that you don’t have to boil the ocean. Not all data needs to be transformed. And certainly, the data that needs transformation can be divided up into phases, based on criticality of need. In fact, most data does not need to be transformed. Only the data that needs to be shared and examined in a unified, integrated manner needs to be transformed. But trying to leave data in place that needs to be transformed for the purpose of having a unified, integrated view of data simply is not a long term winning proposition.
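The phasing idea above can be sketched in a few lines. This is a hypothetical illustration: the dataset names, the `shared` flag, and the `criticality` ranking are all assumptions introduced here to show how one might queue only shared data for transformation, most critical first, while leaving everything else in place.

```python
# Hypothetical sketch: not all data needs transforming. Only datasets that
# must be shared in a unified view are queued, ordered by criticality
# (lower number = more critical). The rest stay where they sit.

datasets = [
    {"name": "customers", "shared": True,  "criticality": 2},
    {"name": "orders",    "shared": True,  "criticality": 1},
    {"name": "app_logs",  "shared": False, "criticality": 5},
]

def transformation_backlog(datasets):
    """Return only the shared datasets, most critical first."""
    shared = [d for d in datasets if d["shared"]]
    return sorted(shared, key=lambda d: d["criticality"])

# "app_logs" is never shared, so it is left in place untransformed;
# "orders" is phased in before "customers".
```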
There unfortunately is no compromise in this proposition. You can’t have one foot in the water and another foot on land. NO COMPROMISE. You choose A or B. But you can’t have both.
NO SHORT CUTS.
Bill Inmon lives in Denver, Colorado. Bill's company – Forest Rim Technology – does textual disambiguation. Textual ETL reads text and turns it into a standard database.
Planner & Implementor | Lawyer | Teacher | Member, Harvard Business Review Advisory Council
2 years ago: This should be compulsory reading for all management sitting in budget approval meetings!
Data Engineer
2 years ago: Agree wholeheartedly. ETL (especially the T) is the plumbing that makes it possible for data scientists to work. As with any craft, it also requires a bit of artistry. Keeping things standardized in a single DWH is hard. Ideas like data mesh assume company-wide standards for data transformation and assume every small system will have dedicated data engineers to maintain that structure, and the result is having to look even harder for relevant data. It's just a terrible idea.
100% agree. In recent months I had discussions with a company that planned to build a data lake. They did not want to do the integration in their data lake. Their expectation was to "harmonize" the data, but when I tried to understand what this "harmonization" should be, they were not able to answer.
Consulting Engineer & Senior Advisor
2 years ago: Agree 100%. There is no simple way to do this, especially when the applications we use have different definitions for the same named data entities. Part of the transformation is aligning data definitions so that integrating data doesn't result in misuse of the data and poor decision making. I think another challenge is to provide an environment where users can integrate in a low-cost way and test the value of the data before integrating all of it, or before you industrialize and institutionalize the transformation of the data.
Director: Financial Services | Software Product Delivery | Lead Transformational Technology Change | Technical Consultant to the Business & Business Partners
2 years ago: Two questions, Bill Inmon. 1 - If one of my data sources is essentially a legacy version of my current data, with perhaps an 80% matching rate, would it not make sense to "migrate" it, since that is essentially a mapping exercise? 2 - If you don't integrate the data, then you still have the complex job of understanding the data and building a wrapper for it.