AVOIDING THE HORRIBLE TASK OF INTEGRATING DATA

By W H Inmon

For many years now, vendors and consultants have avoided the practice of integrating data. Integrating data involves complexity, making mistakes, searching through undocumented systems, making assumptions, understanding decisions that were made long ago, and so on. Integrating data requires a four-letter word no one wants to hear: work. And lots of it.

So how have vendors and consultants avoided integrating data? Vendors and consultants have a long list of excuses for not integrating data –

  • We didn’t invent the technology for integrating data so it isn’t worthwhile doing
  • We do ELT, not ETL (And we conveniently forget to do T)
  • There’s just too much data. We can’t handle that much data
  • We are already busy. We don’t have time to do all of that work
  • We can’t get our users to agree on anything
  • We can just federate data and use data in place. There is no need to unify or integrate the data
  • Integrating data is complex and hard
  • We threw the documentation away (or the documentation never existed)

And the list goes on. Ho hum….

But there is a new excuse that we need to add to this list of excuses for not integrating data. The new excuse is – let’s use data in place. Copying data is expensive and time consuming. Let’s just leave our data where it sits.

This sounds like a reasonable argument, but this line of thinking is anything but reasonable.

First off, when people go to unify and integrate data, the fundamental act they are performing is TRANSFORMING data, not copying data. It is true that some amount of copying is necessary in the act of transforming data, but that copying is incidental, collateral damage. When people go to integrate data, a certain amount of copying is absolutely necessary and absolutely unavoidable.

The net result of transforming data is the ability to unify and integrate data. Once data is transformed the data can be viewed as a single entity. But without transforming data you cannot have a unified view of data across the organization.
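As a rough illustration of the difference between copying and transforming, here is a minimal Python sketch. Everything in it is hypothetical: the two source systems (a CRM and a billing system), their field names, their encodings, and the Customer target structure are assumptions made up for this example, not something taken from a real system.

```python
from dataclasses import dataclass

# Hypothetical rows from two source systems. Both describe customers,
# but each system uses its own keys, codes, and units.
CRM_ROWS = [
    {"cust_id": "C-001", "gender": "M", "height_cm": 180.0},
    {"cust_id": "C-002", "gender": "F", "height_cm": 165.0},
]
BILLING_ROWS = [
    {"customer": 1, "sex": 1, "height_in": 70.0},
    {"customer": 3, "sex": 0, "height_in": 64.0},
]


@dataclass(frozen=True)
class Customer:
    """Hypothetical unified structure that every source is mapped into."""
    customer_id: int
    gender: str       # "male" or "female"
    height_cm: float  # always centimeters


def from_crm(row):
    # Transformation: strip the "C-" prefix and decode the letter codes.
    return Customer(
        customer_id=int(row["cust_id"].split("-")[1]),
        gender={"M": "male", "F": "female"}[row["gender"]],
        height_cm=row["height_cm"],
    )


def from_billing(row):
    # Transformation: decode the numeric codes and convert inches to cm.
    return Customer(
        customer_id=row["customer"],
        gender={1: "male", 0: "female"}[row["sex"]],
        height_cm=round(row["height_in"] * 2.54, 1),
    )


def copy_only():
    # Copying without transforming: the rows sit side by side, but they
    # still disagree on keys, codes, and units.
    return CRM_ROWS + BILLING_ROWS


def integrate():
    # Transforming: every row is mapped into the same structure.
    return [from_crm(r) for r in CRM_ROWS] + [from_billing(r) for r in BILLING_ROWS]


if __name__ == "__main__":
    for customer in integrate():
        print(customer)
```

In this sketch, copy_only() leaves two incompatible row shapes in one place, while integrate() yields records that can be compared directly. Only after the transformation does it become visible that CRM customer C-001 and billing customer 1 share the same identifier.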

The choice is simple and basic – you either transform data and have a unified view of the data or you don’t transform the data and leave it in place and have a fractured view of your data. This choice is binary. It is one way or the other. It is as simple as being alive or being dead. You can’t have it both ways.

I am aware that transforming data, which is to say integrating data, is difficult. Messy. Time consuming. Complex. Full of guess work. I am aware that the copying of some data has its pitfalls. But that is the price you pay for wanting to look at data in a unified, integrated manner. The choice is binary and simple: either you transform data and look at it in a unified, integrated manner, or you don’t transform data and you look at your data in a fractured manner. You can have either A or B. But you can’t have both.

The good news is that you don’t have to boil the ocean. Not all data needs to be transformed. And certainly, the data that needs transformation can be divided up into phases, based on criticality of need. In fact, most data does not need to be transformed. Only the data that needs to be shared and examined in a unified, integrated manner needs to be transformed. But trying to leave in place the data that does need to be transformed for a unified, integrated view is simply not a long-term winning proposition.

There unfortunately is no compromise in this proposition. You can’t have one foot in the water and another foot on land. NO COMPROMISE. You choose A or B. But you can’t have both.

NO SHORT CUTS.

Bill Inmon lives in Denver, Colorado. Bill’s company, Forest Rim Technology, does textual disambiguation. Textual ETL reads text and turns text into a standard database.

Nadeem Khan

Planner & Implementor | Lawyer | Teacher | Member, Harvard Business Review Advisory Council

2 years ago

This should be compulsory reading for all management sitting in budget approval meetings!

Agree wholeheartedly. ETL (especially the T) is the plumbing that makes it possible for data scientists to work. As with any craft, it also requires a bit of artistry. Keeping things standardized in a single DWH is hard. Ideas like data mesh assume that company-wide standards for data transformation can be created and that every small system will have dedicated data engineers to maintain this structure, and the result is having to look even harder for relevant data. It's just a terrible idea.

100% agree. In recent months I had discussions with a company that planned to build a data lake. They did not want to do the integration in their data lake. Their expectation was to "harmonize" the data. But when I tried to understand what this "harmonization" should be, they were not able to answer.

Scott Blanchard

Consulting Engineer & Senior Advisor

2 years ago

Agree 100%. There is no simple way to do this especially when the applications we use have different definitions for the same named data entities. Part of the transformation is aligning data definitions so integrating data doesn’t result in misuse of the data and poor decision making. I think another challenge is to provide an environment where users can integrate in a low cost way and test the value of the data before integrating all data or before you industrialize and institutionalize the transformation of the data.

George Bullock

Director: Financial Services | Software Product Delivery | Lead Transformational Technology Change | Technical Consultant to the Business & Business Partners

2 years ago

Two questions, Bill Inmon: (1) If one of my data sources is essentially a legacy version of my current data, with probably an 80% matching rate, would it not make sense to "migrate" it, since that is essentially a mapping exercise? (2) If you don't integrate the data, then you still have the complex job of understanding the data and building a wrapper for it.
