AVOIDING THE HORRIBLE TASK OF INTEGRATING DATA
Bill Inmon
Founder, Chairman, CEO, Best-Selling Author, University of Denver & Scalefree Advisory Board Member
For many years now, vendors and consultants have avoided the practice of integrating data. Integrating data involves complexity, making mistakes, searching through undocumented systems, making assumptions, understanding decisions that were made long ago, and more. Integrating data requires a four-letter word no one wants to hear: work. And lots of it.
So how have vendors and consultants avoided integrating data? They have a long list of excuses:
- We didn't invent the technology for integrating data, so it isn't worthwhile doing.
- We do ELT, not ETL (and we conveniently forget to do the T).
- There's just too much data. We can't handle that much data.
- We are already busy. We don't have time to do all of that work.
- We can't get our users to agree on anything.
- We can just federate data and use it in place. There is no need to unify or integrate the data.
- Integrating data is complex and hard.
- We threw the documentation away (or the documentation never existed).
And the list goes on. Ho hum….
But there is a new excuse that we need to add to this list of excuses for not integrating data. The new excuse is – let’s use data in place. Copying data is expensive and time consuming. Let’s just leave our data where it sits.
This sounds like a reasonable argument, but this line of thinking is anything but reasonable.
First off, when people go to unify and integrate data, the fundamental act they are performing is TRANSFORMING data, not copying data. It is true that some amount of copying is necessary in the act of transforming data. That copying is incidental, collateral damage. But it is also absolutely necessary and absolutely unavoidable.
The net result of transforming data is the ability to unify and integrate data. Once data is transformed the data can be viewed as a single entity. But without transforming data you cannot have a unified view of data across the organization.
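To make the idea of transformation concrete, here is a minimal, hypothetical sketch in Python. The systems, field names, and codes are invented for illustration: System A stores gender as "M"/"F" while System B stores it as 1/0, and the transform maps both into one unified representation. This is an assumption-laden toy, not anyone's production ETL.

```python
# Hypothetical sketch: two source systems encode the same attribute
# differently. Transformation maps both into one unified schema so the
# combined data can be viewed as a single entity.

def transform_system_a(record):
    """Map a System A record into the unified schema (fields assumed)."""
    # System A already uses the target "M"/"F" codes.
    return {"customer_id": record["id"], "gender": record["gender"]}

def transform_system_b(record):
    """Map a System B record, translating its 1/0 code into "M"/"F"."""
    return {
        "customer_id": record["cust_no"],
        "gender": "M" if record["sex"] == 1 else "F",
    }

def integrate(system_a_rows, system_b_rows):
    """Apply each system's transform, yielding one unified list of records."""
    unified = [transform_system_a(r) for r in system_a_rows]
    unified += [transform_system_b(r) for r in system_b_rows]
    return unified
```

Note that the work lives in the transform functions, not in the copying: moving the bytes is trivial, while deciding that `sex == 1` in System B means the same thing as `"M"` in System A is the hard, human part Inmon is describing.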
The choice is simple and basic – you either transform data and have a unified view of the data or you don’t transform the data and leave it in place and have a fractured view of your data. This choice is binary. It is one way or the other. It is as simple as being alive or being dead. You can’t have it both ways.
I am aware that the transformation of data – integrating data – is difficult. Messy. Time consuming. Complex. Full of guesswork. I am aware that the copying of some data has its pitfalls. But that is the price you pay for wanting to look at data in a unified, integrated manner. The choice is binary and simple – either you transform data and look at it in a unified, integrated manner, or you don't transform data and you look at your data in a fractured manner. You can have either A or B. But you can't have both.
The good news is that you don’t have to boil the ocean. Not all data needs to be transformed. And certainly, the data that needs transformation can be divided up into phases, based on criticality of need. In fact, most data does not need to be transformed. Only the data that needs to be shared and examined in a unified, integrated manner needs to be transformed. But trying to leave data in place that needs to be transformed for the purpose of having a unified, integrated view of data simply is not a long term winning proposition.
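The phasing idea above can be sketched in a few lines. This is a hypothetical illustration: the dataset names, the `shared` flag, and the `criticality` ranking are all assumptions introduced here to show how one might queue only shared data for transformation, most critical first, while leaving everything else in place.

```python
# Hypothetical sketch: not all data needs transforming. Only datasets that
# must be shared in a unified view are queued, ordered by criticality
# (lower number = more critical). The rest stay where they sit.

datasets = [
    {"name": "customers", "shared": True,  "criticality": 2},
    {"name": "orders",    "shared": True,  "criticality": 1},
    {"name": "app_logs",  "shared": False, "criticality": 5},
]

def transformation_backlog(datasets):
    """Return only the shared datasets, most critical first."""
    shared = [d for d in datasets if d["shared"]]
    return sorted(shared, key=lambda d: d["criticality"])

# "app_logs" is never shared, so it is left in place untransformed;
# "orders" is phased in before "customers".
```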
There unfortunately is no compromise in this proposition. You can’t have one foot in the water and another foot on land. NO COMPROMISE. You choose A or B. But you can’t have both.
NO SHORT CUTS.
Bill Inmon lives in Denver, Colorado. Bill's company – Forest Rim Technology – does textual disambiguation. Textual ETL reads text and turns it into a standard database.
Planner & Implementor | Lawyer | Teacher | Member, Harvard Business Review Advisory Council
2 years ago: This should be compulsory reading for all management sitting in budget approval meetings!
Data Engineer
2 years ago: Agree wholeheartedly. ETL (especially the T) is the plumbing that makes it possible for data scientists to work. As with any craft, it also requires a bit of artistry. Keeping things standardized in a single DWH is hard. Ideas like data mesh assume company-wide standards for data transformation and assume every small system will have dedicated data engineers to maintain that structure, and the result is having to look even harder for relevant data. It's just a terrible idea.
100% agree. In recent months I had discussions with a company that planned to build a data lake. They did not want to do the integration in their data lake. Their expectation was to "harmonize" the data, but when I tried to understand what this "harmonization" should be, they were not able to answer.
Consulting Engineer & Senior Advisor
2 years ago: Agree 100%. There is no simple way to do this, especially when the applications we use have different definitions for the same named data entities. Part of the transformation is aligning data definitions so that integrating data doesn't result in misuse of the data and poor decision making. I think another challenge is to provide an environment where users can integrate in a low-cost way and test the value of the data before integrating all of it, or before you industrialize and institutionalize the transformation of the data.
Director: Financial Services | Software Product Delivery | Lead Transformational Technology Change | Technical Consultant to the Business & Business Partners
2 years ago: Two questions, Bill Inmon. 1 - If one of my data sources is essentially a legacy version of my current data, with perhaps an 80% matching rate, would it not make sense to "migrate" it, since that is essentially a mapping exercise? 2 - If you don't integrate the data, then you still have the complex job of understanding the data and building a wrapper for it.