WHAT A DATA WAREHOUSE IS NOT


By W H Inmon

For a long time now, vendors and consultants have been trying to hitch a ride on the data warehouse train. Vendors say to themselves: "My product looks a little like a data warehouse. Do you think that anyone would notice if I told them it was a data warehouse when it really isn't? We can play upon the success of data warehousing to sell our product." Brilliant marketing idea.

This charade has been going on for a long time now. It is time to put some integrity back into the vendors and the consultants. (Kind of like pushing a rope. But you have to try!)

So what is a data warehouse? From the very beginning a data warehouse was –

- Subject oriented

- Integrated

- Non volatile

- Time variant

Information in support of management’s decisions.
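Two of these properties, non volatile and time variant, can be illustrated with a minimal sketch. It assumes an append-only store of dated snapshots; the names `CustomerSnapshot`, `CustomerHistory`, `load`, and `as_of` are hypothetical and chosen only for illustration, not taken from any product or from the definition above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen: a stored row cannot be modified in place
class CustomerSnapshot:
    customer_id: int
    city: str
    effective_date: date  # the moment in time this row describes


class CustomerHistory:
    """Hypothetical append-only store: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[CustomerSnapshot] = []

    def load(self, row: CustomerSnapshot) -> None:
        # Non volatile: a change in the source arrives as a NEW snapshot row.
        self._rows.append(row)

    def as_of(self, customer_id: int, when: date) -> Optional[CustomerSnapshot]:
        # Time variant: return the latest snapshot effective on or before `when`.
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.effective_date <= when
        ]
        return max(candidates, key=lambda r: r.effective_date, default=None)


history = CustomerHistory()
history.load(CustomerSnapshot(1, "Denver", date(2015, 1, 1)))
history.load(CustomerSnapshot(1, "Boulder", date(2020, 6, 1)))
```

Asking `history.as_of(1, date(2018, 1, 1))` returns the Denver row even though a later row exists, because the earlier value was never overwritten.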

So what have people tried to foist off as a data warehouse?

1) Data marts and dimensional modelling. A data mart and dimensional modelling do not require that you integrate data. You just build your star joins directly from the applications and don't bother with the messy integration stuff. (I recommend dimensional modelling to people all the time. I ask them the question: what are you looking for, believable corporate data or a fast report? If people are just looking for fast reports, then the dimensional model is for them. And a lot of people are doing exactly that, looking for fast reports. But building reports fast is not the same thing as building a foundation of believable data.)

2) Big Data. Yes, a data warehouse requires a lot of storage. No doubt about that. Data warehouses typically store a decade's worth of data (or more). So data warehouses consume more storage than most people have ever imagined. But solving the storage problem is only one aspect of data warehousing. Simply storing data in large masses is not building a data warehouse.

3) Online update in a data warehouse. One of the features of data warehousing is that data is non volatile. This means that you don’t change data values once the data has been placed in the data warehouse. So online transaction update processing is not something that you do in a data warehouse. (Side note – look up ODS – operational data store if you are interested in this subject.) You certainly do processing in a data warehouse, just not online transaction update processing. OLTP is meant for clerical and customer support, not management support, in any case.

4) Data lakes. Simply plopping data into a data lake does not make a data warehouse. A data warehouse requires the integration of data, and placing data into a data lake is not the same thing at all as integrating data into a data warehouse. At best a data lake is a staging area. At best. A data lake quickly turns into a data swamp. Quickly.

5) Placing data on the cloud. What happens when you place a big mess on the cloud? You have a big mess on the cloud. Placing data on the cloud does nothing magical in terms of solving your problems of accessing and analyzing corporate data. Vendors that tell people that something magical happens when they put data on the cloud are not telling the truth.

6) ELT rather than ETL. Vendors love ELT. Vendors are good at E and they are good at L, but when it comes to T, vendors say that that is in the hands of the user. And guess what? The end user magically forgets to do T. So, ELT turns into just EL. And without T you have no data warehouse.

7) Let's all do data science. The data scientist is above all of this messy data warehouse stuff. Let's just hire a bunch of expensive consultants, turn them loose, and then we don't need a data warehouse. What a brilliant idea. And what does the data scientist do when they come on the job? The data scientist spends 98% of their time wrangling with data.
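The "T" that item 6 says gets forgotten can be sketched in a few lines. This is only an illustration under stated assumptions: the two source extracts, their field names, the code mappings in `GENDER_CODES`, and the functions `transform_billing` and `transform_support` are all hypothetical. The point is that each application encodes the same facts differently, and the transform step is where they are reconciled into one convention.

```python
# Hypothetical extracts: two applications describe customers with
# different keys, codes, and units of measure.
billing_rows = [{"cust": "A-17", "gender": "m", "height_in": 70.0}]
support_rows = [{"customer_no": "42", "sex": 0, "height_cm": 165.0}]

# Assumed mappings from each application's encoding to one warehouse code.
GENDER_CODES = {"m": "M", "f": "F", 1: "M", 0: "F"}


def transform_billing(row: dict) -> dict:
    """Convert a billing row to the (assumed) warehouse layout."""
    return {
        "customer_id": int(row["cust"].removeprefix("A-")),
        "gender": GENDER_CODES[row["gender"]],
        "height_cm": round(row["height_in"] * 2.54, 1),  # inches to centimetres
    }


def transform_support(row: dict) -> dict:
    """Convert a support row to the same warehouse layout."""
    return {
        "customer_id": int(row["customer_no"]),
        "gender": GENDER_CODES[row["sex"]],
        "height_cm": row["height_cm"],
    }


# Load step: only integrated rows reach the warehouse.
warehouse_rows = (
    [transform_billing(r) for r in billing_rows]
    + [transform_support(r) for r in support_rows]
)
```

Skipping the two transform functions and loading the raw rows side by side would be the "EL" case: the data is stored, but nothing about it has been integrated.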

All of these red herrings by vendors have sounded something like data warehousing, but they are not. And what happens when these approaches ultimately fail? The data warehouse gets the blame, when in fact these approaches were not data warehousing at all.

So, when will data warehousing finally die? Data warehousing will die when people don’t need believable, credible data on which to make decisions. When that day comes people will no longer need a data warehouse.

But don’t hold your breath...

Bill Inmon lives in Denver with his wife and his dog, Jeb. Jeb gets his morning cookie without fail. If Bill forgets the morning cookie, Jeb faithfully reminds him. Jeb is not bashful.

Jerry W.

-Database: Architect & Modeler -Data Warehouse: Architect & Modeler -Data Integration Architect & Modeller SKILLS: Business: System: Analysis, Data & Process Modeling Enterprise Level: Analysis, Data & Process Modeling

1 year ago

Love this: "What happens when you place a big mess on the cloud? You have a big mess on the cloud." In Real Estate the slogan is "Location, Location, Location." In Data Warehousing it should be "Integration, Integration, Integration." Sadly, this is not the case.

Danie Minnaar

Senior Data Architect at InfoBuild (Pty) Ltd. | Expert in Data Warehousing

2 years ago

Amen!

Avinash Murali

Business Intelligence | Data Architecture | Data Engineering | Enterprise Reporting | Mentoring

2 years ago

Excellent post, Bill Inmon! Nice mention of Data Lakes quickly turning into Data Swamps. There are literally a lot more swamps around than lakes or warehouses, mainly because of the over-simplification of dimensional modeling and the notion of simply dumping data into one place and connecting reports to this "swamp"!

Larry Burns

Data and BI Architect at Fortune 500 Manufacturer

2 years ago

What it comes down to is that everybody wants a shortcut, so they don't have to do the messy, expensive, time-consuming work. And the only real "shortcut" is to design and implement data correctly up-front, which hardly anyone ever does. As the old saying goes, it's a lot easier to stay out (of trouble) than to get out.
