BURYING THE DATA WAREHOUSE - RIP
Bill Inmon
Founder, Chairman, CEO, Best-Selling Author, University of Denver & Scalefree Advisory Board Member
Data warehouse is the whack-a-mole of technology. Like the carnival game where the mole sticks its head up out of a random hole and you take a whack at it, data warehouse just keeps popping back up.
This whack-a-mole act is especially impressive because no vendor or organization stands behind data warehouse. Data warehouse is supported solely by the end user. There is no committee, no company, no organization that sits around and makes decisions about data warehouse. Data warehouse has a life of its own.
So who has been trying to kill data warehouse? Who has been taking swings at the ever-reappearing mole that keeps randomly popping out of the hole?
There have been several major attempts at exterminating or bypassing the data warehouse:
• Dimensional modelling and star joins. Ralph Kimball introduced the idea of a data mart. Ralph stated that you can just build a data mart directly from an application. With the Kimball approach there was no need for one of those messy and hard-to-build data warehouses.
• ETL changed to ELT. The vendors of the world gave us ELT. ELT was a descendant of ETL. The trick with ELT was that you did the E and you did the L, and conveniently forgot to do the T. In doing so you just copied data from one place to the next. There was no need for a data warehouse with ELT.
• Big Data. Big Data came along and proclaimed that you didn’t need a data warehouse. Large mainframe vendors, Cloudera, et al. said that you could just conveniently store your data in Big Data and that was it.
• Data lakes. Data lakes came along and proclaimed that all you needed was a data lake. There was no need to go through all that creepy and complex stuff you have to do with a data warehouse. Just dump all of your data in a data lake and that was all you needed.
• Data mesh/data mash. Data mesh came along and said that all you needed was some fancy connections between data, and that once you had them there was no need for a data warehouse.
• Data scientists. Data scientists disdained data warehouse. They learned all of these statistical algorithms in school, then when they got into the real world they spent 95% of their time wrestling with data. But the data scientists still thought that data warehouses were beneath them.
• Squeezing data into a data warehouse. The complaint was that a data warehouse was just a bunch of data squeezed together, and that you get your hands dirty when you squeeze that hard.
Some of these efforts were very well funded and very well advertised. Others were merely casual. But all of them failed to kill data warehouse.
In fact, some of these efforts ended up adding to data warehouse architecture.
For example, people found that adding data marts to a data warehouse was a very good thing to do. Data marts allowed you to customize data and at the same time keep the integrity of the data. So, Ralph Kimball’s contribution of data marts and the dimensional model added to the data warehouse and proved valuable.
And Big Data added a dimension of scalability for a data warehouse that had not existed before. The data in the data warehouse with a low probability of access fit very conveniently in Big Data. The people who promulgated Big Data never saw it that way, but that was an unintended positive consequence of Big Data.
The people who pushed for a data lake inadvertently pushed new kinds of data into the data warehouse. With the data lake, analog and IoT data, as well as textual data, found their way into the data warehouse.
So, the very people that tried to kill data warehouse ended up expanding data warehouse.
So, what’s the problem with data warehouse? Why do people want to kill data warehouse? There are a lot of issues. But the primary issue is that people don’t want to do the dreaded task of integrating data. Data warehouse requires that data be integrated. And integration is complex, risky, hard to do, imprecise, and requires research. Integrating data requires using your brain and using elbow grease. And vendors and most IT professionals just hate doing that.
Corporations have huge silos of information that cannot talk to each other. These silos are an impediment to doing analytical processing across the enterprise. The ONLY way to break these silos apart is to integrate the data found in them and place the integrated data into a data warehouse. There simply is no other way. No ifs, ands, or buts. But vendors and most IT professionals just don’t have the backbone and/or intellect to integrate the siloed data. So the silos remain, and corporate/enterprise data remains an elusive, unreachable goal.
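To make the integration step concrete, here is a minimal Python sketch. It is an illustration only, not taken from any real system. It assumes two hypothetical silos, a billing system and a support system, that describe the same customers with different key formats, gender codes, and units of measure. Every field name and value is invented. Simply copying the rows side by side (the E and the L without the T) leaves them unable to talk to each other. Normalizing them to one set of conventions is what lets them land in a single integrated record.

```python
# Hypothetical extracts from two silos. All names and values are invented.
BILLING_SILO = [
    {"cust_id": "C-1", "gender": "M", "height_in": 70.0},
    {"cust_id": "C-2", "gender": "F", "height_in": 64.0},
]
SUPPORT_SILO = [
    {"customer": "1", "sex": 1, "height_cm": 177.8},
    {"customer": "3", "sex": 0, "height_cm": 160.0},
]


def normalize_billing(row):
    """Map a billing row onto the (assumed) warehouse conventions."""
    return {
        "customer_id": int(row["cust_id"].split("-")[1]),  # "C-1" -> 1
        "gender": {"M": "male", "F": "female"}[row["gender"]],
        "height_cm": round(row["height_in"] * 2.54, 1),  # inches -> cm
    }


def normalize_support(row):
    """Map a support row onto the same conventions."""
    return {
        "customer_id": int(row["customer"]),  # "1" -> 1
        "gender": {1: "male", 0: "female"}[row["sex"]],
        "height_cm": round(row["height_cm"], 1),
    }


def integrate(billing_rows, support_rows):
    """Build one record per customer; the first silo seen supplies the values."""
    warehouse = {}
    feeds = (
        ("billing", billing_rows, normalize_billing),
        ("support", support_rows, normalize_support),
    )
    for source, rows, normalize in feeds:
        for row in rows:
            record = normalize(row)
            entry = warehouse.setdefault(
                record["customer_id"], {**record, "sources": []}
            )
            entry["sources"].append(source)
    return warehouse


# Copying without transforming: four rows that share no key or encoding.
RAW_COPY = BILLING_SILO + SUPPORT_SILO

# Integrating: three customers, one of them known to both silos.
WAREHOUSE = integrate(BILLING_SILO, SUPPORT_SILO)

if __name__ == "__main__":
    for customer_id, record in sorted(WAREHOUSE.items()):
        print(customer_id, record)
```

Even this toy version has to settle questions that take research rather than code: which key is authoritative, which encoding wins, and what to do when two silos disagree. Here the first silo seen simply wins, which is a placeholder decision and not a recommendation.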
Vendors would rather walk barefoot across a bed of red-hot coals than go back and integrate data. The problem is that the major value of a data warehouse is in having a foundation of integrated data.
So here lies data warehouse – RIP – Resilient Information Processing, not Rest In Peace.
Data warehouse lives on despite the best efforts to kill or ignore it.
Bill Inmon lives in Denver with his wife and his dog Jeb. Jeb is a Scottie, and Jeb’s girlfriend, Penelope, is a Lhasa Apso. Jeb gets all excited when he sees Penelope. She is his size. Jeb barks a lot when he sees Penelope.