DATA LAKE ARCHITECTURE


By W H Inmon

Big Data started out as a replacement for the data warehouse. The Big Data vendors are loath to mention this fact today. But if you were around in the early days of Big Data, one of the central topics of discussion was: if you have Big Data, do you need a data warehouse? From a marketing standpoint, Big Data was sold as a replacement for a data warehouse. With Big Data you were free from all that messy stuff that data warehouse architects were doing.

Much to the surprise of the Big Data vendors the support for data warehousing was far, far stronger than they had ever imagined. There were (and still are) valid reasons why data warehouses existed. If you wanted integrated, believable data you needed a data warehouse. Big Data had nothing to say about this aspect of data. The vendor just said – “Buy my product and your problems go away.”

So the Big Data vendors got feedback and pushback. Someone decided that Big Data needed an architectural construct. So the Big Data vendors came up with the Data Lake.

Now, the data lake was never a real architecture. It was just a buzzword that was used to counter the technicians who had already built a data warehouse.

The data lake was just a big collection of data that was thrown onto the Big Data infrastructure. The theory was that you put the data out there into the data lake, and people and data scientists were supposed to find the data and use it to solve previously unknown problems. But it didn’t take long for people to discover that the data lake was really just a glorified data garbage dump. The data sat there and no one used it. No one could use it.

The problem with garbage dumps is that they start to smell over time. Furthermore, the Big Data garbage dump was expensive. So the people who put this non-architecture out there were called upon to make their garbage dump useful. The first thing they discovered was that they needed metadata to describe what was in the data lake. Without metadata, they couldn’t find anything.
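To make the role of metadata concrete, here is a minimal sketch of a catalog that describes what sits in the lake so that it can be found. Everything in it (the `DatasetEntry` and `Catalog` names, the file paths, the column names) is a hypothetical illustration and does not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical metadata record describing one data set in the lake."""
    path: str
    source_system: str
    description: str
    columns: list[str] = field(default_factory=list)


class Catalog:
    """A toy in-memory metadata catalog (illustrative only)."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Index each data set by its (assumed unique) path.
        self._entries[entry.path] = entry

    def search(self, keyword: str) -> list[DatasetEntry]:
        """Return entries whose description or column names mention keyword."""
        kw = keyword.lower()
        return [
            e for e in self._entries.values()
            if kw in e.description.lower()
            or any(kw in c.lower() for c in e.columns)
        ]


catalog = Catalog()
catalog.register(DatasetEntry(
    path="lake/raw/crm/customers.csv",          # made-up path
    source_system="crm",
    description="Customer master records exported nightly",
    columns=["cust_id", "full_name", "email"],
))
catalog.register(DatasetEntry(
    path="lake/raw/billing/invoices.json",      # made-up path
    source_system="billing",
    description="Invoices issued to customers",
    columns=["invoice_no", "customer", "amount_usd"],
))

# Without the catalog, nobody would know which file holds email addresses.
hits = [e.path for e in catalog.search("email")]
```

A catalog like this only answers the question of where the data is. It says nothing about whether the data can be trusted.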

The next discovery they made was that finding data is not enough. They needed data that they could rely upon. They needed to create analyses from the data lake, and connecting data from disparate sources is not easy to do. They discovered that metadata wasn’t the whole solution; it only led them to the next step up the ladder. Having discovered that metadata is merely the first step up a long ladder, they then discovered that in order to make sense of data from one analysis to the next they needed to refine the data against a common data model. This is the next step up the ladder.
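As a rough illustration of what refining data against a common data model can mean, the sketch below maps records from two differently shaped sources onto one shared `Customer` structure. The source layouts, field names and cleaning rules are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical common-model record shared by every analysis."""
    customer_id: str
    name: str
    email: str


def from_crm(row: dict) -> Customer:
    # Assumed CRM layout: cust_id, full_name, email
    return Customer(
        customer_id=str(row["cust_id"]),
        name=row["full_name"].strip().title(),
        email=row["email"].strip().lower(),
    )


def from_billing(row: dict) -> Customer:
    # Assumed billing layout: id, first, last, contact
    return Customer(
        customer_id=str(row["id"]),
        name=f'{row["first"]} {row["last"]}'.strip().title(),
        email=row["contact"].strip().lower(),
    )


# The same person, as two source systems happen to record her.
crm_row = {"cust_id": 42, "full_name": "ada LOVELACE ", "email": "Ada@Example.com"}
billing_row = {"id": "42", "first": "Ada", "last": "Lovelace", "contact": " ada@example.com"}

a = from_crm(crm_row)
b = from_billing(billing_row)

# Once both rows conform to the common model they can be compared and joined.
same = a == b
```

The raw rows share no field names and disagree on types, casing and whitespace. Only after both are conformed to the common model does it become possible to tell that they describe the same customer.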

I don’t know if this path the data lake is walking sounds familiar. But it is the same path that people doing analysis walked a long time ago. They need a data warehouse architecture: exactly the opposite of what they promised their buyers years ago. (It is hard for vendors to admit they made a mistake.)

When you bring in the data lake, if you don’t want the data lake turning into a garbage dump, you have got to impose the discipline of the data warehouse architecture over the data lake. Stated differently, the data lake doesn’t solve problems, it merely introduces them.

Yes Big Data and Data Lake enthusiasts – there is such a thing as architecture. There is a need to have integrity of data. Integrity of data just does not magically happen. It requires a lot of work and forethought. And we in the world of data warehousing get to say to you – “I told you so.” Yes we do remember who was so condescending to us in years past. We remember who called us an “old” idea and architecture. We remember the derision that was tossed at us. We remember who sold the IT community on the fact that we weren’t needed. We remember when we were told that we were yesterday’s news and to get lost.

You could have saved a lot of time and energy (and money) by trying to build on the past rather than trying to sweep us away. We remember.

The last laugh is really the best laugh. But Big Data and Data Lakes did not need to be the problem child that they are. The arrogance of the vendors is to blame for the mess that has been made.

Bill Inmon is an author located in Colorado. Bill’s latest books include DATA ARCHITECTURE: SECOND EDITION, Elsevier Press, TURNING TEXT INTO GOLD, Technics Publications, HEARING THE VOICE OF YOUR CUSTOMER, Technics Publications, and THE UNIFIED STAR SCHEMA, Technics Publications. Bill was named by ComputerWorld as one of the ten most influential people in the history of computing. 

Nick Bright

Head of Cloud Data Engineering at Cabot Financial with expertise in BI Consulting

3 years ago

I think this article disregards the original problem that data lakes and software architectures such as Spark tried to solve at the very beginning, and I say this having been a champion of the Inmon and Kimball methodologies for years. 'Big data' technologies did *not* start out to replace the Data Warehouse. They were created to deal with the sheer volume of data being produced in the modern age. By necessity they had to be designed from the ground up to manipulate, analyse, store and compute terabytes of data in the cloud, on elastic, parallel resources. Although Data Warehousing wasn't the primary problem that needed to be solved, it turns out that you can leverage all the benefits of clustering, parallelisation, cheap storage, open formats, elastic compute and all the other fantastic things to come from Cloud Computing. Nor has data modelling disappeared. It is a necessity now more than ever. A Data Warehouse on top of a Data Lake is not a Data Warehouse without the ever-valuable Inmon/Kimball/other dimensional modelling in place. I feel that if a Lake smells like a swamp, it's because it's been treated as one.

Jean-François Laberge

Co-Founder @ ventriloc | Data Analytics, Intelligent Analytics

3 years ago

Hi Bill, great article. Why does another article of yours (written in collaboration with a vendor) say the complete opposite of what you’re writing here? https://databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html Has anything changed in your perspective on the Lakehouse? What are those variables? Thanks for your feedback!

Pablo J. Rodríguez C.

Database administrator for MSSQL/Oracle/Postgres/MySQL/MongoDB in hybrid environments

3 years ago

Excellent analysis!

Michael Maxfield

Advising Executives on maximizing data, turning ideas into insights. Strategize the art of the possible with data. Leading major data initiatives to ultimate success.

4 years ago

Foundational truth is irrefutable. Data yields to no one, especially vendors. It would be a nice change for some to drop the idea of replace and take up the idea of build upon.
