Data Lake or Data White Water Rafting? (Part 2)
(Chapter 3 of Data Lakes For Dummies is titled Break Out the Life Vests: Tackling Data Lake Challenges. Here's a second post that includes some of the highlights and key takeaways from that chapter, in bite-sized excerpts.)
The data lake is dead!
Well, that's a strange uttering from someone who just wrote a Dummies book about data lakes! Here's my excuse: it's not me saying it, I'm just quoting...well, the tale gets complicated from here.
Teradata has been around for a long while. In June of 2019, they published a blog post entitled "The Data Lake is Dead; Long Live the Data Lake!" The punchline of the post was, basically, that Hadoop had proven complex to use and manage, and that data lakes built on top of Hadoop were "swimming against the tide." So essentially, if you read between the lines, they weren't necessarily knocking the concept of a data lake, but rather the Hadoop underpinning.
However, by 2019, not a whole lot of modern data lakes were being built on top of Hadoop. Far more were being built on either the AWS stack (with heavy usage of S3) or the Azure stack, with ADLS Gen2 at the epicenter. Amazon databases such as Redshift were often in the mix, as were Azure data management capabilities such as Cosmos DB.
So maybe you're tempted to tag that particular blog as a touch of marketing-speak; i.e., the "legacy" version of a data lake had hit a brick wall so it was time to repurpose the concept.
But then you get to a similar-sounding blog and accompanying video from earlier this year (March 2021) entitled "The data lake is dead; long live the data mesh."
Data mesh?
If you've stayed on top of the world of analytical data management for a while now, you've not only seen the world of data warehousing evolve into the world of big data - and then, by extension, into data lakes - but now you have an entirely new generation of data this-or-that solutions: data mesh; data fabric; even the data lakehouse.
So what the heck is going on here?
Basically, we're seeing the 1990s-era ROLAP-vs.-MOLAP wars, as well as the epic battles between Inmon and Kimball aficionados in the world of data warehousing, playing out once again. I spent a little bit of time and writing space in Data Lakes For Dummies covering the aforementioned data lake "cousins" but they all come back to one critical point that transcends turf wars, conflicting definitions, and all of the rest of the noise:
We are trying to achieve high-value, architecturally evolvable, and cost-effective enterprise-scale analytical data management.
Did first-generation data warehousing fall short in some (or many) ways from its original promise? Sure. Did first-generation big data also fall short in some ways from its original promise? Once again: yep. Do modern data lakes built on top of either the AWS or Azure stacks have challenges? Well, yeah.
Can we continue to evolve the solution space architecturally and also through next-generation underlying technology? Definitely!
So if you want to change the name of what your company is building from a "data lake" to a "data lakehouse" or "data mesh" or something else, go right ahead. Data warehouses circa 2021 are dramatically different from several generations of their ancestors, but data warehouses are still around. Think about a massive data warehouse built using SAP HANA. Would you not agree that 1995ish or even 2010ish data warehousing limitations might well be in our rearview mirror?
A data lake, circa 2031, will be far more robust and ruggedized than its late 2010s or even 2021 ancestors. So in my opinion, it's a little early to take the position that "the data lake is dead...on to the next shiny object."
It's all about enterprise-scale analytical data management, and no matter what term you decide to use, it's very difficult work!
__________________
Alan Simon is the Managing Principal of Thinking Helmet, Inc., a boutique management and technology strategy consultancy specializing in analytical business process management, business intelligence/analytics, and enterprise-scale data management.
Alan is the author or co-author of 32 business and technology books, dating back to 1985, including the just-published Data Lakes For Dummies (Wiley).