A lakehouse is not open just because of its table format; compute services and catalogs should also enable interoperability and openness to a wider, dynamic data ecosystem. Reposted from Dipankar Mazumdar, M.Sc. This page isn't affiliated with the Apache Iceberg project and doesn't represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K
Staff Data Engineer Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Author of "Engineering Lakehouses"
Open Lakehouse Architecture - But what's really open?

There's a noticeable shift in how users/customers now think about data architectures. You hear terms like "open lakehouse". But what does "open" really mean? This is not easy to define. Still, there seems to be general agreement on one core idea: data should reside in an "open and independent" tier, allowing all compatible compute engines to operate on that "single copy" based on workloads.

The most important thing to understand here is that just replacing proprietary storage formats with "open table formats" doesn't automatically make everything open and interoperable. In reality, customers end up choosing a particular open table format (based on vendor support) while remaining tied to proprietary services and tools for things like optimization and maintenance, among others. This confusion is created by the growing use of jargon like "open data lakehouse" and "open table formats".

And no, this is not about build vs. buy! You can still buy vendor solutions while maintaining an open and interoperable platform. The key is that when new workloads arise, you should be able to integrate other tools or seamlessly switch between compute platforms.

I set out to answer some of the questions that have been on my mind in this blog (in comments). This is from the perspective of having worked with the three table formats (Apache Hudi, Apache Iceberg, and Delta Lake) for the past couple of years of my career.

Some questions that I ask:
- What are the differences between an open table format and an open data lakehouse platform?
- Is an open table format enough to realize a truly open data architecture?
- How seamlessly can we move across different platforms today?

Would love to hear any thoughts!

#dataengineering #softwareengineering