Dunith Danushka's post

Dunith Danushka is a LinkedIn Influencer

Product Marketing at EDB | LinkedIn Top Voice | Writer | Data Educator

How I'd Learn Apache Iceberg (if I Had To Start Over)

Apache Iceberg is everywhere. Major cloud providers and data platform vendors, including Google, Confluent, and Snowflake, have bundled Apache Iceberg support into their managed service offerings, making it an essential skill in every data professional's toolkit, whether you like it or not.

Want to learn Iceberg but struggling with where to start? Let me help.

I've created a 7-week study plan that balances Iceberg theory with hands-on practice. Though I'm still working through it, I wanted to share my learning roadmap, both to help others grasp the basics and to gather feedback for improvements.

Here's the gist of it. You can find the long version here: https://lnkd.in/eS8Kqj94

Week 1: Understanding the problem context
I will spend the first week studying what led to the creation of Apache Iceberg. By reading articles and books and watching videos, I'll build a mental model of Iceberg and understand why it exists.

Week 2: What is Iceberg?
I will spend the second week trying to understand the architecture of Apache Iceberg: what it is made of and how it works.

Week 3: Getting hands-on
The third week is all about putting everything I've learned so far into practice. I will try to set up a local Iceberg environment where I can experiment with basic table-level operations.

Week 4: Working with Apache Spark, partitioning, and time travel
I will dedicate week 4 to exploring how query engines work with core Iceberg features. I will start with Apache Spark.

Week 5: Record-level operations, version control for tables
In the fifth week, I will further explore the core Iceberg features using a different query engine and catalog: Dremio and Nessie.

Week 6: Streaming with Apache Flink, schema evolution
Now that I understand Iceberg's core capabilities, it's time to explore how Iceberg integrates batch and real-time processing. I will experiment with Apache Flink.

Week 7: Advanced concepts
I will wrap up my study in week 7, focusing on advanced Iceberg concepts.

Even after completing this 7-week schedule, I won't feel fully confident until I apply this knowledge in practice. Therefore, I plan to finish by building a real-world data lakehouse project that incorporates batch and real-time data processing, a BI dashboard, and a machine learning use case.

I hope this learning plan is helpful. If you're an expert in this field, I welcome your feedback on any topics I may have missed.
