About us

Apache Iceberg is a cloud-native, open table format for building open data lakehouses. This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://iceberg.apache.org/community/

Website
https://iceberg.apache.org/
Industry
Software Development
Company size
1 employee
Headquarters
California
Type
Nonprofit

Updates

  • Apache Iceberg

    19,072 followers

    [repost Starburst]

    Starburst

    39,740 followers

    Apache Iceberg and Trino are revolutionizing enterprise data lake architectures by tackling cost, scalability, and interoperability challenges.

    Key highlights from SiliconANGLE:
    - Iceberg delivers database-like functionality on object stores for seamless, cross-platform analytics.
    - Trino, co-created by Starburst’s Dain Sundstrom, powers fast, distributed query performance.
    - Together, they fuel innovation in modern data stacks like Starburst’s Icehouse, a fully managed Iceberg and Trino data lake.

    Explore how Starburst is advancing modern data strategies: https://okt.to/JVh6tB #Trino #Iceberg #DataInnovation #OpenSource

    Unlock efficient data processing with Iceberg - SiliconANGLE

    siliconangle.com

  • Apache Iceberg

    19,072 followers

    Great introduction to cloud-based Lakehouses. Repost Olena Yarychevska

    Apache Iceberg Tables in Snowflake!

    In this brief presentation, I’ve compiled key information about the Apache Iceberg functionality, including main features, setup, and integrations. You’ll find answers to questions like:
    - What is Apache Iceberg, and what is its role in data management?
    - Key features, such as ACID transactions, schema evolution, and time-based snapshots
    - Integrating and setting up Iceberg in Snowflake
    - Use cases and limitations of Iceberg

    If you're interested in learning more about this tool or its capabilities, there is a cool tutorial, "Create your first Apache Iceberg table": https://lnkd.in/dStt-cnJ

    #ApacheIceberg #Snowflake #DataLake #DataEngineering #CloudStorage #Tech #DataManagement

  • Apache Iceberg reposted this

    Yingjun Wu

    Founder @ RisingWave. Event-driven architecture, stream processing, databases.

    Apache Iceberg is our consensus.

    RisingWave

    9,525 followers

    GET READY for Iceberg Streaming Analytics Meetup on November 21st! Join RisingWave, MotherDuck and Confluent for an evening packed with the latest on streaming analytics, Apache Iceberg and modern data architectures.

    Agenda:
    - 5:30 PM: Doors open! Grab some food, drink, and mingle
    - 6:00 PM: Kafka Meets Iceberg – Real-time streaming into modern data lakes by Kasun Indrasiri
    - 6:30 PM: Iceberg + Postgres Protocol – RisingWave’s Iceberg power-up by Yingjun Wu
    - 7:00 PM: When All You Have is a Hammer: Using SQL in Your Data Lake by Jacob Matson
    - 7:30 PM: Network and hang out with fellow data nerds!

    Location: MotherDuck HQ, 2811 Fairview Ave E, Suite 2000, Seattle

    Don’t miss out! RSVP now: https://lu.ma/1xokjeia

    #DataStreaming #StreamProcesing #RealTimeAnalytics #SQL #Kafka

  • Apache Iceberg

    19,072 followers

    dbt Labs #ApacheIceberg

    dbt Labs

    98,700 followers

    You might be in a data desert if you haven't heard the excitement around Apache Iceberg recently. Apache Iceberg is increasingly becoming the standard for open table formats tailored for large-scale datasets and interoperability between data platforms. But setting up Iceberg isn't straightforward, especially when managing metadata and creating pipelines. That's where dbt comes in.

    Amy Chen dives deeper into the details in their latest blog post (link in comments), but here’s the snapshot:
    - With dbt's new support for Iceberg, you can now seamlessly convert your dbt models to Iceberg. Once you've configured your external storage and platform connections, updating your dbt model configuration is a breeze.
    - Supported adapters include Databricks, Snowflake, Apache Spark, Starburst/Trino Software Foundation, and Dremio.

    Looking ahead, dbt will make it even simpler to adopt Iceberg with features like better access to external catalogs, volume management, data refresh handling, and permission enforcement. Plus, the framework is designed to be flexible, so whether you’re using Iceberg or another table format, dbt has you covered.

    The ultimate goal? To help data engineers focus on data, not join the storage format wars.

  • Apache Iceberg

    19,072 followers

    [repost Stéphane Heckel]

    The Future of Data is Composable, Portable and Programmable. These interconnected principles are reshaping how organizations handle and leverage their data assets.

    Composability allows organizations to build flexible data infrastructures using modular, interchangeable components. This approach centers on a multi-engine architecture, enabling the integration of core processing engines like Snowflake or Databricks augmented with various use-case-based engines like DuckDB, Trino, Dremio within a complex ecosystem. At the core of this architecture is Apache Iceberg, an open table format serving as a common storage layer. Iceberg provides a consistent foundation accessible by different engines, reducing data duplication and simplifying management processes. Key advantages of composability include:
    ● Separation of storage and compute for efficient resource allocation
    ● Cost reduction by eliminating data movement between systems
    ● Enhanced flexibility in data usage, opening up new possibilities

    Portability focuses on where data resides, embracing a hybrid approach that combines cloud computing benefits with data sovereignty needs. This strategy helps mitigate risks such as cloud dependency, unexpected price increases and vendor lock-in. To implement a portable data strategy, organizations can:
    ● Adopt a multi-cloud or hybrid approach
    ● Consider open-source alternatives as part of the technology stack
    ● Negotiate flexible contracts with cloud providers

    Programmability extends the principles of composability and portability into automation and code-driven management. This "Everything as Code" approach is now applied to data platforms, orchestration, visualization, and other aspects of the data ecosystem. Key concepts include:
    ● Infrastructure as Code: Enables automation of infrastructure creation
    ● Platform as Code: Allows management of entire data platforms
    ● Orchestration / Automation as Code: Run your pipelines
    ● Dataviz as Code: Enables consistent and automated data visualization

    The future Composable Data Platform should embody all these principles:
    ● Modular architecture for mix-and-match of components
    ● Flexibility to run on-premises, in the cloud or in hybrid environments
    ● Fully programmable setup, from infrastructure to data processing and visualization

    Interestingly, throughout this exploration of concepts, I have managed to paint the big picture without once resorting to the overused buzzword "Modern." ;-)

    • The Future of Data is Composable, Portable, Programmable
  • Apache Iceberg

    19,072 followers

    [repost Dipankar Mazumdar, M.Sc]

    Dipankar Mazumdar, M.Sc

    Staff Data Engineer Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Author of "Engineering Lakehouses"

    Announcing my new book - "Engineering Lakehouses with Open Table Formats"

    TBH, I have been thinking about this for quite some time. A lot of times, in conversations with folks exploring table formats, questions have come up around choosing the right table formats, understanding use cases, and designing the overall lakehouse architecture. So, the goal with this book is to provide a comprehensive resource for data/software engineers, architects, and decision-makers to understand the essentials of these formats, but also to elaborate on some of the less talked about 'core' stuff (beyond marketing jargon).

    Specifically, the book will target 4 angles:
    - Table format internals - e.g. how ACID transactions work, what a storage engine is, performance optimization methods, etc.
    - Decisions on selecting a table format - factors to consider from a technical standpoint, ecosystem, features.
    - Use cases and how to implement them - streaming/batch, single-node workloads, CDC, integration with MLflow, etc.
    - What's happening next - interoperability (Apache XTable (Incubating), UniForm), catalogs (Hive to newer ones such as Unity Catalog, Apache Polaris (Incubating))

    I’ve been fortunate to have first-hand experience working with open table formats like Apache Iceberg and Apache Hudi primarily, and in some capacity with Delta Lake (circa 2019). I intend to bring those experiences and touch upon the intricacies along with some of the pain points of getting started. I am also thrilled to have Vinoth Govindarajan as a co-author, who brings a wealth of experience building lakehouses at scale with these formats at organizations like Uber and Apple.

    We have drafted the first few chapters, but there's still work to do. We’d love to take this opportunity to learn more from the community about any additional topics of interest for the book. I'll be opening a formal feedback channel in a few days. Oh, and the book is already available for pre-order on Amazon (link in comments).

    Thanks to Packt for their continuous support in making this a solid effort! #dataengineering #softwareengineering

  • Apache Iceberg

    19,072 followers

    Cloudera and Snowflake join forces to bring enterprises an open, unified hybrid data Lakehouse, powered by #Iceberg. More here: https://bit.ly/4848s8V

    Cloudera partners with Snowflake to unleash hybrid data management integration powered by Iceberg

    cloudera.com

  • Apache Iceberg

    19,072 followers

    [repost Yingjun Wu]

    Yingjun Wu

    Founder @ RisingWave. Event-driven architecture, stream processing, databases.

    Everyone’s talking about Apache Iceberg lately. Why is it so important? It's not just about performance or cost – though those matter – it's because Iceberg is open!

    Imagine you store all your data with a data warehouse vendor using their own proprietary format. At first, it feels great – fast SQL queries, everything in one place, super convenient! But then reality sets in. Maybe you want to run an AI project in Python or share some data with colleagues building apps. Suddenly, you hit a wall. The vendor’s Python support isn’t solid, or there's no easy way to export your data. To top it off, your bill is climbing every year – classic vendor lock-in!

    Vendor lock-in hurts both customers and innovative, fast-growing vendors. Proprietary formats trap users, preventing access to better, more affordable technology. I see Iceberg – or other open table formats – taking over data storage quickly, and in the next 12-24 months, we’ll see a big shift. The "modern data stack" won’t be centered around data warehouses anymore. Instead, it’ll be built on open table formats, where companies can ingest data from anywhere and query it using any engine or language. This means doing BI and AI in one place, using the best tools for the job. As for basic CRUD operations? PostgreSQL protocol should be the standard – it’s the real open option.

    That’s how I see the future. The biggest hurdle? Multi-year contracts with big vendors. But things are moving fast. It took about 10 years for everyone to adopt the cloud, and I bet it won’t take long for the world to move to open data formats!

  • Apache Iceberg

    19,072 followers

    [repost Ameena Ansari]

    Ameena Ansari

    Senior Data Engineer | LinkedIn [in]structor | Hiking

    Immutable Snapshots in Apache Iceberg

    Why are they important in lakehouses?

    Immutable snapshots are a core feature of Apache Iceberg, representing the state of a dataset at a specific point in time. Every time you perform a write, update, or delete operation in Iceberg, a new version of the metadata file is created. For example, the s0 file is generated when a table is first created, containing only the schema details (with no data yet) as shown in the picture below. As records are inserted, Iceberg creates new metadata files s1, s2, updating the catalog with the latest metadata file's version.

    What do immutable snapshots bring to the table?
    - ACID compliance
    - Concurrency control
    - Time travel
    - Rollback capabilities

    How does it work?

    Each operation (insert, update, delete) creates a new snapshot that records metadata and data changes. Iceberg organizes these snapshots in a directed acyclic graph (DAG), which allows efficient tracking of data evolution and optimizes query performance.

    In summary, immutable snapshots in Apache Iceberg mean that once a snapshot is created, it remains unchanged, ensuring consistent, atomic access to data and enabling advanced features like time travel, concurrency, and rollback.

    #DataEngineering Arockia Nirmal Amala Doss Zach Wilson

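
The snapshot mechanics described in the post above can be illustrated with a small in-memory sketch. This is a toy model only, not the Iceberg API or its on-disk metadata layout: `Snapshot`, `ToyTable`, `commit`, `files_at`, and `rollback` are hypothetical names chosen for illustration, and the file names are placeholders. It assumes each write produces a new immutable snapshot (s0 at table creation, then s1, s2, and so on), and that time travel and rollback amount to reading, or repointing to, an older snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)  # frozen: a snapshot cannot be modified after creation
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]      # link to the snapshot this one was built from
    operation: str                # e.g. "create", "append", "delete"
    data_files: Tuple[str, ...]   # full list of files visible in this snapshot


class ToyTable:
    """Hypothetical in-memory stand-in for a table plus its catalog pointer."""

    def __init__(self) -> None:
        # s0: created together with the table, with no data files yet.
        self._snapshots: Dict[int, Snapshot] = {0: Snapshot(0, None, "create", ())}
        # Plays the role of the catalog entry pointing at the latest metadata.
        self.current_id = 0

    def commit(self, operation: str,
               added: Iterable[str] = (),
               removed: Iterable[str] = ()) -> int:
        """Record a write as a brand-new snapshot; older snapshots stay untouched."""
        parent = self._snapshots[self.current_id]
        dropped = set(removed)
        files = tuple(f for f in parent.data_files if f not in dropped) + tuple(added)
        new_id = len(self._snapshots)
        self._snapshots[new_id] = Snapshot(new_id, parent.snapshot_id, operation, files)
        self.current_id = new_id
        return new_id

    def files_at(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        """Time travel: read the file list as of any snapshot (default: current)."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        return self._snapshots[sid].data_files

    def rollback(self, snapshot_id: int) -> None:
        """Rollback: move the current pointer back; no snapshot is rewritten."""
        if snapshot_id not in self._snapshots:
            raise KeyError(f"unknown snapshot {snapshot_id}")
        self.current_id = snapshot_id


if __name__ == "__main__":
    table = ToyTable()
    s1 = table.commit("append", added=["a.parquet"])
    s2 = table.commit("append", added=["b.parquet"])
    table.commit("delete", removed=["a.parquet"])
    print(table.files_at())    # current state
    print(table.files_at(s2))  # state as of s2 (time travel)
    table.rollback(s1)
    print(table.files_at())    # state after rolling back to s1
```

Because a commit made after a rollback records the rolled-back snapshot as its parent, the parent links can branch, which is one way to picture the graph of snapshots the post mentions.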
