Apache Iceberg

Software Development

About us

Apache Iceberg is a cloud-native, open table format for building open data lakehouses. This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://iceberg.apache.org/community/

Website: https://iceberg.apache.org/
Industry: Software Development
Company size: 1 employee
Headquarters: California
Type: Nonprofit

Locations

Primary

California, US

Updates

Apache Iceberg

19,072 followers
1 day ago
[repost Starburst] This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K

Starburst

39,740 followers
1 day ago

Apache Iceberg and Trino are revolutionizing enterprise data lake architectures by tackling cost, scalability, and interoperability challenges.

Key Highlights from SiliconANGLE:
- Iceberg delivers database-like functionality on object stores for seamless, cross-platform analytics.
- Trino, co-created by Starburst’s Dain Sundstrom, powers fast, distributed query performance.
- Together, they fuel innovation in modern data stacks like Starburst’s Icehouse, a fully managed Iceberg and Trino data lake.

Explore how Starburst is advancing modern data strategies: https://okt.to/JVh6tB

#Trino #Iceberg #DataInnovation #OpenSource

Unlock efficient data processing with Iceberg - SiliconANGLE

siliconangle.com

Apache Iceberg

19,072 followers
5 days ago (edited)
Great introduction to cloud-based lakehouses. [repost Olena Yarychevska] This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K

Olena Yarychevska
5 days ago

Apache Iceberg Tables in Snowflake!

In this brief presentation, I’ve compiled key information about the Apache Iceberg functionality, including main features, setup, and integrations. You’ll find answers to questions like:
- What Apache Iceberg is and its role in data management
- Key features, such as ACID transactions, schema evolution, and time-based snapshots
- Integrating and setting up Iceberg in Snowflake
- Use cases and limitations of Iceberg

If you're interested in learning more about this tool or its capabilities, there is a cool tutorial: Create your first Apache Iceberg table https://lnkd.in/dStt-cnJ

#ApacheIceberg #Snowflake #DataLake #DataEngineering #CloudStorage #Tech #DataManagement

2 comments

Apache Iceberg reposted this

Yingjun Wu

Founder @ RisingWave. Event-driven architecture, stream processing, databases.
1 week ago
Apache Iceberg is our consensus.
RisingWave

9,525 followers
1 week ago

GET READY for Iceberg Streaming Analytics Meetup on November 21st. Join RisingWave, MotherDuck and Confluent for an evening packed with the latest on streaming analytics, Apache Iceberg and modern data architectures.

Agenda:
- 5:30 PM: Doors open! Grab some food, drink, and mingle
- 6:00 PM: Kafka Meets Iceberg – Real-time streaming into modern data lakes by Kasun Indrasiri
- 6:30 PM: Iceberg + Postgres Protocol – RisingWave’s Iceberg power-up by Yingjun Wu
- 7:00 PM: When All You Have is a Hammer: Using SQL in Your Data Lake by Jacob Matson
- 7:30 PM: Network and hang out with fellow data nerds!

Location: MotherDuck HQ, 2811 Fairview Ave E, Suite 2000, Seattle

Don’t miss out! RSVP now: https://lu.ma/1xokjeia

#DataStreaming #StreamProcesing #RealTimeAnalytics #SQL #Kafka
2 comments

Apache Iceberg

19,072 followers
1 week ago
dbt Labs #ApacheIceberg This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K
dbt Labs

98,700 followers
2 weeks ago

You might be in a data desert if you haven't heard the excitement around Apache Iceberg recently. Apache Iceberg is increasingly becoming the standard for open table formats tailored for large-scale datasets and interoperability between data platforms. But setting up Iceberg isn't straightforward, especially when managing metadata and creating pipelines. That's where dbt comes in.

Amy Chen dives deeper into the details in their latest blog post (link in comments), but here’s the snapshot:
- With dbt's new support for Iceberg, you can now seamlessly convert your dbt models to Iceberg. Once you've configured your external storage and platform connections, updating your dbt model configuration is a breeze.
- Supported adapters include Databricks, Snowflake, Apache Spark, Starburst/Trino Software Foundation, and Dremio.

Looking ahead, dbt will make it even simpler to adopt Iceberg with features like better access to external catalogs, volume management, data refresh handling, and permission enforcement. Plus, the framework is designed to be flexible, so whether you’re using Iceberg or another table format, dbt has you covered.

The ultimate goal? To help data engineers focus on data, not join the storage format wars.
Apache Iceberg reposted this

WarpStream

2,486 followers
4 months ago (edited)
Kafka: you either love or hate it, but one thing is true — most companies use it. Given Kafka tends to proliferate in terms of cost and operational complexity, how can it be made simpler and cheaper?

Kafka Is Dead, Long Live Kafka

warpstream.com

4 comments

Apache Iceberg

19,072 followers
2 weeks ago
[repost Stéphane Heckel] This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K
Stéphane Heckel
3 weeks ago

The Future of Data is Composable, Portable and Programmable. These interconnected principles are reshaping how organizations handle and leverage their data assets.

Composability allows organizations to build flexible data infrastructures using modular, interchangeable components. This approach centers on a multi-engine architecture, enabling the integration of core processing engines like Snowflake or Databricks augmented with various use-case-based engines like DuckDB, Trino, Dremio within a complex ecosystem. At the core of this architecture is Apache Iceberg, an open table format serving as a common storage layer. Iceberg provides a consistent foundation accessible by different engines, reducing data duplication and simplifying management processes. Key advantages of composability include:
- Separation of storage and compute for efficient resource allocation
- Cost reduction by eliminating data movement between systems
- Enhanced flexibility in data usage, opening up new possibilities

Portability focuses on where data resides, embracing a hybrid approach that combines cloud computing benefits with data sovereignty needs. This strategy helps mitigate risks such as cloud dependency, unexpected price increases and vendor lock-in. To implement a portable data strategy, organizations can:
- Adopt a multi-cloud or hybrid approach
- Consider open-source alternatives as part of the technology stack
- Negotiate flexible contracts with cloud providers

Programmability extends the principles of composability and portability into automation and code-driven management. This "Everything as Code" approach is now applied to data platforms, orchestration, visualization, and other aspects of the data ecosystem. Key concepts include:
- Infrastructure as Code: Enables automation of infrastructure creation
- Platform as Code: Allows management of entire data platforms
- Orchestration / Automation as Code: Run your pipelines
- Dataviz as Code: Enables consistent and automated data visualization

The future Composable Data Platform should embody all these principles:
- Modular architecture for mix-and-match of components
- Flexibility to run on-premises, in the cloud or in hybrid environments
- Fully programmable setup, from infrastructure to data processing and visualization

Interestingly, throughout this exploration of concepts, I have managed to paint the big picture without once resorting to the overused buzzword "Modern." ;-)
1 comment

Apache Iceberg

19,072 followers
3 weeks ago
[repost Dipankar Mazumdar, M.Sc] This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K
Dipankar Mazumdar, M.Sc

Staff Data Engineer Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Author of "Engineering Lakehouses"
3 weeks ago

Announcing my new book - "Engineering Lakehouses with Open Table Formats"

TBH, I have been thinking about this for quite some time. A lot of times, in conversations with folks exploring table formats, questions have come up around choosing the right table formats, understanding use cases, and designing the overall lakehouse architecture.

So, the goal with this book is to provide a comprehensive resource for data/software engineers, architects, and decision-makers to understand the essentials of these formats, but also to elaborate on some of the less talked about 'core' stuff (beyond marketing jargon).

Specifically, the book will target 4 angles:
- Table format internals - e.g. how ACID transactions work, what a storage engine is, performance optimization methods, etc.
- Decisions on selecting a table format - factors to consider from a technical standpoint, ecosystem, features.
- Use cases and how to implement - streaming/batch, single-node workloads, CDC, integration with MLflow, etc.
- What's happening next - Interoperability (Apache XTable (Incubating), UniForm), Catalogs (Hive to newer ones such as Unity Catalog, Apache Polaris (Incubating))

I’ve been fortunate to have first-hand experience working with open table formats like Apache Iceberg and Apache Hudi primarily, and in some capacity with Delta Lake (circa 2019). And I intend to bring those experiences and touch upon the intricacies along with some of the pain points of getting started.

I am also thrilled to have Vinoth Govindarajan as a co-author, who brings a wealth of experience building lakehouses at scale with these formats at organizations like Uber and Apple.

We have drafted the first few chapters, but there's still work to do. We’d love to take this opportunity to learn more from the community about any additional topics of interest for the book. I'll be opening a formal feedback channel in a few days.

Oh, and the book is already available for pre-order on Amazon (link in comments).

Thanks to Packt for their continuous support in making this a solid effort! #dataengineering #softwareengineering
3 comments

Apache Iceberg

19,072 followers
3 weeks ago (edited)
Cloudera and Snowflake join forces to bring enterprises an open, unified hybrid data Lakehouse, powered by #Iceberg. More here: https://bit.ly/4848s8V This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K

Cloudera partners with Snowflake to unleash hybrid data management integration powered by Iceberg

cloudera.com

1 comment

Apache Iceberg

19,072 followers
3 weeks ago
[repost Yingjun Wu] This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K
Yingjun Wu

Founder @ RisingWave. Event-driven architecture, stream processing, databases.
1 month ago

Everyone’s talking about Apache Iceberg lately. Why is it so important? It's not just about performance or cost – though those matter – it's because Iceberg is open!

Imagine you store all your data with a data warehouse vendor using their own proprietary format. At first, it feels great – fast SQL queries, everything in one place, super convenient! But then reality sets in. Maybe you want to run an AI project in Python or share some data with colleagues building apps. Suddenly, you hit a wall. The vendor’s Python support isn’t solid, or there's no easy way to export your data. To top it off, your bill is climbing every year – classic vendor lock-in!

Vendor lock-in hurts both customers and innovative, fast-growing vendors. Proprietary formats trap users, preventing access to better, more affordable technology.

I see Iceberg – or other open table formats – taking over data storage quickly, and in the next 12-24 months, we’ll see a big shift. The "modern data stack" won’t be centered around data warehouses anymore. Instead, it’ll be built on open table formats, where companies can ingest data from anywhere and query it using any engine or language. This means doing BI and AI in one place, using the best tools for the job. As for basic CRUD operations? ???????????????????? ???????????????? should be the standard – it’s the real open option.

That’s how I see the future. The biggest hurdle? Multi-year contracts with big vendors. But things are moving fast. It took about 10 years for everyone to adopt the cloud, and I bet it won’t take long for the world to move to open data formats!
2 comments

Apache Iceberg

19,072 followers
1 month ago
[repost Ameena Ansari] This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K
Ameena Ansari

Senior Data Engineer | LinkedIn [in]structor | Hiking
1 month ago

Immutable Snapshots in Apache Iceberg

?????? ?????? ???????? ?????????????????? ???? ?????????????????????

Immutable snapshots are a core feature of Apache Iceberg, representing the state of a dataset at a specific point in time. Every time you perform a write, update, or delete operation in Iceberg, a new version of the metadata file is created. For example, the s0 file is generated when a table is first created, containing only the schema details (with no data yet), as shown in the picture below. As records are inserted, Iceberg creates new metadata files s1, s2, updating the catalog with the latest metadata file's version.

???????? ???? ?????????????????? ?????????????????? ?????????? ???? ?????? (????????) ?????????? ?
- ACID compliance
- Concurrency control
- Time travel
- Rollback capabilities

How does it work?

Each operation (insert, update, delete) creates a new snapshot that records metadata and data changes. Iceberg organizes these snapshots in a directed acyclic graph (DAG), which allows efficient tracking of data evolution and optimizes query performance.

In summary, immutable snapshots in Apache Iceberg mean that once a snapshot is created, it remains unchanged, ensuring consistent, atomic access to data and enabling advanced features like time travel, concurrency, and rollback.

#DataEngineering Arockia Nirmal Amala Doss Zach Wilson
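The snapshot behaviour described in this post can be sketched as a small in-memory model. This is only a toy under stated assumptions: `Snapshot`, `ToyTable`, and their methods are hypothetical names invented for illustration, not the Iceberg API, and real Iceberg tracks metadata and data files rather than row tuples. The sketch shows each write producing a new frozen snapshot that points at its parent, with time travel and rollback implemented as reads and pointer moves that never modify an existing snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable record of the table state after one operation."""
    snapshot_id: int
    parent_id: Optional[int]
    operation: str
    rows: Tuple[str, ...]  # tuples cannot be mutated in place


class ToyTable:
    """Hypothetical toy model; not the Iceberg API or its file layout."""

    def __init__(self) -> None:
        # s0: the table exists but holds no data yet.
        first = Snapshot(0, None, "create", ())
        self._snapshots: Dict[int, Snapshot] = {0: first}
        self._current_id = 0  # stands in for the catalog pointer

    def _commit(self, operation: str, rows: Tuple[str, ...]) -> Snapshot:
        # Every write adds a new snapshot whose parent is the current one.
        new_id = max(self._snapshots) + 1
        snap = Snapshot(new_id, self._current_id, operation, rows)
        self._snapshots[new_id] = snap
        self._current_id = new_id
        return snap

    def append(self, *new_rows: str) -> Snapshot:
        return self._commit("append", self.read() + tuple(new_rows))

    def delete(self, row: str) -> Snapshot:
        kept = tuple(r for r in self.read() if r != row)
        return self._commit("delete", kept)

    def read(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        """Read the current state, or time travel to an older snapshot."""
        target = self._current_id if snapshot_id is None else snapshot_id
        return self._snapshots[target].rows

    def rollback(self, snapshot_id: int) -> None:
        """Move the pointer back; no snapshot is modified or removed."""
        if snapshot_id not in self._snapshots:
            raise KeyError(f"unknown snapshot {snapshot_id}")
        self._current_id = snapshot_id


table = ToyTable()
s1 = table.append("a", "b")            # creates snapshot 1
s2 = table.delete("a")                 # creates snapshot 2
current_rows = table.read()            # state after the delete
old_rows = table.read(s1.snapshot_id)  # time travel to snapshot 1
table.rollback(s1.snapshot_id)         # point the table back at snapshot 1
rows_after_rollback = table.read()
```

Using a frozen dataclass means an attempt to assign to a field of an existing snapshot raises an error, which loosely mirrors the guarantee that a committed snapshot never changes.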

Employees at Apache Iceberg

Brian Оlsen

US marine | developer | open source advocate | open standards | fedi | adhd | data + ml | musician | odd duck

Richard Taylor

Similar pages

Delta Lake

Tabular (now part of Databricks)

Apache Hudi

DuckDB

Databricks

Apache XTable (Incubating)

dbt Labs

Apache Airflow

Snowflake

Polars