About us

Apache Iceberg is a cloud-native, open table format for building open data lakehouses. This page isn't affiliated with the Apache Iceberg project and doesn't represent PMC opinions. For official news, please check the communication channels provided by the project: https://iceberg.apache.org/community/

Website
https://iceberg.apache.org/
Industry
Software Development
Company size
1 employee
Headquarters
California
Type
Nonprofit

Updates

  • Apache Iceberg (22,855 followers)

    Data Council (6,488 followers)

    SPEAKER SPOTLIGHT: We're thrilled to welcome Ryan Blue, Creator of Apache Iceberg and Technical Staff at Databricks, as he discusses "Why is Everyone Talking about Apache Iceberg?" on our Data Engineering & Infrastructure track. Learn how Ryan's work on open table formats has transformed the analytics industry and why this universal format is more relevant than ever for modern data architecture. See you at Data Council 2025 in Oakland, April 22-24. Buy your tickets today! datacouncil.ai

  • Apache Iceberg

    Dipankar Mazumdar, M.Sc

    Staff Data Engineer Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Author of "Engineering Lakehouses"

    5 Performance Optimization Techniques in a Lakehouse

    Here are 5 key strategies for optimizing performance in table formats like Apache Hudi (most of these apply to formats like Apache Iceberg and Delta Lake as well):

    1. Partitioning: Store related data together in partitions when writing to storage, enabling faster access.
    2. File Sizing/Compaction: Merge smaller files into larger, more efficient ones to minimize the number of files scanned during queries.
    3. Clustering: Reorganize your data for improved query efficiency:
       - Linear Sorting: Arrange data by a single column or range of values.
       - Multi-dimensional Clustering: Optimize data layout across multiple columns using techniques like Z-ordering or Hilbert curves, ideal for queries filtering on multiple dimensions.
    4. Data Skipping: Use file format statistics (e.g., #Parquet min/max stats) and Bloom filters to skip irrelevant data during retrieval, significantly improving query speed.
    5. Cleaning: Remove outdated or unused data and update metadata to maintain a lean and efficient dataset, ensuring optimal performance over time.

    I published a detailed blog diving into each of these techniques. Check out the comments for the link!
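
To make the data-skipping idea concrete, here is a minimal sketch of min/max pruning. It is not the API of Hudi, Iceberg, Delta Lake, or Parquet: the `FileStats` class, the `files_to_scan` helper, and the file names are hypothetical, and it assumes each data file records the minimum and maximum of a single integer column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for one integer column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lo, hi):
    """Keep only files whose [min, max] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.max_value >= lo and f.min_value <= hi]


files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# A filter such as `WHERE id BETWEEN 120 AND 150` can only match rows in part-1,
# so the other two files are skipped without being opened.
selected = files_to_scan(files, 120, 150)
print([f.path for f in selected])  # ['part-1.parquet']
```

Pruning of this kind works best when the data is sorted or clustered on the filtered column, because the per-file ranges then overlap as little as possible.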

  • Apache Iceberg

    Join Cloudera and Snowflake for an Apache Iceberg Meetup to get hands-on with interactive demos, engage in discussions with fellow data pros to share insights, and connect with the greater Apache Iceberg community. In-person event in Austin, TX, broadcast for remote attendees. Register now: https://lnkd.in/dxUkmPE5

  • Apache Iceberg

    [French content by Charly Clairmont] See how Debezium + Iceberg make a great combination for streamlining analytics.

    Charly Clairmont

    CTO at Synaltic

    The #DataLakehouse architecture model is gaining ground. It brings together #streaming and #batch, a convergence that is transforming how organizations can obtain consolidated, quickly available data at a controlled cost of ownership. Debezium plays a leading role today. Debezium + Apache Iceberg give organizations back control of their data. Thanks to Ismail Simsek for Debezium Server Iceberg. I share my point of view in a column for Journal du Net.

  • Apache Iceberg

    Iceberg Summit 2025: Don't Miss Out!

    The premier Apache Iceberg event is almost here! Whether you're diving deep into open table formats or looking to connect with top data engineers, this is your chance to explore the latest in modern data lakes, analytics, and more.

    - In-Person Summit: April 8th, San Jose, CA ($200)
    - Virtual Summit: April 9th (FREE!)
    - Register now: https://lnkd.in/enJNDzr2

    Join industry leaders, open-source experts, and fellow data enthusiasts for hands-on sessions, cutting-edge talks, and valuable networking. If you can't make it in person, sign up for the free virtual event and catch all the insights! Secure your spot today!

    #IcebergSummit #ApacheIceberg #DataEngineering #OpenSource #DataLakes #Analytics

  • Apache Iceberg

    Here's an in-depth breakdown of why businesses like Netflix, Adobe, and Bloomberg adopted Iceberg. Read this blog by Hevo Data to learn about the advantages Iceberg offers across multiple facets like storage, architecture, and more. https://lnkd.in/ewdMzJTp #ApacheIceberg #Iceberg

  • Apache Iceberg

    Repost from Rui Carvalho

    Rui Carvalho

    Data & Analytics Engineer | Azure | Databricks | Spark | SQL Server | MS Fabric | Speaker at Data Events | Medium Writer

    Why is Apache Iceberg gaining so much traction in data engineering?

    Traditional data lakes have always struggled with slow queries, lack of ACID transactions, and complex schema evolution. Apache Iceberg fixes these problems by bringing powerful data warehouse features to the flexible world of data lakes.

    Here are some key takeaways from Chapter 1 of Apache Iceberg: The Definitive Guide:

    - ACID Transactions: Reliable and consistent updates, just like a data warehouse.
    - Schema & Partition Evolution: Modify schemas and partitions without rewriting data.
    - Time Travel & Rollbacks: Query historical data states and recover from mistakes.
    - Optimized Query Performance: Metadata indexing eliminates expensive full-table scans.
    - Multi-Engine Support: Works with Spark, Flink, Presto, Trino, and more.

    Apache Iceberg eliminates costly ETL pipelines, reduces infrastructure overhead, and makes data lakes truly scalable.

    I'm currently reading Apache Iceberg: The Definitive Guide and sharing my learnings as I go. If you're working with big data, I highly recommend checking it out! Read the full article here: https://lnkd.in/dXHtfCCv

    #dataengineering #apacheiceberg #bigdata #datalakehouse
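
The time travel and rollback idea can be illustrated with a toy model in which every commit produces a new immutable snapshot. This is only a sketch of the concept: the `SnapshotTable` class and its methods are hypothetical and do not reflect Iceberg's actual API or metadata layout.

```python
class SnapshotTable:
    """Toy table that keeps one immutable tuple of rows per snapshot."""

    def __init__(self):
        self._snapshots = [()]  # snapshot 0 is the empty table
        self._current = 0       # id of the snapshot readers see by default

    def append(self, rows):
        """Commit new rows on top of the current snapshot; return the new snapshot id."""
        new_state = self._snapshots[self._current] + tuple(rows)
        self._snapshots.append(new_state)
        self._current = len(self._snapshots) - 1
        return self._current

    def scan(self, snapshot_id=None):
        """Read the current snapshot, or an older one (time travel)."""
        sid = self._current if snapshot_id is None else snapshot_id
        return list(self._snapshots[sid])

    def rollback(self, snapshot_id):
        """Point the table back at an earlier snapshot without deleting history."""
        if not 0 <= snapshot_id < len(self._snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self._current = snapshot_id


table = SnapshotTable()
first = table.append(["order-1"])
second = table.append(["order-2"])

print(table.scan())        # ['order-1', 'order-2']
print(table.scan(first))   # ['order-1']  (time travel to an earlier state)

table.rollback(first)      # undo the second commit
print(table.scan())        # ['order-1']
```

Because old snapshots are never modified, reading history and rolling back are both cheap pointer changes rather than data rewrites in this model.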

  • Apache Iceberg

    Repost from Kevin Petrie

    Kevin Petrie

    Vice President of Research at BARC

    Data fabric offers real advantages for companies that need to manage distributed data for AI initiatives, which is to say, most companies.

    In her excellent BARC webinar "Forget About Data Warehousing – Think Data Fabric" yesterday, Jacqueline Bloemen compared the lakehouse and fabric. One key difference: the lakehouse brings data to the compute, and the fabric brings the compute to the data. In other words:

    - Companies consolidate physical data from various sources into the lakehouse, where they catalog and transform it for analytics.
    - With a fabric, they do not consolidate. Rather, they catalog, transform and access data wherever it resides in distributed locations.

    https://lnkd.in/gP6yvs4t

    The data fabric combines metadata, semantic layer and virtualization elements to help manage and analyze data across platforms, regions and cloud/on-premises data centers. All this simplifies user access and data management in heterogeneous environments. Good stuff!

    My take: fabric and data virtualization will gain popularity this year because data, especially data inputs for AI/ML, remains stubbornly distributed for several reasons.

    - While hyperscalers and lakehouse platforms offer impressive tool suites, companies can't afford to lock into just one vendor. They want to maintain tool interoperability and data mobility. That's one reason that the Apache Iceberg open table format is so popular. It enables companies to maintain evolving datasets across multiple formats and sources, avoiding lock-in.
    - Data has gravity. Most companies have accumulated years or decades of applications and processes that remain on premises. The mainframe market, of all things, continues to grow, thanks in no small part to incremental investments on premises.
    - Some companies prefer to run AI on premises because it is so strategic to them. They want to own the GPUs (if they can get them), and they want to access the full ecosystem of advanced tools without locking into one hyperscaler.
    - Sovereignty concerns loom large these days. Companies are hesitant to place their data on servers in countries with relationships or rules that conflict with their own. This matters given a range of geopolitical tensions in 2025.
    - Repatriation is on the rise thanks to sovereignty requirements, security threats and cost concerns. This means companies are moving data from the cloud back on premises. In fact, 83% of enterprise CIOs plan to repatriate at least some workloads in 2024, up from 43% four years ago, according to a recent Barclays survey.

    In this context, the fabric approach offers persistent value. So I definitely recommend watching Jacqueline's educational webinar to learn more about the data fabric. Harald Erb of Snowflake and Christoph Papenfuss of Agile Data Engine joined her to share their companies' perspectives on evolving approaches to data architecture.

    #data #ai #datafabric #lakehouse Florian Bigelmaier Timm Grosser Shawn Rogers

  • Apache Iceberg

    #IcebergSummit is back, in person in San Francisco on April 8 and virtually on April 9. Mark your calendars! Interested in sharing your #ApacheIceberg knowledge? The Selection Committee is looking for sessions covering Iceberg and ecosystem use cases, best practices, and deep dives. Check out the #CfP page for more details: https://lnkd.in/e3aJj_ns

  • Apache Iceberg

    Qlik acquires Upsolver to deliver better data management capabilities in the lakehouse era. Read more: https://lnkd.in/daNm2VHc
