Slash Your Data Warehouse Costs with Apache Iceberg
Upsolver (acquired by Qlik)
Bridges the gap between engineering and data teams by streaming and optimizing operational data in an Iceberg lakehouse.
Dremio, the unified self-service analytics platform, surveyed 500 data leaders and practitioners from large enterprises to gauge their knowledge and adoption of data lakehouses.
The results were published in their State of the Data Lakehouse 2024 Survey, and we were lucky to have Dremio’s Developer Advocate, Alex Merced, present the findings at our Chill Data Summit in NYC earlier this year.
In case you missed it, here are the incredible highlights from the survey, most notably around how much businesses expect to save on their data warehousing costs by moving to a data lakehouse architecture in the next three years.
A lakehouse combines the benefits of a data warehouse with the scalability and flexibility of a data lake. Data lakes lack the transactional consistency of a warehouse, and data warehouses are costly to scale; the lakehouse is the hybrid solution that addresses the limitations of each architecture. For a deep dive into the differences between data lakes and lakehouses, including architecture, limitations, and capabilities, check out Lakehouse vs. Data Lake: The Ultimate Guide.
Lakehouse Awareness & Adoption
The survey kicks off by investigating respondents’ awareness of data lakehouses, revealing that 85% were “very familiar” with the concept.
The term “data lakehouse” is far more common than it was a couple of years ago, leading to an upturn in adoption. When asked what percentage of their analytics organizations predict will be running in a data lakehouse in the next three years, the answer was 69%.
So why is this number so high? More than simply jumping on the bandwagon of the latest trend, respondents either expect, or have already experienced, cost savings of over 50% from adopting a lakehouse architecture. Furthermore, nearly a third of respondents from organizations with over 10K employees expect savings of over 75%.
Digging into the source of the data being moved to the lakehouse, results showed that 42% comes from cloud data warehouses, 35% from enterprise data warehouses, and 22% from the data lake. The savings are achieved by reducing data replication, egress, and compute costs.
While the last decade saw a drive to push data into cloud data warehouses, which promised scalability and flexibility previously unobtainable for on-prem implementations, the high costs have outweighed the benefits for many organizations.
Open Table Format
When asked if they had already adopted an open table format such as Apache Iceberg, 56% of respondents said yes, 25.7% said no but that they planned to adopt in the next year, 9.6% planned to adopt within three years, and only 9.2% had no plans to adopt an open table format.
When it comes to Apache Iceberg adoption, the future’s looking good. Comparing current open table format adoption with projected adoption over the next three years, we can see that Iceberg is leading the way:
Everyone’s Talking About Apache Iceberg
Iceberg’s open developer community, with names including Dremio, Apple, Netflix, LinkedIn, and Amazon Web Services (AWS), is helping to drive adoption, whereas Delta Lake is fueled mainly by Databricks.
Many companies, Upsolver included, are building or supporting Iceberg in their products, helping to increase uptake and widen the scope for customers to experience the benefits. It’s easy to see that, with support from these tech giants, Apache Iceberg is the popular choice.
What About Data Mesh?
Data mesh is a new trend that partners perfectly with data lakehouse technology. A major benefit of the open table format that the lakehouse enables is that different tools can work on the same data. In turn, this facilitates segmenting the data into sets so that different teams within the business can build and deliver data products, making scaling quicker and easier.
When questioned, 84% of the respondents answered that they had fully or partially implemented a data mesh, with 97% expecting this to expand next year.
What is clear from this survey is that open table formats and Apache Iceberg are undergoing adoption as organizations realize the ease-of-use, performance, and cost-saving benefits of open data.
Iceberg was designed for high-scale analytics, and by moving data from existing warehouses, it has the potential to drastically reduce costs without sacrificing performance.
Just as an existing relational database or data warehouse requires ongoing maintenance and performance tuning, this is also the case for Apache Iceberg. Read on to discover how Upsolver automatically takes care of this.
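For context, here is a minimal sketch of what that maintenance looks like when done by hand, using Apache Iceberg’s built-in Spark stored procedures. It assumes a Spark session with the Iceberg runtime and SQL extensions enabled; the catalog name my_catalog and the table db.events are illustrative placeholders, and Upsolver runs equivalent operations for you automatically.

```python
from pyspark.sql import SparkSession

# A Spark session with an Iceberg catalog is assumed; "my_catalog" and
# the table "db.events" are placeholder names, not real objects.
spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compact many small data files into fewer large ones for faster scans.
spark.sql("""
    CALL my_catalog.system.rewrite_data_files(
        table => 'db.events',
        options => map('target-file-size-bytes', '536870912')
    )
""")

# Expire old snapshots so data files no longer referenced by any
# snapshot can be cleaned up, reclaiming storage.
spark.sql("""
    CALL my_catalog.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
```

Deciding when to run these calls, which tables need them, and what file sizes to target is exactly the ongoing tuning burden that a managed service removes.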
Upsolver’s Support for Apache Iceberg
Upsolver is fully committed to supporting Iceberg lakehouses with three core solutions to ensure customers experience performant queries and reduced costs.
Our solutions are designed to be easy to use with a minimal learning curve while doing the heavy lifting and ongoing maintenance for you, so you can concentrate on more important business activities.
Ingestion to Apache Iceberg Tables
When you ingest data to Iceberg with Upsolver, we automatically manage your tables by running background compaction processes based on industry best practices.
These operations run at the optimal time to deliver the best results; by reducing the size of your tables and the number of files they contain, they save you money on storage and speed up data scans.
Iceberg Table Optimization Tool
If you have existing tables in Iceberg that we don’t manage, you can use our standalone optimization tool to analyze your tables and discover where compaction and tuning are required.
All that’s needed is a connection to your AWS Glue Data Catalog or Tabular catalog, and you can begin optimizing your tables in a matter of minutes.
Mark the tables you want Upsolver to manage, and we continuously compact files to reduce the size of your tables, thereby lowering storage costs and speeding up data scans for better query performance.
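As a point of reference, this is roughly what pointing an open-source engine such as Spark at Iceberg tables in the AWS Glue Data Catalog looks like; the bucket path s3://my-bucket/warehouse and the table db.events are placeholder assumptions, and the optimization tool itself only needs the catalog connection.

```python
from pyspark.sql import SparkSession

# Register an Iceberg catalog backed by the AWS Glue Data Catalog.
# The warehouse path and table name below are illustrative placeholders.
spark = (
    SparkSession.builder.appName("iceberg-glue")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

# Every engine pointed at the same catalog sees the same tables, so an
# ingestion job, an optimizer, and a query engine can all share them.
spark.sql("SELECT COUNT(*) FROM glue.db.events").show()
```

That shared-catalog design is what lets one tool compact and tune tables while others keep querying them.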
Free Iceberg Table Analyzer CLI
Our free-to-use, open-source analyzer quickly discovers tables in your lakehouse that need compaction to decrease storage and boost data scans. Simply install the analyzer tool and run it against your existing lakehouse to reveal tables that would benefit from optimization. The analyzer generates a report to show you potential savings on each table.
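While the analyzer does this work for you, it can help to know where such signals live: every Iceberg table exposes metadata tables, such as files, that you can query directly. Here is a minimal sketch using the same placeholder catalog and table as above; it illustrates generic Iceberg functionality, not the analyzer’s internals.

```python
from pyspark.sql import SparkSession, functions as F

# Assumes a session configured with the "glue" Iceberg catalog as in the
# previous sketch; db.events remains a placeholder table name.
spark = SparkSession.builder.getOrCreate()

# Iceberg's "files" metadata table lists every data file in a table.
# Many small files is the classic signal that compaction will pay off.
files = spark.sql("SELECT file_size_in_bytes FROM glue.db.events.files")

row = files.agg(
    F.count("*").alias("file_count"),
    F.avg("file_size_in_bytes").alias("avg_bytes"),
).collect()[0]

print(f"{row['file_count']} files, avg {row['avg_bytes'] / 1e6:.1f} MB each")
# A high file count with a small average size marks the table as a good
# candidate for compaction, whether manual or managed.
```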
Our documentation will have you up and running with the above tools and your Iceberg tables compacted and optimized in no time. Find out more.
Watch Alex’s presentation, Reflecting on the State of the Data Lakehouse Survey, recorded live at the Chill Data Summit in NYC.