Can You Really Query Fresh Data in Apache Iceberg Tables from Snowflake?
Upsolver (acquired by Qlik)
Bridges the gap between engineering and data teams by streaming and optimizing operational data in an Iceberg lakehouse.
Thankfully, the answer is yes, but first, let's explore the reason behind this question.
The ability to query and analyze externally managed Apache Iceberg tables from Snowflake offers immense potential for users needing to integrate disparate datasets.
This sounds great in theory, but the promise of accessing this data comes with a big challenge: keeping the data fresh for users who need timely insights.
To query fresh data in externally managed Iceberg tables from Snowflake, you need to manually refresh the table.
Have you tried this? It’s a pain, right?
The reality is ongoing manual maintenance, which is neither scalable nor sustainable, especially with the ever-growing volumes of data we manage.
When inserts, updates, or deletes are applied to Iceberg tables, the metadata changes and a new snapshot is committed – a process that necessitates a manual refresh within Snowflake to ensure users have access to the most up-to-date information.
To refresh the data, you'll need to write and orchestrate tasks to update the Snowflake table while keeping track of the latest metadata filename and location, which is no easy undertaking in large-scale environments with numerous data sources.
With data changes occurring randomly, determining the best time to run a refresh operation becomes a guessing game. On top of that, running refresh operations when no data changes have occurred results in unnecessary compute costs.
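To make the pain concrete, here's roughly what that manual step looks like in Snowflake (the table name and metadata file path below are illustrative):

-- Manually refresh an externally managed Iceberg table in Snowflake
-- after a new snapshot is committed. You need to know the relative path
-- of the latest metadata file, which changes with every commit.
ALTER ICEBERG TABLE my_db.my_schema.my_iceberg_table
REFRESH 'metadata/v2.metadata.json';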
So what's the answer?
As the go-to solution for ingesting high-scale data into the lakehouse, with first-class support for Apache Iceberg, Upsolver is uniquely positioned to extend our table management services and solve this problem.
Our new table mirror feature enables you to mirror your Upsolver-managed tables to Snowflake, and we take care of the table refresh for you. This means no refresh tasks to write and orchestrate, no guessing when to run them, and no compute wasted on refreshes when nothing has changed.
There's more good news: it's easy to set up.
How to mirror Iceberg tables to Snowflake
To get started mirroring your Upsolver-managed Iceberg tables to Snowflake, simply follow these steps:
1. Start by creating an account with Upsolver if you don’t already have one, and follow the instructions to integrate your AWS account with Upsolver. Existing customers can log in.
2. In Upsolver, create a connection to your Snowflake database - all that's needed is a connection string, username, and password.
CREATE SNOWFLAKE CONNECTION my_snowflake_connection
CONNECTION_STRING = 'jdbc:snowflake://snowflakedemo.us-east-1.snowflakecomputing.com?db=DEMO_DB'
USER_NAME = 'my_username'
PASSWORD = 'my_password';
3. From your Snowflake database, create a catalog integration to connect with Iceberg tables in your object store.
CREATE CATALOG INTEGRATION myCatalogInt
CATALOG_SOURCE = OBJECT_STORE
TABLE_FORMAT = ICEBERG
ENABLED = TRUE;
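If you want to sanity-check this step, Snowflake can describe the integration you just created (using the name from the example above):

-- Optional: confirm the catalog integration exists and is enabled.
DESCRIBE CATALOG INTEGRATION myCatalogInt;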
4. Still in your Snowflake database, configure an external volume to point to the Amazon S3 storage location you shared with Upsolver.
CREATE EXTERNAL VOLUME iceberg_volume
STORAGE_LOCATIONS =
(
(
NAME = 'iceberg_location'
STORAGE_PROVIDER = 'S3'
STORAGE_BASE_URL = 's3://testbucket/testpath/'
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::111111111111:role/<upsolver-role-*>'
STORAGE_AWS_EXTERNAL_ID = 'my_external_id'
)
);
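A practical follow-up, assuming the role-based access shown above: describe the volume to retrieve the IAM user that Snowflake will assume, which you'll need when configuring the trust relationship on your AWS role (see the Snowflake documentation for the full trust-policy setup):

-- Optional: returns properties such as STORAGE_AWS_IAM_USER_ARN and
-- STORAGE_AWS_EXTERNAL_ID, needed for the AWS IAM role trust policy.
DESC EXTERNAL VOLUME iceberg_volume;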
5. Back in Upsolver, create an Upsolver-managed Iceberg table that will be mirrored to Snowflake.
-- Upsolver infers the table's schema from the ingested data, so no
-- columns need to be declared up front.
CREATE ICEBERG TABLE default_glue_catalog.demo.my_iceberg_table();
6. Mirror the table to Snowflake; Upsolver will automatically refresh it whenever the underlying Iceberg table changes:
CREATE MIRROR FOR default_glue_catalog.demo.my_iceberg_table
IN my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl
CATALOG_INTEGRATION = myCatalogInt
EXTERNAL_VOLUME = iceberg_volume;
7. Create a job to ingest your data. We support streaming, database, and file sources for the major data platforms.
CREATE SYNC JOB ingest_kinesis_data
START_FROM = BEGINNING
CONTENT_TYPE = JSON
AS COPY FROM KINESIS my_kinesis_connection
STREAM = 'sample-stream'
INTO default_glue_catalog.demo.my_iceberg_table;
That's it!
Your ingestion job will run continuously, loading data into your Upsolver-managed Iceberg table.
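At this point, you can verify freshness from the Snowflake side. Assuming the hypothetical names used throughout this walkthrough, a quick query against the mirrored table should reflect newly ingested records without any manual refresh:

-- Query the mirror from Snowflake; Upsolver keeps it refreshed, so
-- records ingested from the stream show up without any manual steps.
SELECT COUNT(*)
FROM DEMO_DB.mirror_demo_schema.mirror_iceberg_tbl;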
As well as mirroring your data to Snowflake to ensure users have the freshest updates, we automatically tune and compact the files in your Upsolver-managed Iceberg tables to give you the best query performance.
Circling back to the original question: with the help of our new table mirror feature, yes, you can query fresh data in externally managed Iceberg tables from Snowflake.
Give it a try
For a step-by-step walkthrough of mirroring your Upsolver-managed Iceberg tables to Snowflake, please check out our how-to guide.
Want to see Upsolver in action? Schedule your free, no-obligation demo.