Can You Really Query Fresh Data in Apache Iceberg Tables from Snowflake?
Upsolver (acquired by Qlik)
Bridges the gap between engineering and data teams by streaming and optimizing operational data in an Iceberg lakehouse.
Thankfully, the answer is yes, but first, let's explore the reason behind this question.
The ability to query and analyze externally managed Apache Iceberg tables from Snowflake offers immense potential for users needing to integrate disparate datasets.
This sounds great in theory, but the promise of accessing this data comes with a big challenge: keeping the data fresh for users who need timely insights.
To query fresh data in externally managed Iceberg tables from Snowflake, you need to manually refresh the table.
Have you tried this? It’s a pain, right?
The reality is ongoing manual maintenance, which is neither scalable nor sustainable, especially with the ever-growing volumes of data we manage.
When inserts, updates, or deletes are applied to Iceberg tables, the metadata changes and a new snapshot is committed – a process that necessitates a manual refresh within Snowflake to ensure users have access to the most up-to-date information.
To refresh the data, you'll need to write and orchestrate tasks to update the Snowflake table while keeping track of the latest metadata filename and location, which is no easy undertaking in large-scale environments with numerous data sources.
With data changes occurring randomly, determining the best time to run a refresh operation becomes a guessing game. On top of that, running refresh operations when no data changes have occurred results in unnecessary compute costs.
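To make the pain concrete, here's roughly what that manual step looks like in Snowflake (the table name and metadata file path below are illustrative):

-- Manually refresh an externally managed Iceberg table in Snowflake
-- after a new snapshot is committed. You need to know the relative path
-- of the latest metadata file, which changes with every commit.
ALTER ICEBERG TABLE my_db.my_schema.my_iceberg_table
REFRESH 'metadata/v2.metadata.json';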
So what's the answer?
As the go-to solution for ingesting high-scale data into the lakehouse, with first-class support for Apache Iceberg, Upsolver is uniquely positioned to extend our table management services and solve this problem.
Our new table mirror feature enables you to mirror your Upsolver-managed tables to Snowflake, and we take care of the table refresh for you. This means no refresh tasks to write and orchestrate, no guessing when to run them, and no compute wasted on refreshes when nothing has changed.
There's more good news: it's easy to set up.
How to mirror Iceberg tables to Snowflake
To get started mirroring your Upsolver-managed Iceberg tables to Snowflake, simply follow these steps:
1. Start by creating an account with Upsolver if you don’t already have one, and follow the instructions to integrate your AWS account with Upsolver. Existing customers can log in.
2. In Upsolver, create a connection to your Snowflake database - all that's needed is a connection string, username, and password.
CREATE SNOWFLAKE CONNECTION my_snowflake_connection
CONNECTION_STRING = 'jdbc:snowflake://snowflakedemo.us-east-1.snowflakecomputing.com?db=DEMO_DB'
USER_NAME = 'my_username'
PASSWORD = 'my_password';
3. From your Snowflake database, create a catalog integration to connect with Iceberg tables in your object store.
CREATE CATALOG INTEGRATION myCatalogInt
CATALOG_SOURCE = OBJECT_STORE
TABLE_FORMAT = ICEBERG
ENABLED = TRUE;
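If you want to sanity-check this step, Snowflake can describe the integration you just created (using the name from the example above):

-- Optional: confirm the catalog integration exists and is enabled.
DESCRIBE CATALOG INTEGRATION myCatalogInt;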
4. Still in your Snowflake database, configure an external volume to point to the Amazon S3 storage location you shared with Upsolver.
CREATE EXTERNAL VOLUME iceberg_volume
STORAGE_LOCATIONS =
(
(
NAME = 'iceberg_location'
STORAGE_PROVIDER = 'S3'
STORAGE_BASE_URL = 's3://testbucket/testpath/'
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::111111111111:role/<upsolver-role-*>'
STORAGE_AWS_EXTERNAL_ID = 'my_external_id'
)
);
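A practical follow-up, assuming the role-based access shown above: describe the volume to retrieve the IAM user that Snowflake will assume, which you'll need when configuring the trust relationship on your AWS role (see the Snowflake documentation for the full trust-policy setup):

-- Optional: returns properties such as STORAGE_AWS_IAM_USER_ARN and
-- STORAGE_AWS_EXTERNAL_ID, needed for the AWS IAM role trust policy.
DESC EXTERNAL VOLUME iceberg_volume;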
5. Back in Upsolver, create an Upsolver-managed Iceberg table that will be mirrored to Snowflake.
-- Upsolver infers the table's schema from the ingested data, so no
-- columns need to be declared up front.
CREATE ICEBERG TABLE default_glue_catalog.demo.my_iceberg_table();
6. Mirror the table to Snowflake; Upsolver will automatically refresh it whenever the underlying Iceberg table changes:
CREATE MIRROR FOR default_glue_catalog.demo.my_iceberg_table
IN my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl
CATALOG_INTEGRATION = myCatalogInt
EXTERNAL_VOLUME = iceberg_volume;
7. Create a job to ingest your data. We support streaming, database, and file sources for the major data platforms.
CREATE SYNC JOB ingest_kinesis_data
START_FROM = BEGINNING
CONTENT_TYPE = JSON
AS COPY FROM KINESIS my_kinesis_connection
STREAM = 'sample-stream'
INTO default_glue_catalog.demo.my_iceberg_table;
That's it!
Your ingestion job will run continuously, loading data into your Upsolver-managed Iceberg table.
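At this point, you can verify freshness from the Snowflake side. Assuming the hypothetical names used throughout this walkthrough, a quick query against the mirrored table should reflect newly ingested records without any manual refresh:

-- Query the mirror from Snowflake; Upsolver keeps it refreshed, so
-- records ingested from the stream show up without any manual steps.
SELECT COUNT(*)
FROM DEMO_DB.mirror_demo_schema.mirror_iceberg_tbl;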
As well as mirroring your data to Snowflake to ensure users have the freshest updates, we automatically tune and compact the files in your Upsolver-managed Iceberg tables to give you the best query performance.
Circling back to the original question: with the help of our new table mirror feature, yes, you can query fresh data in externally managed Iceberg tables from Snowflake.
Give it a try
For a step-by-step walkthrough of mirroring your Upsolver-managed Iceberg tables to Snowflake, please check out our how-to guide.
Want to see Upsolver in action? Schedule your free, no-obligation demo.