Test to see if you can join two managed Iceberg tables in different S3 table buckets and how you should configure the Spark session.

The Goal of This Test

The goal of this test is to verify that Apache Spark can join Iceberg tables stored in separate S3 table buckets. We create a customers table in demo-bucket1 and an orders table in demo-bucket2, then check that a single Spark session can read both and join them across buckets.

Step 1: Create Table Buckets

First, we'll create two S3 table buckets where our Iceberg tables will reside. Here's the command to create two separate buckets:
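The original command listing did not survive on this page, so here is a minimal Python sketch of the same step. It assumes a boto3 `s3tables` client whose `create_table_bucket(name=...)` call returns the bucket ARN under the `"arn"` key; the helper name `create_table_buckets` is hypothetical and not taken from the original post.

```python
from typing import Dict, Iterable


def create_table_buckets(client, names: Iterable[str]) -> Dict[str, str]:
    """Create one S3 table bucket per name and return {bucket name: bucket ARN}.

    `client` is assumed to be a boto3 "s3tables" client, or any object that
    exposes a compatible create_table_bucket(name=...) method.
    """
    arns = {}
    for name in names:
        response = client.create_table_bucket(name=name)
        # The ARN is what the Spark catalog configuration needs later on.
        arns[name] = response["arn"]
    return arns
```

With boto3 installed and AWS credentials configured, this would be called as `create_table_buckets(boto3.client("s3tables"), ["demo-bucket1", "demo-bucket2"])`. Keep the returned ARNs, since each one becomes the warehouse of a Spark catalog in the later steps.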

These two buckets (demo-bucket1 and demo-bucket2) will house the customers and orders Iceberg tables, respectively.

Step 2: Create Customer Table in demo-bucket1

Next, we’ll configure the Spark session and create the customers table in demo-bucket1.
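As a rough sketch of this step (not the original listing), the snippet below builds the Spark properties that register one table bucket as an Iceberg catalog and then issues the DDL for the customers table. The catalog name, the `demo` namespace, and the column list are assumptions made for illustration; the catalog class names follow the AWS S3 Tables catalog for Iceberg.

```python
from typing import Dict


def s3tables_catalog_conf(catalog: str, bucket_arn: str) -> Dict[str, str]:
    """Spark properties that register one S3 table bucket as an Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        # The table bucket ARN acts as the catalog's warehouse.
        f"{prefix}.warehouse": bucket_arn,
    }


def create_customers_table(spark, catalog: str, namespace: str = "demo") -> None:
    """Create the namespace and a customers table (columns are illustrative)."""
    spark.sql(f"CREATE NAMESPACE IF NOT EXISTS {catalog}.{namespace}")
    spark.sql(
        f"""
        CREATE TABLE IF NOT EXISTS {catalog}.{namespace}.customers (
            customer_id INT,
            name STRING,
            email STRING
        ) USING iceberg
        """
    )
```

Each key/value pair from `s3tables_catalog_conf` would be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. The session also needs the Iceberg Spark runtime and the S3 Tables catalog jars on its classpath.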

Step 3: Create Orders Table in demo-bucket2

Similarly, we create the orders table in demo-bucket2:
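A sketch of what this step might look like, assuming a Spark session whose catalog (here passed in as `catalog`) points at demo-bucket2. The column list and the two sample rows are invented for illustration and are not the author's data.

```python
def create_orders_table(spark, catalog: str, namespace: str = "demo") -> str:
    """Create an orders table, insert sample rows, and return its full name."""
    table = f"{catalog}.{namespace}.orders"
    spark.sql(f"CREATE NAMESPACE IF NOT EXISTS {catalog}.{namespace}")
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(order_id INT, customer_id INT, amount DOUBLE) USING iceberg"
    )
    # customer_id is the key the later join against customers relies on.
    spark.sql(f"INSERT INTO {table} VALUES (101, 1, 250.0), (102, 2, 99.5)")
    return table
```

Passing the session and catalog in as arguments keeps the same helper usable whether the session knows about one bucket or both.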

Output

[Screenshot: Bucket 1]

[Screenshot: Bucket 2]


Step 4: Attempt to Join Tables from Different S3 Buckets

To join the customers table from demo-bucket1 with the orders table from demo-bucket2, we configure a single Spark session with one Iceberg catalog per S3 table bucket ARN.


Create Spark Session
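One way to configure the session for several buckets is to register a separate Iceberg catalog for each ARN. The sketch below assumes hypothetical catalog names `bucket1` and `bucket2` and placeholder ARNs (the account ID and region are not from the original post).

```python
from typing import Dict

ICEBERG_EXTENSIONS = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"


def multi_bucket_conf(bucket_arns: Dict[str, str]) -> Dict[str, str]:
    """Build Spark properties with one Iceberg catalog per S3 table bucket."""
    conf = {"spark.sql.extensions": ICEBERG_EXTENSIONS}
    for catalog, arn in bucket_arns.items():
        prefix = f"spark.sql.catalog.{catalog}"
        conf[prefix] = "org.apache.iceberg.spark.SparkCatalog"
        conf[f"{prefix}.catalog-impl"] = "software.amazon.s3tables.iceberg.S3TablesCatalog"
        conf[f"{prefix}.warehouse"] = arn
    return conf


def build_session(conf: Dict[str, str], app_name: str = "cross-bucket-join"):
    """Apply the properties to a SparkSession builder.

    Requires pyspark plus the Iceberg and S3 Tables catalog runtime jars.
    """
    from pyspark.sql import SparkSession  # imported lazily; only needed here

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Placeholder ARNs: replace the account ID and region with your own.
BUCKETS = {
    "bucket1": "arn:aws:s3tables:us-east-1:111122223333:bucket/demo-bucket1",
    "bucket2": "arn:aws:s3tables:us-east-1:111122223333:bucket/demo-bucket2",
}
```

Calling `build_session(multi_bucket_conf(BUCKETS))` would then give a session in which `bucket1.<namespace>.<table>` and `bucket2.<namespace>.<table>` resolve to different table buckets.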


Let's Try the JOIN
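With both catalogs registered in one session, the join is ordinary Spark SQL using fully qualified table names. This is a sketch under the same assumptions as before: the catalog names, the `demo` namespace, and the column names are illustrative rather than taken from the original code.

```python
def join_customers_orders(
    spark,
    customers_catalog: str,
    orders_catalog: str,
    namespace: str = "demo",
):
    """Join customers (one table bucket) with orders (another) on customer_id."""
    query = f"""
        SELECT c.customer_id, c.name, o.order_id, o.amount
        FROM {customers_catalog}.{namespace}.customers AS c
        JOIN {orders_catalog}.{namespace}.orders AS o
          ON c.customer_id = o.customer_id
    """
    # Each table name starts with its own catalog, so Spark resolves the two
    # sides of the join against different table buckets.
    return spark.sql(query)
```

On a real session, `join_customers_orders(spark, "bucket1", "bucket2").show()` would print the joined rows.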


Output


Success

Code

https://soumilshah1995.blogspot.com/2025/01/test-to-see-if-you-can-join-two-managed.html


Conclusion

In this blog, we configured Apache Spark with multiple S3 table bucket ARNs and performed a join between two tables stored in different buckets (demo-bucket1 and demo-bucket2). The test confirms that a single Spark session can join Iceberg tables across different S3 table buckets.

Roy Hasson

Product @ Qlik | Data engineer | Advocate for better data

1 month ago

Nice write up, but kind of messed up that you need to jump through all these hoops just to join two tables in the same lake and the same catalog. Did you try federating the S3 Tables with Glue catalog and then joining them using Glue catalog only?
