Sync Existing Apache Iceberg Tables with AWS Glue Data Catalog: Run It Locally, on Airflow, or EMR with a Simple YAML-based Template
If you have existing Iceberg tables and need to sync them with the AWS Glue Data Catalog, the iceberg-glue-sync Python package can help. It registers one or many Iceberg tables with the Glue Data Catalog (Glue's Hive-compatible metastore), making your data discoverable and queryable through AWS services.
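To make "registering" concrete, here is a minimal sketch of what a single registration can look like with boto3. This is not the iceberg-glue-sync implementation. It assumes the common convention that a Glue table is recognized as Iceberg when its parameters carry `table_type=ICEBERG` and a `metadata_location` pointing at the table's current metadata file. The function names, bucket, database, and table names are all hypothetical.

```python
from typing import Any, Dict


def build_glue_table_input(table_name: str, metadata_location: str) -> Dict[str, Any]:
    """Build a Glue TableInput payload that marks the table as Iceberg.

    Assumption: engines identify the table as Iceberg through the
    `table_type` and `metadata_location` parameters.
    """
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "table_type": "ICEBERG",
            "metadata_location": metadata_location,
        },
    }


def register_table(database: str, table_name: str, metadata_location: str, region: str) -> None:
    """Create the Glue table entry. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without AWS dependencies

    glue = boto3.client("glue", region_name=region)
    glue.create_table(
        DatabaseName=database,
        TableInput=build_glue_table_input(table_name, metadata_location),
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only; nothing is sent to AWS here.
    payload = build_glue_table_input(
        "orders",
        "s3://example-bucket/warehouse/sales/orders/metadata/00001-example.metadata.json",
    )
    print(payload)
```

Keeping the payload builder separate from the AWS call makes it easy to inspect or unit test what would be registered before touching the catalog.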
Why Use iceberg-glue-sync?
Video guides
Steps to Sync Your Tables
Create a YAML Configuration File: If you have existing tables, use the following template to define them along with your AWS configuration:
Run the Sync Command: Execute the sync process by providing the YAML configuration file:
Output
Repo
Key Use Cases
With iceberg-glue-sync, keeping your existing Iceberg tables synced with AWS Glue is hassle-free. Simplify your workflows and make your data ready for AWS analytics today!
Note:
I will be adding more sync functionality to support multiple catalogs in the future. Feel free to fork the repository and contribute!
While you can use AWS Glue crawlers for this process, my template offers the flexibility to add functionality and customize it based on your specific use cases and needs.
#AWS #ApacheIceberg #Glue #DataSync #DataEngineering #CloudComputing