Resources for Learning More About Catalog-Level Versioning with Project Nessie & Dremio Arctic (Rollbacks, Branching, Tagging, and Multi-Table Transactions)
Alex Merced
Co-Author of “Apache Iceberg: The Definitive Guide” | Senior Tech Evangelist at Dremio | LinkedIn Learning Instructor | Tech Content Creator
Data quality, governance, observability, and disaster recovery are areas where best practices are still emerging in the world of the data lakehouse. A rising trend borrows the practices software developers use to manage these same concerns in their codebases. This trend is called "Data as Code". The practices it aims to bring to the lakehouse include:
Project Nessie
Several solutions are emerging that approach these problems at different layers: the catalog, the table, and the file. Project Nessie is an open-source project that addresses them at the catalog level. Benefits of Nessie's particular approach:
Project Nessie Resources
Tutorials:
Dremio Arctic
While you can deploy your own Nessie server, the Dremio Arctic service offers a cloud-managed one with some extra features. Beyond the catalog-level versioning you get from using a Nessie catalog for your tables, Dremio Arctic also provides:
Dremio Arctic Resources
CI/CD
Essentially, you can create automated pipelines that take advantage of Nessie's branching using any tool that supports Nessie, for example:
These mechanisms can be used to send instructions to tools that support Nessie, such as Dremio and Apache Spark. For example:
The jobs would follow a similar pattern too:
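As a rough illustration of the branch, validate, and merge pattern this section describes, here is a minimal Python sketch of a pipeline step. It is not taken from the Nessie or Dremio documentation. The names `BranchJob` and `run_job` are hypothetical, as are the catalog name `nessie` and the example table. The SQL strings follow the Nessie Spark SQL extension syntax as I understand it (`CREATE BRANCH`, `USE REFERENCE`, `MERGE BRANCH`, `DROP BRANCH`), so check them against the version of the tool you are running. Dremio's syntax differs slightly.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BranchJob:
    """Hypothetical description of one pipeline run (not a Nessie API)."""
    catalog: str            # assumed catalog name, e.g. "nessie"
    branch: str             # isolated branch for this run
    work_sql: List[str]     # the ingestion / transformation statements
    target: str = "main"    # branch that consumers read from


def run_job(job: BranchJob,
            execute: Callable[[str], object],
            validate: Callable[[], bool]) -> bool:
    """Run work on an isolated branch and merge only if validation passes.

    `execute` is any callable that sends one SQL statement to an engine
    (for example `spark.sql` in a session configured for Nessie).
    `validate` is your own data-quality check, run against the branch.
    """
    # 1. Create the branch from the target and switch the session to it.
    execute(f"CREATE BRANCH IF NOT EXISTS {job.branch} "
            f"IN {job.catalog} FROM {job.target}")
    execute(f"USE REFERENCE {job.branch} IN {job.catalog}")

    # 2. Do the work; nothing here is visible on the target branch yet.
    for statement in job.work_sql:
        execute(statement)

    # 3. If checks fail, stop. The branch stays around for inspection
    #    and the target branch is untouched.
    if not validate():
        return False

    # 4. Publish all changes at once, then clean up.
    execute(f"MERGE BRANCH {job.branch} INTO {job.target} IN {job.catalog}")
    execute(f"USE REFERENCE {job.target} IN {job.catalog}")
    execute(f"DROP BRANCH {job.branch} IN {job.catalog}")
    return True


if __name__ == "__main__":
    job = BranchJob(
        catalog="nessie",
        branch="etl_2024_01_01",
        work_sql=["INSERT INTO nessie.db.sales VALUES (1, 'example')"],
    )
    # `print` stands in for a real engine so the sketch runs anywhere.
    run_job(job, execute=print, validate=lambda: True)
```

In a scheduled CI/CD job you would pass a real executor (such as `spark.sql`) in place of `print`, and a `validate` function that runs your quality checks while the session still points at the branch. Because the merge is a single catalog-level operation, changes to several tables made on the branch become visible on the target branch together.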