Marvelous MLOps #28: Getting started with Databricks asset bundles
If you have ever worked with Databricks, you have probably noticed that there are multiple ways to deploy a Databricks job with all its dependencies, and it is easy to get lost among them.
You may have encountered some of these common ways to deploy a job on Databricks, including Terraform, the Databricks APIs, dbx, and Databricks asset bundles.
Terraform and the Databricks APIs are "DIY" solutions. If you want to execute a Python script (let's call it main.py) that requires a config.yml file and a custom package, my_package, to run, you need to make sure yourself that all of these files are uploaded to the workspace or DBFS before the job is deployed.
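To make the "DIY" upload step concrete, here is a minimal sketch that pushes the job's files to DBFS through the DBFS REST endpoint `/api/2.0/dbfs/put`. It is an illustration under stated assumptions, not a recommended deployment script: the function names and the example target paths are hypothetical, and the inline `contents` field of this endpoint is only suitable for small files (roughly up to 1 MB).

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_dbfs_put_request(
    host: str, token: str, content: bytes, dbfs_path: str
) -> urllib.request.Request:
    """Build (but do not send) a DBFS 'put' request for one small file."""
    payload = {
        "path": dbfs_path,
        # The endpoint expects the file content as a base64-encoded string.
        "contents": base64.b64encode(content).decode("ascii"),
        "overwrite": True,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/dbfs/put",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def upload_job_files(host: str, token: str, files: dict) -> None:
    """Upload every local file in `files` (local path -> DBFS path)."""
    for local_path, dbfs_path in files.items():
        request = build_dbfs_put_request(
            host, token, Path(local_path).read_bytes(), dbfs_path
        )
        # urlopen raises urllib.error.HTTPError on a non-2xx response.
        with urllib.request.urlopen(request) as response:
            response.read()
```

A call could look like `upload_job_files(host, token, {"main.py": "/tmp/my_project/main.py", "config.yml": "/tmp/my_project/config.yml"})`, where the `/tmp/my_project` prefix is a made-up location. The built wheel of my_package would need the same treatment, and the job definition must then point at those exact paths, which is the bookkeeping you take on with this approach.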
Databricks asset bundles and dbx are solutions that manage those dependencies for you. However, dbx is not officially supported, and Databricks advises using asset bundles instead (in public preview at the time of publication).
In this article, we go through the local setup for Databricks asset bundles (DAB), a code example, and the GitHub Actions deployment pipeline.
DAB: local setup
First of all, you need to install Databricks CLI version 0.205 or later. Follow the installation instructions at https://docs.databricks.com/en/dev-tools/cli/install.html. If you already have an older installation of the Databricks CLI, remove it first.
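If you want to verify the version requirement from a script, a small check like the following may help. It is a sketch: it assumes that `databricks --version` prints a string containing a MAJOR.MINOR.PATCH version number, and the helper names are hypothetical.

```python
import re
import subprocess

# Minimum CLI version named in this article.
MINIMUM_VERSION = (0, 205, 0)


def parse_cli_version(output: str) -> tuple:
    """Extract the first MAJOR.MINOR.PATCH triple from the CLI output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"No version number found in: {output!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(version: tuple) -> bool:
    """Return True if the version is at least the required minimum."""
    return version >= MINIMUM_VERSION


def installed_cli_version() -> tuple:
    """Ask the installed CLI for its version (requires `databricks` on PATH)."""
    result = subprocess.run(
        ["databricks", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cli_version(result.stdout + result.stderr)
```

Calling `is_supported(installed_cli_version())` on your machine tells you whether the installed CLI is recent enough. A result of `False` usually means the older CLI is still the first one found on your PATH.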
After the CLI is installed, you need to configure authentication to your Databricks workspace. The easiest way to do this is to set the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN.
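As a quick sanity check before running any CLI command, you could confirm that both variables are set. The sketch below only inspects the environment and does not contact Databricks; the function name is hypothetical.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARIABLES = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_variables(
    environment: Optional[Mapping[str, str]] = None,
) -> list:
    """Return the required variables that are unset or empty."""
    if environment is None:
        environment = os.environ
    return [name for name in REQUIRED_VARIABLES if not environment.get(name)]
```

For example, `missing_databricks_variables()` returns an empty list when both variables are present in the current shell, and the names of the missing ones otherwise. Passing a plain dictionary instead of relying on `os.environ` makes the check easy to test.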