Marvelous MLOps #28: Getting started with Databricks asset bundles

If you have ever worked with Databricks, you have probably noticed that there are multiple ways to deploy a Databricks job together with all its dependencies, and it is easy to get lost among them.

You may have encountered some of the following common ways to deploy a job on Databricks:

  • Terraform
  • Databricks APIs
  • dbx by Databricks Labs
  • Databricks asset bundles (DAB)

Terraform and the Databricks APIs are "DIY" solutions. If you want to execute a Python script (let's call it main.py) that requires a config.yml file and a custom package my_package to run, you need to make sure all of those files are uploaded to the workspace or to DBFS before the job is deployed.
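To make the "DIY" overhead concrete, here is a sketch of what the manual uploads could look like with the Databricks CLI. The paths, package version, and workspace layout are illustrative, not from the original article:

```shell
# Hypothetical manual deployment: every dependency must be uploaded
# by hand before the job definition can reference it.

# Upload the entry-point script to the workspace
databricks workspace import /Workspace/my_project/main.py \
  --file ./main.py --language PYTHON

# Upload the config file and the packaged wheel to DBFS
databricks fs cp ./config.yml dbfs:/my_project/config.yml
databricks fs cp ./dist/my_package-0.1.0-py3-none-any.whl dbfs:/my_project/
```

Only after all of these succeed can you create or update the job (via Terraform or the Jobs API) that points at those paths — and you must keep the paths in sync yourself.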

Databricks asset bundles and dbx are solutions that manage those dependencies for you. However, dbx is not officially supported, and Databricks advises using asset bundles instead (at the moment of publication, in public preview).
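To illustrate what "managing those dependencies for you" means, here is a minimal databricks.yml sketch for a bundle wrapping the main.py job from above. All names, the workspace host, and cluster settings are illustrative assumptions; the resources section follows the Jobs API schema:

```yaml
# databricks.yml — minimal bundle sketch (names and values are illustrative)
bundle:
  name: my_project

targets:
  dev:
    workspace:
      host: https://my-workspace.cloud.databricks.com

resources:
  jobs:
    main_job:
      name: main-job
      tasks:
        - task_key: main
          spark_python_task:
            python_file: main.py
            parameters: ["--config", "config.yml"]
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1
          libraries:
            # Local wheel: the bundle uploads it for you on deploy
            - whl: ./dist/*.whl
```

With this in place, `databricks bundle deploy` uploads the script, config, and wheel and creates the job in one step — no manual file synchronization.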

In this article, we go through the local setup for DAB, a code example, and the GitHub Actions deployment pipeline.

DAB: local setup

First of all, you need to install Databricks CLI version >= 0.205. Follow the instructions here: https://docs.databricks.com/en/dev-tools/cli/install.html. If you already have an older installation of the Databricks CLI, delete it first.
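A quick way to confirm you are on the new CLI and not a leftover legacy install:

```shell
# Print the installed CLI version; expect v0.205.0 or newer.
# Legacy (pip-installed) versions are 0.17.x or lower and do not
# support the `bundle` command group.
databricks --version
```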

After the CLI is installed, you need to configure authentication. The easiest way is to set the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.
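For example (the host URL and token below are placeholders — use your own workspace URL and a personal access token):

```shell
# Workspace URL and personal access token (placeholders)
export DATABRICKS_HOST="https://my-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi..."
```

The CLI picks these variables up automatically, so no `~/.databrickscfg` profile is needed for local experimentation.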

Read further on our Substack

Lewis Wong

Lead MLOps Engineer

1y

This is a great article, DAB are nice and straightforward to follow. I’m interested in hearing your thoughts on how DAB would work in private workspaces and what considerations should you have.
