How to call a Databricks Notebook from Azure Data Factory
Calling a Databricks Notebook from Azure Data Factory


Databricks has updated its policy to make Unity Catalog the default governance solution across all workspaces and catalogs.

This change is part of a broader initiative to streamline and standardize data governance across the Databricks platform, especially as organizations increasingly adopt multi-cloud strategies.

Unity Catalog now provides a centralized, cross-workspace solution that simplifies metadata management, data lineage, and security enforcement.

Previously, each Databricks workspace had its own Hive metastore, which required manual synchronization. With Unity Catalog, these are unified and policies are applied at the account level, allowing consistent governance across all data assets, whether tables, files, or machine learning models.
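For example, with Unity Catalog a table is addressed through a three-level namespace (catalog.schema.table) instead of the workspace-local two-level schema.table of the Hive metastore. A minimal sketch, assuming a Databricks notebook where spark is predefined and using hypothetical object names:

```python
# Three-level namespace: catalog.schema.table.
# "main", "sales" and "orders" are hypothetical placeholder names; under the
# legacy per-workspace Hive metastore the same table would be "sales.orders".
df = spark.table("main.sales.orders")
df.show(5)
```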

This includes access control features such as fine-grained row and column-level security, dynamic views, and automated data lineage for compliance and auditing.
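As a rough illustration of these features, here is a minimal sketch of a dynamic view that masks a column and filters rows; the catalog, schema, table, group names, and region value are hypothetical, and it assumes a Databricks notebook where spark is predefined:

```python
# Sketch of column masking and row-level filtering with a dynamic view.
# Object names (main.sales.orders) and groups ("finance", "admins") are placeholders.
spark.sql("""
CREATE OR REPLACE VIEW main.sales.orders_restricted AS
SELECT
  order_id,
  region,
  -- mask the amount column for anyone outside the 'finance' group
  CASE WHEN is_account_group_member('finance') THEN amount ELSE NULL END AS amount
FROM main.sales.orders
-- non-admins only see rows for one example region
WHERE is_account_group_member('admins') OR region = 'EMEA'
""")
```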

Now, to the main point of this discussion. If you create a new workspace in Databricks, a new catalog is created by default with the same name as the workspace. For example, if you create a workspace called "dbw-lakehouse", a catalog named "dbw-lakehouse" is deployed automatically and appears in your catalog pane.
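A quick way to verify this from a notebook in the new workspace is to list the catalogs; the workspace catalog should appear alongside the built-in ones (a minimal sketch, assuming spark is predefined in the notebook):

```python
# List catalogs visible to the current user; the automatically created
# workspace catalog (e.g. the one named after "dbw-lakehouse") should show up
# next to built-in catalogs such as "system".
spark.sql("SHOW CATALOGS").show(truncate=False)
```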

This feature was not available previously, so calling a Databricks Notebook from Azure Data Factory used to be a more direct process.


Default Unity Catalog Deployed for Data Governance

After this change to the Unity Catalog policy, if you want to call a Databricks Notebook from Azure Data Factory and that notebook processes data in an external location such as ADLS or S3, it is important to manage the permissions on that external location, i.e. who can read, write, or manage data in the storage location.
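For instance, a notebook triggered from ADF might read Delta data directly from an ADLS Gen2 path that falls under a Unity Catalog external location; whether the read succeeds depends on the grants on that external location. The storage account, container, and folder names below are hypothetical:

```python
# Sketch of a notebook cell reading from a path governed by an external location.
path = "abfss://raw@mystorageacct.dfs.core.windows.net/sales/orders"
df = spark.read.format("delta").load(path)
display(df)  # display() is a Databricks notebook built-in
```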

To handle this type of permission, we can create access principals for users, such as a DEV user, a UAT user, or a PROD user, and then assign them to the appropriate group as usual.
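As a sketch of such group-based permissions, Unity Catalog privileges can be granted to the group rather than to individual users; the catalog, schema, and group names below are placeholders:

```python
# Grant catalog and schema access to a group instead of individual users.
# "main", "main.sales" and "dev_users" are hypothetical names.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `dev_users`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA main.sales TO `dev_users`")
```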

User permission in Databricks

The image above shows the permission levels of a principal (i.e. a user). If we want to call a notebook from ADF, we need the ADF managed identity's application ID, which we can get from Azure Data Factory. See the image below.


Managed Identity Application ID from ADF Panel

This Managed Identity Application ID is then added as a principal on the External Location, found under Catalog Explorer, where we can grant the required permissions based on our needs.
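The grant itself can be applied through the Catalog Explorer UI as described above; the SQL below is an equivalent sketch, assuming a hypothetical external location named adls_raw_zone and a placeholder application ID for the ADF managed identity (typically the identity first needs to be added to the Databricks account as a service principal):

```python
# Grant read/write on the external location to the ADF managed identity.
# "adls_raw_zone" and the application ID are placeholders.
spark.sql("""
GRANT READ FILES, WRITE FILES
ON EXTERNAL LOCATION `adls_raw_zone`
TO `00000000-0000-0000-0000-000000000000`
""")
```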


#Databricks #DataFactory #ETL
