How to call a Databricks Notebook from Azure Data Factory
Md. Samiul Islam
Hadoop | Hive | Informatica | SSIS | ADF | Azure Synapse Analytics | Databricks | Data Streaming | Apache Spark | Python & Pandas | API Integration | Power BI | Oracle | SQL Server | Exasol | NoSQL
Databricks updated the policy for Unity Catalog to make it the default governance solution across all catalogs and workspaces.
This change is part of a broader initiative to streamline and standardize data governance across the Databricks platform, especially as organizations increasingly adopt multi-cloud strategies.
Unity Catalog now provides a centralized, cross-workspace solution that simplifies metadata management, data lineage, and security enforcement.
Previously, each Databricks workspace had its own Hive metastore, which required manual synchronization. With Unity Catalog, these are unified and policies are applied at the account level, allowing consistent governance across all data assets, whether tables, files, or machine learning models.
This includes access control features such as fine-grained row and column-level security, dynamic views, and automated data lineage for compliance and auditing.
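To illustrate what that fine-grained control can look like, here is a minimal sketch of a dynamic view that combines row filtering and column masking; the catalog, schema, table, and group names are hypothetical placeholders.

```python
# Minimal sketch of Unity Catalog fine-grained access control via a dynamic view.
# Assumes a notebook attached to a Unity Catalog-enabled cluster; the catalog,
# table, and group names below are hypothetical placeholders.
spark.sql("""
CREATE OR REPLACE VIEW main.reporting.orders_secure AS
SELECT
  order_id,
  region,
  -- Column masking: only members of the finance group see the raw amount
  CASE WHEN is_account_group_member('finance') THEN amount ELSE NULL END AS amount
FROM main.sales.orders
-- Row filtering: admins see every row, everyone else only sees EU rows
WHERE is_account_group_member('admins') OR region = 'EU'
""")
```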
Now to the main point of my discussion. If you create a new workspace in Databricks, a new catalog is created by default with the same name as the workspace. For example, if you create a workspace called "dbw-lakehouse", a catalog named "dbw-lakehouse" will automatically appear in your Catalog Explorer.
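As a quick check, you can confirm the workspace catalog from a notebook. This is only a sketch: it assumes the auto-created catalog keeps the workspace name (names containing hyphens may be adjusted, and need backticks when quoted in SQL).

```python
# List the catalogs visible to the current user; the workspace catalog
# (assumed here to be named after the "dbw-lakehouse" workspace) should appear.
display(spark.sql("SHOW CATALOGS"))

# Switch to the workspace catalog; backticks are required because of the hyphen.
spark.sql("USE CATALOG `dbw-lakehouse`")
```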
This behavior was not available previously, so calling a Databricks notebook from Azure Data Factory used to be a more direct process.
After this change in the Unity Catalog policy, if you want to call a Databricks notebook from Azure Data Factory and that notebook processes data in an external location such as ADLS or S3, it is important to manage the external location's permissions, i.e. who can read, write, or manage data in that storage location.
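For context, here is a minimal sketch of what such a notebook might do; the storage account, container, and folder names are hypothetical placeholders for paths covered by an ADLS external location.

```python
# Hypothetical ADLS Gen2 paths registered under a Unity Catalog external location.
# The storage account, container, and folder names are placeholders.
source_path = "abfss://raw@mystorageaccount.dfs.core.windows.net/sales/orders/"
target_path = "abfss://curated@mystorageaccount.dfs.core.windows.net/sales/orders/"

# Reading requires READ FILES on the external location; writing requires WRITE FILES.
df = spark.read.format("parquet").load(source_path)
df_clean = df.dropDuplicates(["order_id"])
df_clean.write.format("delta").mode("overwrite").save(target_path)
```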
To handle these permissions, we can create access principals for the different users (for example DEV, UAT, and PROD users) and then assign them to the appropriate group as usual.
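A minimal sketch of what the corresponding grants might look like, run from a notebook or the SQL editor; the external location name (adls_sales_raw) and group names are hypothetical.

```python
# Grant environment-specific groups access to a Unity Catalog external location.
# The external location and group names are hypothetical placeholders.
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION adls_sales_raw TO `dev_engineers`")
spark.sql("GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION adls_sales_raw TO `uat_engineers`")
spark.sql("GRANT ALL PRIVILEGES ON EXTERNAL LOCATION adls_sales_raw TO `prod_service_admins`")
```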
The image above shows the permission levels of a principal (i.e. a user or group). If we want to call the notebook from ADF, we need the data factory's managed identity application ID, which we can copy from the Azure Data Factory resource. See the image below.
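To show where this fits, below is a sketch of an Azure Databricks linked service in ADF that authenticates with the data factory's managed identity instead of a personal access token. It is written as a Python dict purely for illustration; the domain, resource IDs, and cluster ID are placeholders.

```python
# Sketch of an ADF "AzureDatabricks" linked service using the factory's
# system-assigned managed identity (MSI). All IDs and URLs are placeholders.
databricks_linked_service = {
    "name": "ls_databricks_msi",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
            # "MSI" tells ADF to authenticate as its own managed identity.
            "authentication": "MSI",
            "workspaceResourceId": (
                "/subscriptions/<subscription-id>/resourceGroups/<rg>"
                "/providers/Microsoft.Databricks/workspaces/dbw-lakehouse"
            ),
            "existingClusterId": "<cluster-id>"
        }
    }
}
```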
This Managed Identity Application ID is then added to the External Location under Catalog Explorer, where we can set up the required permissions based on our needs.
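Assuming the ADF managed identity has been added to the Databricks account as a service principal, the same grant can also be expressed in SQL; the external location name and application ID below are placeholders.

```python
# Grant the ADF managed identity (referenced by its application ID, assuming it has
# been added to the Databricks account as a service principal) access to the
# external location, so the notebook it triggers can read and write that storage.
adf_app_id = "00000000-0000-0000-0000-000000000000"  # placeholder application ID
spark.sql(f"""
GRANT READ FILES, WRITE FILES
ON EXTERNAL LOCATION adls_sales_raw
TO `{adf_app_id}`
""")
```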
#Databricks #DataFactory #ETL