Integrating Microsoft Fabric and Databricks Unity Catalog
Andreas Rederer (geb. Essbaumer)
Technology Specialist Data & AI at Microsoft
This is a short guide on how to integrate Microsoft Fabric with Databricks Unity Catalog today, plus an outlook on how the integration might evolve. By combining the two, you get the best of both platforms with minimal effort.
Introduction
Although they are different platforms, both Databricks and Microsoft Fabric embrace the Lakehouse concept (Databricks, Fabric) and use Delta-Parquet as the basis for the physical data layer.
Using OneLake shortcuts in Microsoft Fabric, you can work directly with tables defined in the Databricks Unity Catalog as if they were stored in the Fabric OneLake.
Setup
To follow along you need a Microsoft Fabric Lakehouse. Either use an existing one or create a new one. I would suggest having one Lakehouse per schema in the Unity Catalog.
For the Databricks part you need Unity Catalog enabled for your account and a metastore. From within your Databricks workspace you can then create catalogs in this metastore, schemas in those catalogs, and tables in those schemas.
You should create your catalog with an external location that points to a path in Azure Data Lake Storage Gen2.
Let's assume I have a catalog called "healthcare" with a schema called "clinicaltrialssilver" and some tables in it called "subject", "adverse_events", "measurements".
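For reference, here is a minimal sketch of how such a catalog, schema and one table could be created from a Databricks notebook. The storage account, container, path and column definitions are placeholders, and the external location is assumed to already be registered in Unity Catalog.

```python
# Sketch (Databricks notebook cell on a Unity Catalog-enabled cluster).
# The abfss:// path is a placeholder for your own, already registered ADLS Gen2 location.
spark.sql("""
    CREATE CATALOG IF NOT EXISTS healthcare
    MANAGED LOCATION 'abfss://unity@mydatalake.dfs.core.windows.net/healthcare'
""")
spark.sql("CREATE SCHEMA IF NOT EXISTS healthcare.clinicaltrialssilver")
spark.sql("""
    CREATE TABLE IF NOT EXISTS healthcare.clinicaltrialssilver.subject (
        subject_id STRING,
        enrollment_date DATE
    )
""")
```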
Create a shortcut to a Databricks Unity Catalog table
To work with those tables from Fabric, you can simply create shortcuts. This way you can work with the data as if it were stored in the Fabric OneLake, but it's actually only a reference and no data is moved.
To do so, go to your Fabric Lakehouse and create a table shortcut.
Choose ADLS Gen2 and set up a connection. The only missing piece is the path where the Unity Catalog table's delta files are actually stored.
To find it, either use the Databricks API, CLI or SDK, or navigate to the Catalog/Data section within your Databricks workspace and check the details section of the corresponding table.
(The code for the API calls can also be found in this repo: DaSenf1860/fabricdatabricksunitycatalog (github.com))
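For illustration only (not the exact code from the repo), here is a minimal sketch using the Databricks SDK for Python. It assumes the SDK is installed, authentication is configured via environment variables, and it uses the example table from above.

```python
# Minimal sketch: look up where a Unity Catalog table's Delta files are stored,
# using the Databricks SDK for Python (pip install databricks-sdk).
# Authentication is picked up from the environment (e.g. DATABRICKS_HOST / DATABRICKS_TOKEN).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Fully qualified name: <catalog>.<schema>.<table>
table = w.tables.get(full_name="healthcare.clinicaltrialssilver.subject")

# storage_location holds the abfss:// path to the table's Delta folder in ADLS Gen2
print(table.storage_location)
```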
Adjust the abfss path to the corresponding https path and you can complete the shortcut setup. (I've done the path conversion programmatically in my code.)
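The conversion itself is straightforward; here is a small sketch of how it could be done in Python. The path format is the standard ADLS Gen2 scheme, and the helper name is my own.

```python
from urllib.parse import urlparse

def abfss_to_https(abfss_path: str) -> str:
    """Convert abfss://<container>@<account>.dfs.core.windows.net/<path>
    to      https://<account>.dfs.core.windows.net/<container>/<path>."""
    parsed = urlparse(abfss_path)
    container = parsed.username      # the part before the '@'
    host = parsed.hostname           # <account>.dfs.core.windows.net
    return f"https://{host}/{container}{parsed.path}"

# Example:
# abfss_to_https("abfss://unity@mydatalake.dfs.core.windows.net/healthcare/clinicaltrialssilver/subject")
# -> "https://mydatalake.dfs.core.windows.net/unity/healthcare/clinicaltrialssilver/subject"
```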
Now you can see the table in your Fabric Lakehouse.
As the table definition in the Fabric Lakehouse and the Unity Catalog table both point to the same delta table in ADLS, they are exactly the same. They even share delta-specific features such as an enabled change feed.
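In a Fabric notebook attached to the Lakehouse, the shortcut can then be queried like any other table; a quick sketch (the table name matches the shortcut created above):

```python
# Sketch (Fabric notebook with the Lakehouse attached as default):
# the shortcut behaves like a regular Lakehouse table for Spark.
df = spark.read.table("subject")   # shortcut pointing to healthcare.clinicaltrialssilver.subject
df.show(5)
```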
Outlook
To avoid doing this table by table, you can use the Databricks API, CLI or SDKs to extract the table details for every table in a specific schema, or even for a whole catalog, and create shortcuts for all of them.
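As a sketch of how that bulk extraction could look with the Databricks SDK for Python (reusing the abfss_to_https helper from above; the Fabric shortcut creation itself is not shown here):

```python
# Sketch: collect the storage locations of every table in one schema so that
# shortcuts can be created for all of them. Reuses the WorkspaceClient setup
# and the abfss_to_https helper from the snippets above.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

shortcut_targets = {}
for table in w.tables.list(catalog_name="healthcare", schema_name="clinicaltrialssilver"):
    # Map table name -> https path to use when creating the corresponding shortcut
    shortcut_targets[table.name] = abfss_to_https(table.storage_location)

for name, url in shortcut_targets.items():
    print(name, "->", url)
```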
As soon as the corresponding Fabric API capabilities are available, you will be able to sync your Unity Catalog catalogs with your Fabric Lakehouses and keep both metadata layers constantly in sync without moving any data and without additional effort.
Considering the big benefits of harmonizing both experiences, hopefully we will soon get functionality that automatically syncs Fabric Lakehouses with Databricks schemas on the metadata layer as well as on the permissions layer.
Reap the benefits
By integrating both experiences we get the best of the two. We can combine Databricks' state-of-the-art data engineering capabilities, such as Delta Live Tables, Auto Loader, the Photon engine and Liquid Clustering, with the full power of Fabric: a wide range of data integration scenarios, a Data Warehouse experience, real-time analytics, Power BI Direct Lake mode, Data Activator and more.
Being able to use the Unity Catalog tables seamlessly from Fabric and the other way around gives you the full flexibility to choose the best tool for each task and still have a unified experience.
Potentially we could see combined Databricks-and-Fabric architectures with close to zero integration effort.