Databricks Lakehouse with Microsoft Fabric
SantoshKumar V (He/Him)
Microsoft Fabric is the new Microsoft-branded analytics platform that integrates data warehousing, data integration and orchestration, data engineering, data science, real-time analytics, and business intelligence into a single product. This one-stop shop for analytics gives data professionals and business users one tool with a shared set of features and capabilities. With Microsoft Fabric, you can leverage the power of data and AI to transform your organization and create new opportunities.
Why should you be eager to explore this insightful content?
Microsoft Fabric can integrate seamlessly with your Databricks data lakehouse. In this article you will discover the benefits of using Microsoft Fabric on top of your enterprise data, specifically the Direct Lake feature and its relevance to your modern data platform. Finally, you will see how easy it is to integrate both worlds.
This article is intended for individuals and companies that have invested in a Databricks-enabled data lakehouse and want to use Microsoft Fabric as a reporting tool. The diagram below shows such a modern data platform: Databricks remains the transformation engine, and only the semantic layer is served by Microsoft Fabric instead of Power BI.
What is Power BI Direct Lake mode and V-ORDER?
The Microsoft documentation describes the new Power BI Direct Lake feature as follows:
Direct Lake mode introduces an innovative dataset capability for analysing large data volumes in Power BI. It enables loading parquet-formatted files directly from a data lake without the need for querying a Lakehouse endpoint or importing data into a Power BI dataset. With Direct Lake, data is loaded efficiently into the Power BI engine for swift analysis.
In other words, Power BI Direct Lake mode behaves somewhat like a DirectQuery-style connection to your data lake, with the advantage that Power BI can load the data directly, without the overhead of translating each Power BI action into queries that are executed against the data lakehouse. How well this performs depends on how your data is structured in the data lake. The Delta file format plays a crucial role in enhancing performance, but to achieve blazing-fast visuals in Power BI, the V-ORDER algorithm comes into play.
V-ORDER reorders and compresses the data in the Delta files in your data lakehouse, optimizing it for a seamless visualization experience in Power BI and other Microsoft Fabric workloads (like SQL endpoints). It provides an in-memory-like experience. The V-ORDER optimization will become standard for every newly created table in the Microsoft Fabric data lakehouse.
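For readers who want to control this behaviour explicitly in a Fabric Spark notebook, the following is a minimal sketch rather than a definitive recipe. It assumes the session-level setting is named `spark.sql.parquet.vorder.enabled`, as the Fabric Spark documentation listed it at the time of writing; the name may differ in newer runtimes, so verify it before relying on it. The helper name `set_vorder` is hypothetical.

```python
# Assumption: this is the session setting that toggles V-Order writes in
# Microsoft Fabric Spark. Check the documentation for your runtime version.
VORDER_SESSION_KEY = "spark.sql.parquet.vorder.enabled"


def set_vorder(spark, enabled: bool = True) -> str:
    """Turn V-Order writing on or off for the given Spark session.

    `spark` is the SparkSession that a Fabric notebook provides.
    Returns the string value that was written to the session config.
    """
    value = "true" if enabled else "false"
    spark.conf.set(VORDER_SESSION_KEY, value)
    return value
```

In a Fabric notebook you would call `set_vorder(spark)` once, before writing any Delta tables in that session.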
At this moment, though, the algorithm is not available as an optimization technique in the Databricks workspace. However, V-ORDER is 100 percent compliant with the open Delta format, so V-ORDER-optimized files remain fully readable by Databricks as a processing engine. You can therefore optimize the Delta files on your data lake using Microsoft Fabric and a Spark pool. As a result, you can leverage Direct Lake mode on top of your gold layer, created by Databricks engines and optimized by the Microsoft Fabric Spark compute engine.
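The optimization step described above could look roughly like the sketch below when run in a Fabric Spark notebook. It assumes the Databricks-written gold tables are already visible to the Fabric Spark session, and it relies on the `OPTIMIZE ... VORDER` command from the Fabric Spark documentation. The table name `gold.sales` and both helper functions are hypothetical and only serve as illustration.

```python
from typing import List, Optional, Sequence


def build_optimize_statement(
    table: str, zorder_by: Optional[Sequence[str]] = None
) -> str:
    """Build an OPTIMIZE statement that rewrites a Delta table with V-Order.

    An optional ZORDER BY clause is placed before the VORDER keyword.
    """
    parts = [f"OPTIMIZE {table}"]
    if zorder_by:
        parts.append("ZORDER BY (" + ", ".join(zorder_by) + ")")
    parts.append("VORDER")
    return " ".join(parts)


def optimize_gold_tables(spark, tables: Sequence[str]) -> List[str]:
    """Run the V-Order optimization for each table and return the SQL used.

    `spark` is the SparkSession of a Fabric notebook; VORDER is a
    Fabric-specific keyword and is not expected to work on Databricks.
    """
    statements = [build_optimize_statement(t) for t in tables]
    for statement in statements:
        spark.sql(statement)
    return statements
```

Calling `optimize_gold_tables(spark, ["gold.sales"])` would issue `OPTIMIZE gold.sales VORDER`. Because the result is still a standard Delta table, Databricks can keep reading and writing it afterwards.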
Conclusion?
You can use Microsoft Fabric workloads to enable the Direct Lake mode experience on top of a data lakehouse powered by Databricks. With the V-ORDER optimization applied to your Delta files, Direct Lake mode can deliver its full performance.