Microsoft Fabric - SaaSification of cloud analytics
During Microsoft Build 2023, Microsoft Fabric was announced in public preview. Microsoft Fabric evolves existing Microsoft analytics products, including Azure Synapse Analytics, Azure Data Factory, and Microsoft Power BI, into a further simplified SaaS cloud analytics suite. It enhances and unifies the experience of data integration, data engineering, data science, data warehousing, real-time analytics, business intelligence, and applied observability. It empowers every user to build an open, lake-centric, unified data estate in a simplified SaaS experience. The following diagram provides the high-level concepts of Microsoft Fabric.
In this blog we focus on the new capabilities that Microsoft Fabric brings to cloud analytics. The article aims to be a one-stop consolidation of these capabilities. Overall, Fabric brings the following advantages as part of its SaaS experience.
Note that at the time of writing, Data Activator is in preview and all the other features listed earlier are available in public preview. The following sections look at each of these points at a high level.
OneLake capability - One data lake for the organization
As per the Fabric public documentation, OneLake is a single, unified, logical data lake for the whole organization, providing a unified experience across multiple analytical engines. OneLake is provisioned automatically with the Microsoft Fabric tenant, so setting up and managing the lake is no longer an overhead for customers. OneLake aims to eliminate the overhead of managing multiple data lakes within the same organization. Native Fabric compute engines such as Lakehouse, Data Warehouse, and Data Explorer store their files in OneLake in Delta Parquet format. The following picture depicts how OneLake maintains data residency under the hood while workspaces distributed across multiple countries remain attached to the same logical OneLake.
OneLake capability - OneDrive-like Windows Explorer experience
Every developer and business user can consume, analyze, and collaborate on organizational data as easily as with the Office OneDrive application. OneLake acts as the OneDrive of the organization. The following screenshot gives a glimpse of OneLake's Windows Explorer integration, which makes data exploration significantly simpler for citizen users.
Note that OneLake is built on top of ADLS Gen2 and is compatible with the ADLS Gen2 SDK. OneLake is also open: data engineers can reach it through an API endpoint and access it seamlessly from Azure Databricks, HDInsight, and other platforms. A file path takes the following form: https://onelake.dfs.fabric.microsoft.com/<workspaceGUID>/<itemGUID>/Fabricdata/Fabrictest.csv
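As a minimal sketch of the URL pattern above, the following Python helper assembles a OneLake file path from a workspace GUID, an item GUID, and a relative file path. The function name onelake_url and the all-zero and all-one GUIDs are hypothetical placeholders for illustration only; the host name and path layout are taken from the example URL in this article.

```python
from urllib.parse import quote

# Host name taken from the example URL in this article.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_url(workspace_guid: str, item_guid: str, relative_path: str) -> str:
    """Build an https URL for a file in OneLake (hypothetical helper)."""
    # Split the relative path so each segment can be percent-encoded on its own.
    segments = [workspace_guid, item_guid, *relative_path.strip("/").split("/")]
    encoded = "/".join(quote(segment, safe="") for segment in segments)
    return f"https://{ONELAKE_HOST}/{encoded}"


if __name__ == "__main__":
    # Placeholder GUIDs; replace them with real workspace and item GUIDs.
    print(
        onelake_url(
            "00000000-0000-0000-0000-000000000000",
            "11111111-1111-1111-1111-111111111111",
            "Fabricdata/Fabrictest.csv",
        )
    )
```

The resulting string can then be passed to whichever ADLS Gen2 compatible client is in use; authentication is outside the scope of this sketch.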
OneLake Shortcut - Attaching multi-cloud files like desktop ‘Create Shortcut’ files
A shortcut is a symbolic link that points from one data location to a file in a remote location. It functions like a metastore entry, and the remote file appears as a shortcut in the current location. No data is physically copied and no secondary copy is maintained, yet data changes are reflected automatically inside the Fabric workspace. We can establish this symbolic link with file systems in other clouds, for example AWS. This gives us a multi-cloud data lake while reducing data copying and duplication when dealing with federated files across Azure Data Lake Storage Gen2 and AWS S3. The feature is currently available inside Fabric Lakehouse and Fabric KQL Database. We can use a shortcut as a virtualized layer and run SQL and Spark scripts against it. From a developer’s perspective it is quite similar to ‘Create Desktop shortcut’ for a file or folder. The following screenshot shows how the shortcut feature works within a Fabric workspace.
Data mesh inbuilt - Data domain allocation
A data mesh is a decentralized framework that helps to organize data. Like a service mesh, a data mesh follows a decentralized approach to support different business units and departments. Microsoft Fabric OneLake brings domain-based data management, which makes data management across multiple business groups more efficient. Data owners can seamlessly associate their area of interest with a specific data domain inside the Fabric workspace and thus leverage native data mesh as a service. Domain assignment is available inside the OneLake data hub, as shown in the following screenshot.
Intuitive Fabric UX - Office 365-like user experience
As part of its SaaSification journey, Microsoft Fabric brings a simplified, persona-based user experience. Data scientists, data engineers, ETL developers, and business intelligence users each get an intuitive experience tailored to the tools they need. The following picture shows the landing page experience of Microsoft Fabric.
Auto-create visualization capability - One-click insight from data
Generating insights from a dataset is now one step easier for citizen users. We can click on a dataset within the Fabric workspace and Fabric will create a meaningful visualization (and narrative) using Power BI. This is done with the ‘Auto-create’ Visualize feature, as shown in the following screenshot, and needs no development effort. Business users can also choose options like ‘+Create from Scratch’ to build the reports they want in the intuitive Power BI way.
World-class Spark - Autotuning, fast Spark initiation (a few seconds), high concurrency
Microsoft Fabric Spark comes with advanced capabilities for better speed and efficiency. It starts with runtime version 1.1, which ships with Delta Lake 2.2 and Spark 3.3 (3.3.0 and 3.3.1). Fabric Spark provides V-Order write optimization, which enables faster reads of Parquet files. The workspace provides a Spark starter pool, whose always-on, pre-hydrated cluster pool enables fast Spark session initiation. Fabric Spark autotuning automatically tunes the workload for better efficiency and less manual tuning overhead. Dynamic reserve-based throttling and a FIFO queuing limit provide optimized concurrency for the organization.
Power BI Direct Lake - No loading, high-performance big data visualization
Fabric introduces an innovative way for Power BI to analyze and visualize large datasets. It avoids loading data into the Power BI engine as Import mode does. We also do not need to use DirectQuery against the Lakehouse endpoint, which can be slow. Instead, Fabric uses the Direct Lake mechanism to load Parquet files directly into the Power BI engine through a Direct Lake dataset. Any changes on the Lakehouse side are readily available in Power BI memory, which gives high performance without explicitly importing the dataset. It combines the best of the Import and DirectQuery modes that we have in Power BI today. The following picture from the Microsoft documentation shows how these Power BI modes work.
Intuitive extraction, transformation, loading experience for Office users
Microsoft Fabric brings both the Power Query based editor, Dataflow Gen2, and the Azure Data Factory experience inside its workspace. Dataflow Gen2 provides a simplified, zero-coding extract, transform, load editor. This empowers Office users to generate actionable insights from data using the intuitive ETL experience provided by Power Query. The following picture depicts the simplicity of the Dataflow Gen2 user experience inside the Fabric workspace.
Capacity simplified - Purchase one capacity and use it across all workloads
Purchasing compute capacity is significantly simplified and no longer involves multiple cost line items. Fabric billing consists of a single compute cost, a single storage cost (OneLake cost), and a bandwidth cost. Purchasing is as easy as buying a PC with our preferred number of CPU cores. The capacity that we buy is shared across workloads, users, and projects. Capacity management runs on autopilot, which eliminates manual workload management. Unlike dedicated capacity, it is a controlled, unbounded capacity, which brings capabilities such as automatic load stabilization (smoothing load spikes) and preventing a single user from hogging the capacity with ad hoc workloads.
We can provision Microsoft Fabric easily using Microsoft Power BI Premium per capacity or using a Microsoft Fabric capacity from the Azure portal. Note that every capacity sits under a Microsoft Fabric tenant. The hierarchy of Fabric tenant, capacity, and workspace is shown in the following diagram.
Fabric capacity metrics application for simplified cost management
Fabric provides a prebuilt capacity metrics application that users can download and start using natively inside the Fabric workspace. The application simplifies cost management, supports detailed slicing and dicing of costs, and enables effective chargeback management. We can measure capacity units, performance metrics, and operation-level details in this application. To download the application, please refer to the Further reading section.
Completely decoupled storage and compute in the data warehouse
Fabric Data Warehouse comes with completely decoupled storage and compute. As a result, it does not need a separate load into the data warehouse for high-performance SQL operations. Because storage is fully decoupled, Fabric DW also brings more openness and deeper compatibility with all supported storage sources.
SQL Endpoint of Lakehouse - Lake view of the Lakehouse
Fabric Lakehouse brings a read-only SQL endpoint for querying the Delta Lake. It is essentially a SQL view of the Lakehouse. We can therefore work with the Lakehouse using familiar tools such as SQL Server Management Studio, Azure Data Studio, and many more.
Fabric Visual Query Editor - Visual interface for writing queries
The Fabric workspace comes with a visual query editor inside the object explorer canvas. The visual query editor inherits Power Query capabilities and gives query developers a no-code visual editor.
Virtual data warehouse - Cross-DW query within Fabric workspace
The Fabric workspace allows developers to query across multiple Fabric data warehouses. We can use the SQL editor or the visual editor to run cross-warehouse queries inside the Fabric workspace. This capability is available for both the Fabric Data Warehouse endpoint and the Fabric Lakehouse SQL endpoint. The following picture shows the new visual query capability that we can use for code-free cross-database queries.
Microsoft Purview Hub - Insight into Fabric data inside Fabric
Purview is more tightly integrated into the Fabric workspace as the Microsoft Purview hub. It provides detailed insights into the Fabric data estate, including sensitivity information for artifacts, inventory reports, distribution of items, data loss prevention monitoring, and many more details. The following screenshot depicts the Microsoft Purview hub inside the Fabric workspace.
Data Activator - Code-free real-time alert configuration (in preview)
Fabric’s Data Activator provides a no-code experience within the workspace for building actions from data. Citizen users can follow the simple steps below.
· Data connectivity: as shown in the following diagram, Data Activator can connect to Event Hubs, Power BI datasets, and many more sources.
· Configure code-free actionable alerts: Data Activator provides a visual interface to configure the conditions it monitors in order to detect a pattern.
· Trigger an action when a condition is detected: Data Activator provides a trigger designer to define trigger-based actions, such as sending a Teams notification.
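The condition-then-action flow in the steps above can be pictured with a small, purely conceptual Python sketch. It is not Data Activator's API, since Data Activator is configured visually without code; the Trigger class, the evaluate function, the sample events, and the 30.0 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Trigger:
    """Hypothetical stand-in for a visually designed trigger."""
    name: str
    condition: Callable[[dict], bool]  # returns True when the pattern is detected
    action: Callable[[dict], str]      # builds the notification to send


def evaluate(trigger: Trigger, events: Iterable[dict]) -> List[str]:
    """Run the trigger's action for every event that meets its condition."""
    return [trigger.action(event) for event in events if trigger.condition(event)]


# Assumed example: alert when a device reports a temperature above 30.0.
high_temperature = Trigger(
    name="high-temperature",
    condition=lambda event: event["temperature"] > 30.0,
    action=lambda event: f"notify: {event['device']} reported {event['temperature']}",
)

# Sample events standing in for a stream from a connected source.
events = [
    {"device": "sensor-1", "temperature": 25.0},
    {"device": "sensor-2", "temperature": 31.5},
]

notifications = evaluate(high_temperature, events)
print(notifications)
```

Only the second event satisfies the condition, so only one notification is produced.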
Data Wrangler - Code-free experience within notebook for data exploration
The Data Wrangler user interface is a new experience inside the Fabric notebook that gives data scientists code-free data wrangling. We can select any Python DataFrame populated inside the notebook, switch to the Data Wrangler interface, and wrangle its data without writing code. It includes common features such as visualization, dynamic statistics, and data cleaning operations. Each step automatically generates equivalent code, which we can append to a notebook cell for further development.
ML experiment and model tracking user interface
During public preview, the machine learning capabilities of the Fabric workspace focus on experimentation and model management. The workspace provides built-in model and experiment items. Machine learning experiment tracking is now available inside Fabric as a user interface built on the MLflow framework. Using this interface, data scientists can track experiment details such as the name of the notebook, run time, hyperparameters, run ID, and a few more, as shown in the following screenshot. The ML experiment user interface can also visually compare different experiment runs, which helps data scientists decide on the best run.
As with experiments, the Fabric workspace also provides a model user interface that shows the model version name, version creation time, last modified time, created-by username, the name of the experiment that created the model, and the run name. Visual comparison of different models is also built into model version management. The following screenshot shows model version management inside the Fabric workspace.
Kusto Query Language (KQL) magic extension for Notebook
The Microsoft Fabric notebook comes with the KQL magic extension, which simplifies Kusto development. With it, we can run Kusto queries natively inside a notebook and combine rich Kusto query capabilities with Python in the same notebook. The following code snippet installs and loads the KQL magic extension in a notebook.
# Install the KQL magic package from PyPI to enable connectivity to the KQL Database
!pip install Kqlmagic --no-cache-dir --upgrade
# Load the extension into the notebook session
%reload_ext Kqlmagic
Learn more about the topics above in the following Further reading section. We will continue to update this article as new capabilities and enhancements become available and as feedback comes in.
Further reading
To read more on the topics covered in this article, you can refer to the following resources: