Deep dive into Microsoft Fabric
Mateusz Sawicki
I'm not another AI expert but I'm pretty good at data engineering
Introduction
During the Microsoft Build 2023 conference, Fabric was introduced. It is a new end-to-end analytics platform offered by the giant of Redmond. The service offers the most comprehensive analytical capabilities – from ETL, through data modelling and visualization, up to machine learning models and AI. After the launch of this product, social media were on fire. Thousands of data professionals, including Microsoft MVPs, started to publish content about this hot new topic. Even I wrote some posts about it, one of them being OneMeme. For anyone who has not seen it – OneMeme is a meme explaining the concept of Microsoft Fabric.
It is an obvious reference to OneLake, which is the foundation of Fabric. When I published my OneMeme on LinkedIn, I also promised that I would write an article about Microsoft Fabric. In this publication, I am going to explain what Fabric is, what its capabilities and advantages are, and why you should learn this tool as soon as possible. All thoughts and insights come from the perspective of a business intelligence professional, so I focus mostly on ETL, data modelling and visualization, simply because those are the areas where I have relevant experience.
Why the new tool?
It is a question that many asked when Microsoft released Fabric. My initial feeling was doubt. I couldn't understand why they were introducing a new data analytics platform when Power BI, Azure Synapse, and all the other Azure data services were already available. After overcoming my initial hesitation, I delved into the realm of Fabric and began to understand that it made sense and could bring value to businesses. The most crucial concept behind Fabric is to unify Microsoft's analytical services in one place. Anyone who has worked with the Azure data platform and Power BI knows that building an analytical solution can be architecturally challenging.
Designing an analytical solution on Azure can be truly overwhelming. Developers and architects have to determine which services will be necessary, how to integrate them, and consider the cost implications. These challenges are precisely what Fabric addresses. Firstly, it provides a unified platform where all the necessary building blocks are readily available. There's no need to spend time figuring out which services are needed. Secondly, the elements are automatically connected, eliminating the need for developers to create identities, users, and assign permissions, among other tasks. Anyone who has attempted this knows how frustrating it can be, particularly in larger enterprises where decision-making can be an ongoing and arduous process. Thirdly, OneLake enables users to store their data in one place, reducing data replication and the duplication of similar tasks. Fourthly, data governance is much more effective when all users use one service. Additionally, Fabric offers some excellent governance capabilities. Fifthly, Fabric pricing is straightforward – users only have to pay for capacity and Power BI licenses (the price list is available on Microsoft's website), which are easy to estimate, along with storage, which is negligible for most companies. In the next few paragraphs, I will elaborate on each of the aforementioned advantages of Microsoft Fabric.
All-in-one tool
Microsoft Fabric is a complete analytics solution tailored for businesses. It offers a broad spectrum of services, such as Data Factory, Data Science, Real-Time Analytics, and business intelligence, all under one roof. The goal of this platform is to streamline analytics needs by offering a unified solution, thereby avoiding the need to combine separate services from multiple vendors. Fabric is built on the backbone of Software as a Service (SaaS), which makes integration straightforward. Microsoft Fabric combines elements from Power BI, Azure Synapse, and Azure Data Explorer into a single cohesive environment, offering a full suite of analytics experiences designed to work in tandem.
Data Engineering allows developers to build what's known as a data lakehouse. It's founded on Apache Spark, and the native data format in Fabric is Delta (you can learn more about the Delta format in this article by Nikola Ilić), which aligns perfectly with data lakehouse concepts. There are notebooks available where you can write code to interact with Spark in Python, SQL, R, or Scala, and these can be scheduled with ease.
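What makes a Delta table more than a pile of parquet files is its transaction log: a `_delta_log` folder of JSON entries recording which files were added or removed in each commit. As a rough conceptual sketch (not the real Delta implementation, and the entries below are simplified, hypothetical examples), replaying the log tells you which files currently make up the table:

```python
import json

# Simplified, hypothetical entries mimicking a Delta table's _delta_log.
# Real log files contain one JSON action per line (commitInfo, add, remove, ...).
log_text = "\n".join([
    json.dumps({"commitInfo": {"operation": "WRITE", "timestamp": 1690000000000}}),
    json.dumps({"add": {"path": "part-0000.parquet", "size": 1024, "dataChange": True}}),
    json.dumps({"remove": {"path": "part-old.parquet", "dataChange": True}}),
])

def active_files(log_text):
    """Replay the log: files that were added and not later removed form the table."""
    files = set()
    for line in log_text.splitlines():
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return files

print(active_files(log_text))  # {'part-0000.parquet'}
```

This append-only log is what gives Delta tables ACID transactions and time travel on top of plain parquet storage.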
Data Factory is a tool recognized by Azure data engineers for handling ETL/ELT workloads. It enables users to build data integration pipelines with ease.
Data Science helps build, deploy, and manage machine learning models. It links with Azure Machine Learning to offer built-in experiment tracking and a model registry.
Data Warehouse corresponds to the Azure Synapse Dedicated SQL Pool, a contemporary cloud data warehouse utilizing an MPP architecture. It's a little funny, because the data in the warehouse "tables" is stored in the Delta file format, which is a characteristic feature of a lakehouse. Nevertheless, this tool lets developers construct a data warehouse that behaves like a relational database, where data is stored neatly in tables.
Real-Time Analytics is an engine designed for observational data analytics. It can handle data sourced from various platforms such as apps, IoT devices, and human interactions.
Power BI is the foundation for Fabric's interface. It is well-known for its data modeling and visualization capabilities.
Services connected under the hood
In Microsoft Fabric, all services are interconnected by default. There's no need to struggle with assigning sufficient permissions between services as is the case with Azure. Developers are spared from setting up managed identities, VNets, and other boring administrative tasks. When you utilize Fabric, you can set aside infrastructure concepts like resource groups, RBAC (Role-Based Access Control), Azure Resource Manager, redundancy, or regions. That being said, it's still necessary nowadays to be familiar with these concepts. Not all companies will transition to Fabric immediately, and proficient developers should understand how to handle Azure workloads.
One data storage
Microsoft Fabric is built upon the foundational layer of OneLake, which serves as the repository for data. OneLake, constructed atop Azure Data Lake Storage Gen2, eliminates the necessity of creating this resource in Azure – in fact, you don't even need an Azure account to operate within Fabric. The Fabric license encompasses OneLake storage for all data used in a given tenant. This approach aims to eliminate data silos, a ubiquitous problem in many organizations. However, the real game-changers introduced by OneLake are shortcuts. In Microsoft OneLake, shortcuts empower developers to consolidate data across domains, clouds, and accounts, thus creating a single virtualized data lake for the entire enterprise. All Fabric tools can connect directly to existing data sources such as ADLS, S3, and of course OneLake via a unified namespace. OneLake manages permissions and credentials, so each Fabric experience doesn't require separate configuration for each data source. Shortcuts are objects in OneLake that link to other storage locations, which can be internal or external to OneLake. They behave like symbolic links and operate independently of their target: if a shortcut is deleted, the target remains unaffected; conversely, if the target path is altered or deleted, the shortcut may break.
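To make the symbolic-link behaviour concrete, here is a toy model of shortcut resolution. This is my own conceptual sketch, not the Fabric API; the paths and names are hypothetical. A shortcut is just a named pointer to a target, so reading through it reaches the external data without copying it, and deleting the shortcut leaves the target untouched:

```python
from dataclasses import dataclass

# Conceptual model of a OneLake shortcut: a named pointer to a storage target.
@dataclass
class Shortcut:
    name: str
    target: str  # e.g. an ADLS, S3, or another OneLake path

# A lakehouse namespace mixing native data and an external source via a shortcut.
lake = {
    "Sales/raw": "onelake://tenant/sales/raw",                      # native OneLake data
    "Sales/clicks": Shortcut("clicks", "s3://bucket/clickstream"),  # external via shortcut
}

def resolve(path):
    """Follow a shortcut to its target; native paths resolve to themselves."""
    entry = lake[path]
    return entry.target if isinstance(entry, Shortcut) else entry

print(resolve("Sales/clicks"))  # s3://bucket/clickstream

# Deleting the shortcut removes only the pointer; the S3 data is unaffected.
del lake["Sales/clicks"]
```

The key design point this illustrates: every Fabric engine sees one namespace, and the indirection to the real storage location is handled centrally.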
OneLake's capabilities don't end with shortcuts. Another standout feature is Direct Lake. Direct Lake allows parquet-formatted files to be loaded directly from the data lake, bypassing the need to query a lakehouse endpoint or to import or duplicate data into a Power BI dataset. It offers a fast path to load data from the lake straight into Power BI's VertiPaq engine for analysis. Unlike DirectQuery, it doesn't translate queries into other query languages or execute them on other database systems, thus delivering performance akin to import mode. As there's no explicit import process, any changes at the data source are reflected immediately, effectively merging the benefits of both DirectQuery and import modes while avoiding their drawbacks. Direct Lake can be an optimal choice for analyzing sizable datasets and datasets with frequent updates at the source. The feature is supported by V-Order, a write-time optimization for the parquet file format that enables fast reads by Microsoft Fabric compute engines such as Power BI, SQL, Spark, and others. V-Order applies special sorting, row group distribution, dictionary encoding, and compression to parquet files, reducing the network, disk, and CPU resources compute engines need to read them, resulting in cost-efficiency and improved performance. While V-Order sorting can increase write times by 15% on average, it offers up to 50% better compression.
Data governance with domains
Power BI already boasts an impressive array of data governance capabilities such as data sensitivity labels, workspace permissions, row-level security, and Purview, among others. The Purview hub is also an integral element of Fabric, offering centralized administration and governance across all experiences, with permissions automatically applied across all the underlying services. Data sensitivity labels are likewise automatically inherited across the suite's items. While these features are invaluable, a new element known as 'domains' has recently emerged. A domain serves as a logical grouping mechanism for all data relevant to a specific area or field within an organization. To organize data into domains, workspaces are associated with specific domains. Once a workspace is connected with a domain, all items within that workspace also become associated with that domain, receiving a domain attribute as part of their metadata. Domains offer a higher level of workspace grouping that can be used, for instance, to create development, test, and production environments or to partition data among different organizational units within an enterprise. This feature enables organizations to manage their data in accordance with their unique regulations, restrictions, and requirements.
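The inheritance rule above is simple enough to sketch in a few lines. This is a toy model of my own (not Fabric's API, and the workspace and item names are made up): an item never carries a domain directly; it picks the attribute up from whichever workspace it lives in.

```python
# Toy model of domain inheritance: assigning a workspace to a domain
# effectively stamps every item in that workspace with the same domain.
workspaces = {
    "FinanceDev": {"domain": "Finance", "items": ["lakehouse_fin", "report_pnl"]},
    "SalesProd":  {"domain": "Sales",   "items": ["warehouse_sales"]},
}

def item_domain(item_name):
    """An item's domain attribute is inherited from its containing workspace."""
    for ws in workspaces.values():
        if item_name in ws["items"]:
            return ws["domain"]
    return None  # item not found in any workspace

print(item_domain("report_pnl"))  # Finance
```

Because the attribute lives at the workspace level, moving a workspace to a different domain re-labels all of its items at once, which is what makes domains practical for partitioning data across organizational units.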
Simplified pricing
Anyone who's had the opportunity to develop an analytical solution on Azure knows that navigating the pricing structure for various services can be quite complex. In fact, I can assert without hyperbole that crafting cost-effective solutions on Azure is somewhat of an art form. If someone can accurately predict the cost of an analytical solution on Azure, they're likely a veritable maestro of the Azure Data Platform. Microsoft Fabric significantly simplifies this pricing puzzle. Its licensing model aligns with that of Power BI Premium or Embedded capacities. Costs are incurred for capacity and storage in OneLake. Capacities are available for purchase via the Azure portal, offering flexibility to enterprises through pay-as-you-go hourly or monthly options. This is particularly convenient for users who do not require round-the-clock access. In the forthcoming months, Microsoft plans to augment the capabilities of Fabric capacities, including the introduction of Azure Reservations. Similar to features found in services like Synapse, reservations can lead to cost reductions, making it worth considering booking capacity for extended periods. As for OneLake storage pricing, it is comparable to Azure ADLS (Azure Data Lake Storage), where users pay per GB, which is often negligible. This model is easy to estimate and can save the accounting department from dealing with unexpected, exorbitant bills.
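The "easy to estimate" claim can be shown with a back-of-the-envelope calculation: capacity hours plus storage, and nothing else. The rates below are hypothetical placeholders I picked for illustration; the real numbers are on Microsoft's published price list and vary by SKU and region.

```python
# Back-of-the-envelope Fabric-style cost estimate: capacity hours + OneLake storage.
# Both rates are ASSUMED placeholder values, not real Microsoft prices.
CAPACITY_RATE_PER_HOUR = 4.0       # hypothetical pay-as-you-go rate for one capacity SKU
STORAGE_RATE_PER_GB_MONTH = 0.025  # hypothetical per-GB OneLake storage rate

def monthly_estimate(capacity_hours, storage_gb):
    """Capacity is billed per hour of use; storage per GB-month."""
    return capacity_hours * CAPACITY_RATE_PER_HOUR + storage_gb * STORAGE_RATE_PER_GB_MONTH

# A capacity paused outside business hours (8h x 22 workdays) with 500 GB stored:
print(round(monthly_estimate(8 * 22, 500), 2))  # 716.5
```

Two input numbers and two rates: that is the whole pricing model, which is exactly the contrast with estimating a multi-service Azure architecture.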
However, one aspect of Fabric's pricing I find less appealing is the division of paid capacity. When you purchase capacity, you receive a set number of vCores, divided equally between front-end and back-end tasks. In situations of high-demand workloads, such as data warehousing or ML tasks, you may need additional back-end capacity without impacting front-end operations. This can lead to higher costs than equivalent solutions built with other services.
Some reasonable conclusions
This article may come across as a one-sided celebration of a shiny new tool, and in all honesty, it is. I'm a huge fan of Microsoft Fabric, an all-in-one analytical solution for enterprises. I've explored numerous features and conveniences that Fabric offers in comparison to Azure services. Don't get me wrong; I hold a deep admiration for Azure, particularly for Synapse and ADF. They're modern, comprehensive solutions that will be used for a long time, given the significant investments enterprises have made in them to become data-driven organizations.

We have to remember, however, that Fabric is built on top of Azure services and simplifies their use. While I criticized the complicated pricing structure of Azure in the previous section, I want to clarify that many projects might be more cost-effective to implement in the traditional Azure way rather than with Fabric. For instance, building a lakehouse with Synapse Serverless or Databricks is likely to be more affordable than implementing the same solution with Fabric, and you don't need to be an Azure pricing magician to realize this. There are likely numerous other solutions that could be built without Fabric to meet an enterprise's data requirements.

My motivation in writing this article was to highlight the strengths of this new tool, consolidate my knowledge, and encourage readers to familiarize themselves with Fabric.