Microsoft Fabric’s Semantic Link – Integration of Power BI into the ‘Circle of Life’
Source: https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-overview

Introduction

Many users know, love and use Power BI in their daily work. Beyond its reporting capabilities, more and more business departments use Power BI to create their own calculations in Power BI datasets (or “semantic models”, as Microsoft now calls them) as measures written in Data Analysis Expressions (DAX).

Sample Power BI report for Marketing in Power BI Desktop

On the other hand, Data Engineers build data platforms (data warehouses or lakehouses), and Data Scientists build models (e.g. for forecasting) or use generative AI models through services like Azure OpenAI (e.g. ChatGPT, DALL-E etc.). Today, Python is the leading programming language in Data Science and also plays an essential role in Data Engineering, especially as notebooks become more and more important. Notebooks are also a fundamental part of Microsoft Fabric, an end-to-end, unified analytics platform that brings together all the data and analytics tools that companies need. Fabric was introduced by Arun Ulag in May 2023.

Ideally, all data is already available in one data platform. Measures, however, are by definition only available in a Power BI dataset, even though they could be of interest to others.

So wouldn't it be great to bridge the gap between business users and other parties such as data engineers and data scientists, and make Power BI (meta) data easily accessible?

With the introduction of Semantic Link in Microsoft Fabric in October 2023, this is now easily possible. Why is this so cool? Because now the data generated in Power BI is no longer the end of the chain, but can be the beginning of a journey to new insights in Data Science models, or can be enriched in downstream processes without having to perform calculations again.

In addition, things like documentation and testing can be fully automated and customized to individual needs.

Semantic Link enables Power BI to become an active part of the "circle of life" between business users and data engineers and/or data scientists.

Want to learn more and how to use Semantic Link in Microsoft Fabric? Just read on.

What is Semantic Link?

Semantic Link provides Python methods for (read-only) access to Power BI datasets in a Microsoft Fabric workspace: both the metadata and the "real" data (including DAX measures). Before Microsoft Fabric, this was already possible (to some extent) via the Execute Queries endpoint of the Power BI REST API, but it required some advanced IT skills and permissions in Azure. Now it becomes much easier.

The following illustration shows the interaction between Power BI datasets, Semantic Link (used in Fabric notebooks) and OneLake (the place where all data is stored in Microsoft Fabric).

Interaction Power BI, Semantic Link and OneLake

The functionalities of Semantic Link are separated into multiple Python packages:

  • semantic-link: The meta-package that contains all individual Semantic Link packages at once.
  • semantic-link-sempy: The package contains the core Semantic Link functionality.
  • semantic-link-functions-holidays: The package contains holiday functions.
  • semantic-link-functions-geopandas: The package contains functions for geospatial data.

The core data structure in Semantic Link is the FabricDataFrame which subclasses the pandas DataFrame and adds metadata like semantic information and lineage.

Semantic Link Data Structure

Use Cases

There are several ways Semantic Link can be used.

Read Data from a Power BI Dataset

The first use case is to read data from a Power BI dataset and use it for data engineering or data science purposes, as described earlier, since some data is only available in Power BI datasets (e.g., measures).

After a notebook is created in a Fabric workspace, the Semantic Link Python library must be installed in the notebook session (for example with %pip install semantic-link).

Installation of Semantic Link

Then, the library methods can be used to retrieve data from the Power BI dataset (which must be accessible in a Fabric workspace), as shown in the following code snippets.

Read Data from a Power BI Dataset via Semantic Link
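The original screenshots are not available here, so the following is only a minimal sketch of what such a notebook cell could look like. It assumes the sempy.fabric functions list_measures, read_table, evaluate_measure and evaluate_dax; the wrapper function read_powerbi_data, its optional fabric parameter and all dataset, table and measure names are hypothetical and only serve as illustration.

```python
# Sketch only: run inside a Microsoft Fabric notebook after
#   %pip install semantic-link
# The helper name and all dataset/table/measure names are hypothetical.

def read_powerbi_data(dataset, table, measure, groupby_columns, dax_query, fabric=None):
    """Collect measures metadata, a table, a measure result and a DAX result."""
    if fabric is None:
        # sempy is only importable where Semantic Link is installed,
        # so the import is deferred until the function is called.
        import sempy.fabric as fabric

    return {
        # Metadata of all measures defined in the dataset
        "measures": fabric.list_measures(dataset),
        # Complete content of one table
        "table": fabric.read_table(dataset, table),
        # A measure evaluated per group, e.g. per country
        "measure_values": fabric.evaluate_measure(
            dataset, measure=measure, groupby_columns=groupby_columns
        ),
        # Result of an arbitrary DAX query
        "dax_result": fabric.evaluate_dax(dataset, dax_query),
    }

# Hypothetical call in a Fabric notebook:
# data = read_powerbi_data(
#     dataset="Marketing",
#     table="Customer",
#     measure="Total Sales",
#     groupby_columns=["Customer[Country]"],
#     dax_query="EVALUATE TOPN(10, Customer)",
# )
# data["measure_values"].head()
```

Each returned object behaves like a pandas DataFrame, so it can be filtered, joined or plotted with the usual pandas tooling.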

In these examples, measures, tables and data were retrieved via a DAX statement and can of course be further used in notebooks or even exported later (although it's not a good idea to export all data back to CSV files...).

The above examples are all presented in PySpark. For those with a SQL background, it is also possible to use Semantic Link via Spark SQL. This is described in detail in a great blog post by Nikola Ilic.

Read Meta Data from a Power BI Dataset for Documentation

Not only the "real" data of a Power BI dataset can be accessed as described before, but also its metadata about tables, measures, relationships, etc.

Various list-methods to access meta data

The following example retrieves the metadata for measures in a dataset. This can be used, for example, to have documentation that is always up-to-date.

Notebook with Semantic Link to read meta data from a Power BI dataset
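As a rough illustration of the documentation idea, the sketch below turns a measures table into a Markdown overview. It assumes that the metadata arrives as a DataFrame with the columns "Table Name", "Measure Name" and "Measure Expression" (as fabric.list_measures is expected to return); the helper measures_to_markdown and the sample measures are hypothetical.

```python
import pandas as pd

def measures_to_markdown(measures):
    """Render a measures DataFrame as a Markdown table, sorted by table and measure."""
    lines = ["| Table | Measure | DAX expression |", "| --- | --- | --- |"]
    ordered = measures.sort_values(["Table Name", "Measure Name"])
    for row in ordered.to_dict("records"):
        # Collapse multi-line DAX into one line and escape pipes for Markdown
        expression = " ".join(str(row["Measure Expression"]).split())
        expression = expression.replace("|", "\\|")
        lines.append(
            f"| {row['Table Name']} | {row['Measure Name']} | `{expression}` |"
        )
    return "\n".join(lines)

# Hypothetical stand-in for the result of fabric.list_measures("Marketing")
measures = pd.DataFrame({
    "Table Name": ["Sales", "Sales"],
    "Measure Name": ["Total Sales", "Avg Price"],
    "Measure Expression": [
        "SUM(Sales[Amount])",
        "DIVIDE(\n  [Total Sales],\n  SUM(Sales[Quantity])\n)",
    ],
})
doc = measures_to_markdown(measures)
print(doc)
```

Running such a cell on a schedule would keep the generated documentation in step with the dataset, since the measure definitions are read directly from the model.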

Create a Diagram of Tables and Dependencies

Another nice use case for documentation purposes is to create a diagram showing the dependencies between tables in a dataset.

Dependencies between tables in a Power BI dataset

Detect Functional Dependencies for Data Cleaning

Semantic Link can also be used to detect functional dependencies in a dataset. A functional dependency exists when the value of one column in a record determines the value of another column. For example, a "Date" column determines the values in a "Month" column.

Detect Functional Dependencies
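To make the definition concrete, here is a minimal pandas sketch of the underlying check. It is not Semantic Link's own implementation (in a Fabric notebook, the FabricDataFrame offers dependency detection for this purpose); the helper names and the sample calendar data are hypothetical.

```python
import pandas as pd

def is_functionally_dependent(df, determinant, dependent):
    """True if every determinant value maps to at most one dependent value."""
    distinct_per_group = df.groupby(determinant)[dependent].nunique(dropna=False)
    return bool(distinct_per_group.le(1).all())

def dependency_violations(df, determinant, dependent):
    """Rows whose determinant value maps to more than one dependent value."""
    distinct = df.groupby(determinant)[dependent].transform("nunique")
    return df[distinct > 1]

# Hypothetical calendar table with one dirty record (last row)
calendar = pd.DataFrame({
    "Date": ["2023-10-01", "2023-10-02", "2023-11-01", "2023-11-01"],
    "Month": ["October", "October", "November", "December"],
})

print(is_functionally_dependent(calendar, "Date", "Month"))
print(dependency_violations(calendar, "Date", "Month"))
```

The violating rows are exactly the candidates for data cleaning: here the same date is assigned to two different months, so one of the two records must be wrong.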

Testing of Data in a Power BI Dataset

Another use case that can be easily implemented with Semantic Link is test automation, which has not been so easy to implement in the Power BI world.

In the following example, all tables of a dataset are retrieved and relationship violations are evaluated (e.g. to detect foreign keys that are null).

Evaluation of relationships
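The kind of check performed here can be sketched in plain pandas. This is a simplified stand-in, not the Semantic Link function itself (sempy.fabric provides list_relationship_violations for this in a Fabric notebook); the helper foreign_key_violations and the two sample tables are hypothetical.

```python
import pandas as pd

def foreign_key_violations(child, foreign_key, parent, primary_key):
    """Return child rows whose foreign key is null or has no matching parent row."""
    is_null = child[foreign_key].isna()
    is_orphan = ~is_null & ~child[foreign_key].isin(parent[primary_key])

    result = child.copy()
    result["violation"] = None
    result.loc[is_null, "violation"] = "null foreign key"
    result.loc[is_orphan, "violation"] = "no matching parent row"
    return result[result["violation"].notna()]

# Hypothetical tables: Sales[CustomerID] references Customer[CustomerID]
customers = pd.DataFrame({"CustomerID": [1, 2, 3], "Name": ["Ann", "Bob", "Cem"]})
sales = pd.DataFrame({"OrderID": [10, 11, 12, 13], "CustomerID": [1, None, 4, 2]})

violations = foreign_key_violations(sales, "CustomerID", customers, "CustomerID")
print(violations)
```

An empty result means the relationship holds for all rows, which makes the check easy to use as a test condition.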

The next example shows how tests can be automated by comparing the actual values in a dataset with the expected values (here only by a simple "assert", but test frameworks can also be used).

Test automation of Power BI values
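A simple version of such a test could look like the sketch below. It assumes that the measure result has already been retrieved as a DataFrame (for example via fabric.evaluate_measure) with one group column and one value column; the helper assert_measure_values, the column names and the numbers are hypothetical.

```python
import math
import pandas as pd

def assert_measure_values(actual, key_column, value_column, expected, rel_tol=1e-9):
    """Assert that each expected group is present and its value matches."""
    actual_by_key = dict(zip(actual[key_column], actual[value_column]))

    missing = set(expected) - set(actual_by_key)
    assert not missing, f"Missing groups in result: {sorted(missing)}"

    for key, expected_value in expected.items():
        actual_value = actual_by_key[key]
        assert math.isclose(actual_value, expected_value, rel_tol=rel_tol), (
            f"{key}: expected {expected_value}, got {actual_value}"
        )

# Hypothetical stand-in for a measure evaluated per country
result = pd.DataFrame({
    "Country": ["Germany", "USA"],
    "Total Sales": [1250.0, 3400.5],
})

# Passes silently; raises AssertionError with a readable message on a mismatch
assert_measure_values(
    result, "Country", "Total Sales", {"Germany": 1250.0, "USA": 3400.5}
)
```

Comparing with a tolerance instead of strict equality avoids false alarms from floating-point rounding in aggregated measures.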

Other Semantic Functions

Semantic Link provides a set of built-in semantic functions that are immediately available through the FabricDataFrame (internally, other Python packages such as holidays, phonenumbers and GeoPandas are used).

Here is an example for validating holidays and phone numbers.

Validate holidays and phone numbers

In the following example, geodata from Hamburg and New York are visualized on a map.

Visualize Geo Data

Conclusion

Semantic Link is further evidence of how Microsoft Fabric is transforming the world of data, analytics and AI, and underscores its goal of making these disciplines more accessible to business users. This doesn't mean that every business user has to become a Data Engineer or Data Scientist, but sharing data between them will be much easier than ever before and business logic will no longer need to be duplicated. In addition, Semantic Link can be used for a variety of other use cases such as documentation, test automation, validation and more.

Want to learn more about Microsoft Fabric and Power BI with its amazing capabilities, or how you can build a data platform or leverage AI in your business (e.g. via Azure OpenAI)? Just contact us at Obungi or follow us on LinkedIn.

Torsten Wanka

Managing Partner / CEO at Obungi GmbH | Enabling organizations to gain business value in the cloud

Adam Saxton and Patrick LeBlanc from Guy in a Cube also published a video about Semantic Link an hour ago. They show some more interesting capabilities like the %%dax magic command, using DMVs to check resident columns, updating the dataset, and some more insights into data validation. Link: https://www.youtube.com/watch?v=zMiRGZsfQgs
