Microsoft Fabric’s Semantic Link – Integration of Power BI into the ‘Circle of Life’
Torsten Wanka
Managing Partner / CEO at Obungi GmbH | Enabling organizations to gain business value in the cloud
Introduction
A lot of users know, love and use Power BI in their daily work. Besides its amazing reporting capabilities, more and more business departments use Power BI to create their own calculations in Power BI datasets (or "semantic models", as Microsoft now calls them) via measures written in Data Analysis Expressions (DAX).
On the other hand, data engineers build data platforms (data warehouses or lakehouses), and data scientists build models (e.g. for forecasting) or use generative AI models through services like Azure OpenAI (e.g. ChatGPT or DALL-E). Today, Python is the leading programming language in data science and also plays an essential role in data engineering, especially as notebooks become more and more important. Notebooks are also a fundamental part of Microsoft Fabric, an end-to-end, unified analytics platform that brings together all the data and analytics tools companies need, which Arun Ulag introduced in May 2023.
Ideally, all data is already available in one data platform, but measures, at least, are by definition only available in a Power BI dataset, even though they could be of interest to others as well.
So wouldn't it be great to bridge the gap between business users and other parties such as data engineers and data scientists, and make Power BI (meta)data easily accessible?
With the introduction of Semantic Link in Microsoft Fabric in October 2023, this is now easily possible. Why is this so cool? Because the data generated in Power BI is no longer the end of the chain: it can be the beginning of a journey to new insights in data science models, or it can be enriched in downstream processes without the calculations having to be performed again.
In addition, things like documentation and testing can be fully automated and customized to individual needs.
Semantic Link enables Power BI to become an active part of the "circle of life" between business users and data engineers and/or data scientists.
Want to learn more about Semantic Link in Microsoft Fabric and how to use it? Just read on.
What is Semantic Link?
Semantic Link provides Python methods for read-only access to Power BI datasets in a Microsoft Fabric workspace: both the metadata and the "real" data (including DAX measures). Before Microsoft Fabric, this was already possible to some extent via the Execute Queries operation of the Power BI REST API, but that required advanced IT skills and permissions in Azure. Now it is much easier.
The following illustration shows the interaction between Power BI datasets, Semantic Link (used in Fabric notebooks) and OneLake (the place where all data is stored in Microsoft Fabric).
The functionalities of Semantic Link are separated into multiple Python packages:
The core data structure in Semantic Link is the FabricDataFrame, which subclasses the pandas DataFrame and adds metadata such as semantic information and lineage.
Use Cases
There are several ways Semantic Link can be used.
Read Data from a Power BI Dataset
The first use case is reading data from a Power BI dataset for data engineering or data science purposes since, as described earlier, some data (e.g. measures) is only available in Power BI datasets.
After a notebook has been created in a Fabric workspace, the Semantic Link Python library (SemPy) must be made available in the notebook, i.e. installed if necessary and imported.
Then, the library methods can be used to retrieve data from the Power BI dataset (which must be accessible in a Fabric workspace), as shown in the following code snippets.
In these examples, measures, tables and data (via a DAX statement) were retrieved. The results can of course be used further in notebooks or even exported later (although exporting all data back to CSV files is not a good idea...).
The examples above are all written in Python in a PySpark notebook. For those with a SQL background, it is also possible to use Semantic Link via Spark SQL. This is described in detail in a great blog post by Nikola Ilic.
Read Metadata from a Power BI Dataset for Documentation
Not only the "real" data of a Power BI dataset can be accessed as described before, but also its metadata about tables, measures, relationships, etc.
The following example retrieves the metadata of the measures in a dataset. This can be used, for example, to keep documentation always up to date.
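As a sketch of the documentation idea, the helper below turns a measure listing into a Markdown table. It assumes that the result of `fabric.list_measures` contains the columns `Table Name`, `Measure Name` and `Measure Expression`; check this against your SemPy version. The helper names `measures_to_markdown` and `document_measures` are hypothetical.

```python
# Minimal sketch: generating Markdown documentation for all measures.
# Assumes the column names 'Table Name', 'Measure Name' and
# 'Measure Expression' in the result of fabric.list_measures.
import pandas as pd

try:
    import sempy.fabric as fabric
except ImportError:  # outside Microsoft Fabric
    fabric = None


def measures_to_markdown(measures: pd.DataFrame) -> str:
    """Render one Markdown table row per measure, sorted by table and name."""
    lines = ["| Table | Measure | DAX expression |", "| --- | --- | --- |"]
    ordered = measures.sort_values(["Table Name", "Measure Name"])
    for _, row in ordered.iterrows():
        # collapse line breaks and escape pipes (DAX uses || for OR)
        # so that each measure stays on a single table row
        expression = " ".join(str(row["Measure Expression"]).split())
        expression = expression.replace("|", "\\|")
        lines.append(f"| {row['Table Name']} | {row['Measure Name']} | {expression} |")
    return "\n".join(lines)


def document_measures(dataset: str) -> str:
    """Fetch the measure metadata of a dataset and render it as Markdown."""
    if fabric is None:
        raise RuntimeError("Semantic Link (sempy) is not available here")
    return measures_to_markdown(fabric.list_measures(dataset))
```

Running the notebook on a schedule and writing the returned string to a file or wiki page keeps the documentation in step with the model.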
Create a Diagram of Tables and Dependencies
Another nice use case for documentation purposes is a diagram showing the tables of a dataset and the dependencies (relationships) between them.
Detect Functional Dependencies for Data Cleaning
Semantic Link can also be used to detect functional dependencies in a dataset. A functional dependency exists when the values of one column are determined by the values of another column. For example, a "Date" column determines the values in a "Month" column.
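SemPy can search for such dependencies automatically (a FabricDataFrame offers `find_dependencies()`, and `plot_dependency_metadata` from `sempy.dependencies` visualizes the result). To make the concept concrete, the sketch below checks a single candidate dependency in plain pandas; `dependency_violations` is a hypothetical helper, not part of Semantic Link, and the data is made up.

```python
# Minimal sketch: checking one functional dependency (determinant -> dependent)
# in plain pandas; helper and data are hypothetical.
import pandas as pd


def dependency_violations(df: pd.DataFrame, determinant: str, dependent: str) -> pd.Series:
    """Return determinant values that map to more than one dependent value.

    An empty result means the dependency determinant -> dependent holds in
    this data; a non-empty one points at rows worth cleaning.
    """
    distinct = df.groupby(determinant, dropna=False)[dependent].nunique(dropna=False)
    return distinct[distinct > 1]


# Hypothetical data: one date carries two different months (a data error)
sales = pd.DataFrame({
    "Date": ["2023-01-05", "2023-01-05", "2023-02-01", "2023-02-01"],
    "Month": ["Jan", "Jan", "Feb", "Mar"],
})
print(dependency_violations(sales, "Date", "Month"))
```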
Testing of Data in a Power BI Dataset
Another use case that can easily be implemented with Semantic Link is test automation, which has so far not been easy to implement in the Power BI world.
In the following example, all tables of a dataset are retrieved and relationship violations are evaluated (e.g. to detect foreign keys that are null).
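Inside Fabric, SemPy offers `fabric.list_relationship_violations`, which takes the tables of a dataset (for example a dict mapping table names to FabricDataFrames read with `fabric.read_table`) and checks them against the dataset's relationships. The plain-pandas sketch below illustrates the kind of check involved for a single relationship; the helper `foreign_key_violations` and the table and column names are hypothetical.

```python
# Minimal sketch: referential-integrity check for one relationship
# (fact table -> dimension table) in plain pandas; names are hypothetical.
import pandas as pd


def foreign_key_violations(fact: pd.DataFrame, dimension: pd.DataFrame,
                           foreign_key: str, primary_key: str) -> pd.DataFrame:
    """Return fact rows whose foreign key is null or has no match in the dimension."""
    is_null = fact[foreign_key].isna()
    is_orphan = ~fact[foreign_key].isin(dimension[primary_key])
    return fact[is_null | is_orphan]


customers = pd.DataFrame({"CustomerKey": [1, 2]})
sales = pd.DataFrame({"SaleId": [10, 11, 12, 13], "CustomerKey": [1, None, 3, 2]})
# sale 11 has a null key, sale 12 refers to a customer that does not exist
print(foreign_key_violations(sales, customers, "CustomerKey", "CustomerKey"))
```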
The next example shows how tests can be automated by comparing the actual values in a dataset with the expected values (here with a simple assert statement, but test frameworks can be used as well).
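A minimal sketch of such a test, assuming the actual values come from a call like `fabric.evaluate_measure("Sales Model", "Total Revenue", groupby_columns=["Customer[Country]"])` and that the resulting columns are named `Country` and `Total Revenue`. The helper `measure_mismatches`, the names and the numbers are hypothetical; here a small in-memory DataFrame stands in for the measure result.

```python
# Minimal sketch: comparing measure results with expected values.
# `actual` stands in for a DataFrame returned by fabric.evaluate_measure;
# column names, keys and values are hypothetical.
import math

import pandas as pd


def measure_mismatches(actual, key_column, value_column, expected, rel_tol=1e-9):
    """Return {key: (actual, expected)} for every expected value that does not match."""
    actual_by_key = dict(zip(actual[key_column], actual[value_column]))
    mismatches = {}
    for key, expected_value in expected.items():
        value = actual_by_key.get(key)
        # a missing key or a value outside the tolerance counts as a mismatch
        if value is None or not math.isclose(value, expected_value, rel_tol=rel_tol):
            mismatches[key] = (value, expected_value)
    return mismatches


actual = pd.DataFrame({"Country": ["Germany", "USA"], "Total Revenue": [100.0, 250.0]})
assert not measure_mismatches(actual, "Country", "Total Revenue",
                              {"Germany": 100.0, "USA": 250.0})
```

Returning the mismatches instead of a plain boolean makes a failing assert easier to diagnose; with pytest, the last line can move unchanged into a `test_...` function.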
Other Semantic Functions
Semantic Link provides a set of built-in semantic functions that are immediately available through the FabricDataFrame (internally, other Python packages such as holidays, phonenumbers and GeoPandas are used).
Here is an example of checking holidays and validating phone numbers.
In the following example, geodata for Hamburg and New York is visualized on a map.
Conclusion
Semantic Link is further evidence of how Microsoft Fabric is transforming the world of data, analytics and AI, and it underscores Fabric's goal of making these disciplines more accessible to business users. This doesn't mean that every business user has to become a data engineer or data scientist, but sharing data between these groups becomes much easier than before, and business logic no longer needs to be duplicated. In addition, Semantic Link supports a variety of other use cases such as documentation, test automation and validation.
Want to learn more about Microsoft Fabric and Power BI with its amazing capabilities, or how you can build a data platform or leverage AI in your business (e.g. via Azure OpenAI)? Just contact us at Obungi or follow us on LinkedIn.