An approach to custom-partitioning a Power BI dataset and refreshing it with the Enhanced Refresh API
A key way to optimize your #powerbi dataset refresh is to partition your dataset tables. Partitions divide the table data into logical parts that can be managed and refreshed individually, which makes refreshing new data faster and more reliable.
More information about dataset partitioning is available in this document.
You have two options to partition your tables: configure the built-in incremental refresh feature, or create and manage your own custom partitions through the XMLA endpoint, which is the approach covered here.
In this article I'll demonstrate a simple but effective way to automatically create and manage partitions, refresh them individually, and link the refresh operation to upstream ETL pipelines. The process can easily be adapted to your own scenario:
All the source code & samples can be downloaded from this GitHub repo: RuiRomano/pbicustompartitioning (github.com)
We start from a simple Power BI Desktop file connecting to the Contoso sample database:
Let's start by publishing this dataset to a Premium workspace:
Using the workspace XMLA Endpoint you can use DAX Studio and View Metrics to inspect the default partitions:
My goal is to create monthly partitions on the "Sales" table so I can refresh each month individually. I have data since 2018, so I need to create 48 partitions, each filtering a single month.
You can create these partitions manually using SSMS, but that's a lot of work... A much more efficient solution is to script the operation in PowerShell using the TOM library. I created the script "Script - CreatePartitions.ps1", which dynamically creates all the partitions; each partition filters the appropriate month, and the filter folds back to the source database.
The partitions are created by the cmdlet "Add-ASTablePartition", which lives in a helper module I wrote to wrap the TOM library: TOMHelper.psm1.
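As a rough illustration of what the script does, the loop below creates one partition per month. Note this is a hedged sketch, not the repo's actual code: the exact parameter names of Add-ASTablePartition are my assumptions, and the workspace/dataset names are placeholders.

```powershell
# Illustrative sketch - assumes TOMHelper.psm1 from the repo is imported
# and that you have admin access to the workspace XMLA endpoint.
Import-Module .\Modules\TOMHelper.psm1 -Force

$workspaceConnection = "powerbi://api.powerbi.com/v1.0/myorg/MyWorkspace"
$datasetName = "Contoso"

# One partition per month since January 2018 (48 months)
$startDate = [datetime]"2018-01-01"

for ($i = 0; $i -lt 48; $i++)
{
    $monthStart = $startDate.AddMonths($i)
    $partitionName = $monthStart.ToString("yyyyMM")   # e.g. "201801"

    # Hypothetical parameter names - the real cmdlet lives in TOMHelper.psm1.
    # Conceptually it injects a month filter into the table's M expression
    # so the predicate folds back to the source database.
    Add-ASTablePartition -server $workspaceConnection `
        -datasetName $datasetName `
        -tableName "Sales" `
        -partitionName $partitionName `
        -filterYear $monthStart.Year `
        -filterMonth $monthStart.Month
}
```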
The M code above is taken from the Advanced Editor in the Power Query window; for future changes you can develop and test there, then copy the new version into the script:
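For illustration, a per-partition M expression could look like the following, where the date range is the only part that varies between partitions. The server, schema, and column names here are assumptions based on the Contoso sample and will differ in your model:

```
let
    Source = Sql.Database("myserver.database.windows.net", "Contoso"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // This predicate is what changes per partition; because it is a
    // simple range filter, it folds to a WHERE clause in the source query
    FilteredRows = Table.SelectRows(Sales, each
        [OrderDate] >= #date(2018, 1, 1) and
        [OrderDate] <  #date(2018, 2, 1))
in
    FilteredRows
```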
After you run the script all the partitions will be created:
Now that we have our monthly partitions it's time to refresh, and my favourite way to refresh a Power BI dataset is the Enhanced Refresh API, mainly because it allows you to:
For me the most important feature is the granular table/partition refresh, because it lets you batch your dataset refresh: you split it into multiple smaller refresh commands and transactions, which allows you to:
For example, in our scenario we may want to create two refresh batches:
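With the Enhanced Refresh API, a batch is simply the objects array of the request body. A dimensions-plus-two-months fact batch could look like the sketch below; the table and partition names come from this example, and the option values (commitMode, maxParallelism, etc.) are choices you should tune per the official API docs:

```json
POST https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes

{
    "type": "full",
    "commitMode": "transactional",
    "maxParallelism": 4,
    "retryCount": 1,
    "objects": [
        { "table": "Sales", "partition": "202201" },
        { "table": "Sales", "partition": "202202" }
    ]
}
```

Because each batch is its own request and transaction, a failure in the fact batch does not roll back a dimensions batch that already committed.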
I'll show you two options to refresh the dataset using the Enhanced Refresh API; both can easily be reused and linked with other pipelines:
The PowerShell option is useful when you're dealing with on-prem pipelines or need more flexibility. I created the script "Script - DatasetRefresh - API.ps1" as an example that can easily be adapted to your scenario. The script performs the following actions:
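A hedged sketch of the same flow follows: acquire a session, start an enhanced refresh, then poll the refresh history until it finishes. It assumes the MicrosoftPowerBIMgmt module is installed; the workspace and dataset IDs are placeholders, and this is not the repo script itself.

```powershell
# Sign in interactively (or pass -ServicePrincipal for unattended runs)
Connect-PowerBIServiceAccount | Out-Null

$groupId   = "<workspace id>"
$datasetId = "<dataset id>"
$baseUrl   = "groups/$groupId/datasets/$datasetId/refreshes"

# Build the enhanced refresh request body for one batch
$body = @{
    type       = "full"
    commitMode = "transactional"
    objects    = @(@{ table = "Sales"; partition = "202201" })
} | ConvertTo-Json -Depth 3

# Start the refresh (asynchronous on the service side)
Invoke-PowerBIRestMethod -Url $baseUrl -Method Post -Body $body

# Poll the latest refresh entry; "Unknown" means still in progress
do {
    Start-Sleep -Seconds 30
    $latest = (Invoke-PowerBIRestMethod -Url "$baseUrl`?`$top=1" -Method Get |
               ConvertFrom-Json).value[0]
} while ($latest.status -eq "Unknown")

Write-Host "Refresh finished with status: $($latest.status)"
```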
If you are in the cloud, then an Azure Logic App is a better option: it's low-code, easy to deploy, secure by default, and it makes it very easy to design a workflow that refreshes your dataset and links it to other upstream/downstream data pipelines.
I created a simple & generic logic app that:
In this scenario I created two logic apps with different configurations but the same implementation:
Both logic apps can be triggered at different times and with different refresh configurations; to change them you just need to edit the "Refresh Parameters" action at the beginning of the flow:
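Conceptually, the "Refresh Parameters" action holds everything that differs between the two logic apps. The exact shape depends on the JSON in the repo; the field names below are my assumption of what such a parameters object might contain:

```json
{
    "workspaceId": "<workspace id>",
    "datasetId": "<dataset id>",
    "refreshBody": {
        "type": "full",
        "commitMode": "transactional",
        "objects": [ { "table": "Sales", "partition": "202201" } ]
    }
}
```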
The logic app is triggered by an HTTP GET request, which means that to trigger your dataset refresh (facts, dimensions, or both) you just need to execute an HTTP call from the upstream/downstream pipeline to the URL on the trigger action:
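From PowerShell, for example, that call is a one-liner. The URL below is a placeholder for illustration; the real one, including its SAS signature, is copied from the HTTP trigger action in the designer:

```powershell
# Fire the logic app trigger from any upstream/downstream pipeline step.
$triggerUrl = "https://prod-00.westeurope.logic.azure.com/workflows/<id>/triggers/manual/paths/invoke?api-version=2016-10-01&sp=<sp>&sig=<sig>"

Invoke-RestMethod -Uri $triggerUrl -Method Get
```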
To create the logic app, follow these steps:
Create a new Logic App on your Azure Subscription with the following parameters:
Open the "Logic App Code View" and copy the contents of the file LogicApp - Refresh.json
Enable the Managed Identity of the Logic App. This automatically creates a service principal for the logic app, which will be the identity used to refresh the dataset:
You also need to authorize this Managed Identity to access the dataset's workspace (for example, by adding it as a Member or Contributor) and make sure your tenant settings allow service principals to use Power BI APIs:
More information on Logic Apps Managed Identity here.
Finally, if you switch to the Logic App Designer you should see the logic app workflow; change the "Refresh Parameters" action to point it at your dataset.
Senior Data Analyst @ AstraZeneca | Data Culture and BI Consultant | Specialist in SQL, Power BI, Data Warehouse and ETL | Microsoft Certified | Green Belt
6mo Hi Rui, awesome article. Let me ask you something... What about downloading the PBIX? Is it possible? Because I've tried partitioning with Tabular Editor, which uses the XMLA API, and the PBIX gets blocked for download after that.
Associate Vice President at Xoxoday - Expertise in setting up of BI and Reporting environment right from DWH set up to rolling out of dashboards in Startup landscapes
1y Thanks Rui for the fantastic article. I feel every big, fat Power BI dataset needs to be partitioned and used as part of incremental refresh, where it saves so much processing time and resources.
Chief Innovation Officer @ Lumel
2y Thanks for this excellent walkthrough. FYI your DAX Studio link is broken; it is now: https://daxstudio.org/docs/features/model-metrics/#partitions I'm working on a near-real-time scenario to recursively ingest writeback data. It works great, but the main bottleneck I see from the end user's perspective is that automatic page refresh for visuals only works with DirectQuery sources, not Import. So an end user who's been working with a report for a while doesn't realize fresher data is available in the dataset but not currently displayed by the visuals. I'm curious how other people might have dealt with this.
Ben Arendse, Camille Sallé: premium inspo
Interesting, deep, and detailed flow.