An approach to custom partitioning a Power BI Dataset and Refresh using the Enhanced Refresh API


A key way to optimize your #powerbi dataset refresh is to partition your dataset tables. Partitions make refreshing new data faster and more reliable because they divide the table data into logical parts that can be managed and refreshed individually.

You can find more information about dataset partitioning in this document.

You have two options to partition your tables:

  • Configure an incremental refresh policy and let Power BI create and manage the partitions automatically
  • Create and manage custom partitions yourself through the XMLA endpoint, which is the approach shown in this article

In this article I'll demonstrate a simple but effective way to automatically create & manage partitions, refresh them individually, and link the refresh operation to upstream ETL pipelines. This process can easily be adapted to your own scenario:

[Image: overview diagram of the partitioning and refresh process]

All the source code & samples can be downloaded from this GitHub repo: RuiRomano/pbicustompartitioning

We start from a simple Power BI Desktop file connecting to the Contoso sample database:

[Image: Power BI Desktop model connected to the Contoso sample database]

Let's start by publishing this dataset to a Premium workspace:

[Image: the dataset published to a Premium workspace]

Using the workspace XMLA endpoint, you can connect with DAX Studio and use View Metrics to inspect the default partitions:

[Image: DAX Studio View Metrics showing the default partitions]

My goal is to create monthly partitions on the "Sales" table so I can refresh each month individually. I have data since 2018, so I need to create 48 partitions, each filtering an individual month.

You can create these partitions manually using SSMS, but that's a lot of work... A much more efficient solution is to script the operation in PowerShell using the TOM library. I created the script "Script - CreatePartitions.ps1", which dynamically creates all the partitions; each partition holds an M expression that filters the appropriate month, and that filter gets folded to the database.

[Image: the "Script - CreatePartitions.ps1" PowerShell script]
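If you just want the gist without opening the repo, here is a minimal sketch of what the script does, with the helper logic inlined. It assumes the Microsoft.AnalysisServices.Tabular (TOM) assembly is already loaded (for example via the SqlServer PowerShell module), and the workspace, dataset, server, and column names are all hypothetical:

```powershell
# Minimal sketch - hypothetical names throughout (workspace, dataset, server, column)
# Requires the Microsoft.AnalysisServices.Tabular (TOM) assembly, e.g. from the SqlServer module

$server = New-Object Microsoft.AnalysisServices.Tabular.Server
$server.Connect("powerbi://api.powerbi.com/v1.0/myorg/MyWorkspace")

$database   = $server.Databases.GetByName("MyDataset")
$salesTable = $database.Model.Tables.Find("Sales")

# One partition per month: 48 months starting January 2018
for ($i = 0; $i -lt 48; $i++) {
    $start = ([datetime]"2018-01-01").AddMonths($i)
    $end   = $start.AddMonths(1)

    # M expression per partition; the date filter folds back to the source database
    $mExpression = @"
let
    Source = Sql.Database("myserver.database.windows.net", "Contoso"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    FilteredRows = Table.SelectRows(Sales, each
        [OrderDate] >= #date($($start.Year), $($start.Month), 1) and
        [OrderDate] <  #date($($end.Year), $($end.Month), 1))
in
    FilteredRows
"@

    $partition        = New-Object Microsoft.AnalysisServices.Tabular.Partition
    $partition.Name   = "Sales-{0:yyyyMM}" -f $start
    $partition.Source = New-Object Microsoft.AnalysisServices.Tabular.MPartitionSource
    $partition.Source.Expression = $mExpression

    $salesTable.Partitions.Add($partition)
}

# Push the new partition definitions to the dataset
$database.Model.SaveChanges()
$server.Disconnect()
```

Note that you'd also need to deal with the table's original default partition, which this sketch leaves untouched; see the repo for the full version.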

The cmdlet "Add-ASTablePartition" is responsible to create all the partitions that exists in a helper module I created to wrap the TOM Library: TOMHelper.psm1.

The M code above is obtained from the Advanced Editor in the Power Query window; for future changes you can develop and test there and copy a new version into the script:

[Image: the M code in the Power Query Advanced Editor]
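For illustration, the kind of base query you'd copy out of the Advanced Editor looks like this (same hypothetical source names as in the sketch above); the script then wraps the month filter around it:

```m
let
    Source = Sql.Database("myserver.database.windows.net", "Contoso"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data]
in
    Sales
```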

After you run the script all the partitions will be created:

[Image: the monthly partitions created on the "Sales" table]

Now that we have our monthly partitions, it's time to refresh. My favourite option to refresh a Power BI dataset is the Enhanced Refresh API, mainly because it allows you to:

  • Refresh your dataset asynchronously, without needing to keep an active connection
  • Refresh individual tables and partitions
  • Control the max parallelism of the queries sent to the data sources
  • Cancel a running refresh

For me the most important feature is the granular table/partition refresh, because it lets you batch your dataset refresh, i.e. execute it as multiple smaller refresh commands and transactions. This allows you to:

  • Align your dataset table refresh with data availability on the source
  • Minimize the risk of running out of memory, because you are not refreshing the entire dataset in a single large transaction
  • Enable near-real-time scenarios by refreshing a single partition with daily/hourly data

For example, in our scenario we may want to create two refresh batches (an example request body follows the list):

  • A batch that only refreshes the dimension tables, with maximum parallelism
  • A batch that refreshes the last 3 months of data, with a controlled max parallelism due to the large data volumes involved in the fact tables
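For illustration, the Enhanced Refresh API request body for the facts batch might look like the sketch below; the property names come from the Enhanced Refresh API, while the partition names are hypothetical and follow a monthly naming scheme. The dimensions batch would be similar, listing only "table" entries and a higher "maxParallelism":

```json
{
  "type": "full",
  "commitMode": "transactional",
  "retryCount": 2,
  "maxParallelism": 3,
  "objects": [
    { "table": "Sales", "partition": "Sales-202110" },
    { "table": "Sales", "partition": "Sales-202111" },
    { "table": "Sales", "partition": "Sales-202112" }
  ]
}
```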

I'll show you two options to refresh the dataset using the Enhanced Refresh API; both can easily be reused and linked with other pipelines:

  • PowerShell script using the MicrosoftPowerBIMgmt module
  • Azure Logic Apps - Preferred option due to its ease of creation and flexibility to link to other Azure pipelines like Azure Data Factory, Azure Synapse Analytics, ...

The PowerShell option is useful when you're dealing with on-prem pipelines or need more flexibility. I created the script "Script - DatasetRefresh - API.ps1" as an example that can easily be adapted to your scenario. The script does the following (sketched below):

  • Declares two refresh batches, Dimensions & Facts, with different refresh configurations
  • For each batch, executes the Enhanced Refresh API and waits for the execution to finish, because you cannot have more than one refresh command running in parallel

[Image: the "Script - DatasetRefresh - API.ps1" PowerShell script]
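A minimal sketch of that pattern using the MicrosoftPowerBIMgmt module is shown below; the workspace/dataset IDs and table names are placeholders, and the request bodies have the same shape as the example shown earlier:

```powershell
# Requires: Install-Module MicrosoftPowerBIMgmt
Connect-PowerBIServiceAccount

$workspaceId  = "<workspace-guid>"   # placeholder
$datasetId    = "<dataset-guid>"     # placeholder
$refreshesUrl = "groups/$workspaceId/datasets/$datasetId/refreshes"

# Two batches: dimensions first, then the most recent fact partition (placeholder names)
$batches = @(
    '{ "type": "full", "commitMode": "transactional", "maxParallelism": 10,
       "objects": [ { "table": "Product" }, { "table": "Customer" } ] }',
    '{ "type": "full", "commitMode": "transactional", "maxParallelism": 3,
       "objects": [ { "table": "Sales", "partition": "Sales-202112" } ] }'
)

foreach ($body in $batches) {
    # Start an enhanced (asynchronous) refresh for this batch
    Invoke-PowerBIRestMethod -Url $refreshesUrl -Method Post -Body $body | Out-Null

    # Only one refresh can run at a time, so poll the refresh history until it finishes;
    # enhanced refreshes report "Unknown" while still in progress
    do {
        Start-Sleep -Seconds 30
        $latest = (Invoke-PowerBIRestMethod -Url ($refreshesUrl + '?$top=1') -Method Get |
                   ConvertFrom-Json).value[0]
    } while ($latest.status -eq "Unknown")

    Write-Host "Batch finished with status: $($latest.status)"
}
```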

If you are in the cloud, then an Azure Logic App is a better option: it's low code, easy to deploy, secure by default, and it's very easy to design a workflow to refresh your dataset and link it to other upstream/downstream data pipelines.

I created a simple & generic logic app that:

  • Is triggered by an HTTP Request
  • Has a parameter activity with JSON configuration to specify the refresh options (max parallelism, retry count, commit mode) and the tables/partitions you want to refresh
  • Executes the dataset refresh using the HTTP Action & Enhanced Refresh API
  • Waits for the dataset refresh to end

[Image: the Logic App workflow]

In this scenario I created two logic apps with different configurations but the same implementation:

  • "PBI-CustomPartitioning-Dimensions" - Refresh all the dimension tables
  • "PBI-CustomPartitioning-Facts" - Refresh the last 3 months of the "Sales" table

Both logic apps can be triggered at different times and with different refresh configurations; to change them, you just need to edit the "Refresh Parameters" action at the beginning of the flow:

[Image: the "Refresh Parameters" action in the Logic App designer]
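The exact schema is defined in LogicApp - Refresh.json, but as a sketch the parameters might look something like this (the refresh body mirrors the Enhanced Refresh API; the IDs and names are placeholders):

```json
{
  "workspaceId": "<workspace-guid>",
  "datasetId": "<dataset-guid>",
  "refreshBody": {
    "type": "full",
    "commitMode": "transactional",
    "retryCount": 2,
    "maxParallelism": 3,
    "objects": [
      { "table": "Sales", "partition": "Sales-202112" }
    ]
  }
}
```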

The logic app is triggered by an HTTP GET request. This means that to trigger your dataset refresh (facts, dimensions, or both) you just need to execute an HTTP call from the upstream/downstream pipeline to the URL on the trigger action:

[Image: the HTTP trigger URL on the Logic App trigger action]
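For example, a hypothetical PowerShell call from an upstream pipeline is a single line; the URL below is a placeholder for the one (including its signature) copied from your trigger action:

```powershell
# Fire the Logic App trigger; replace the URL with the one from your trigger action
Invoke-RestMethod -Method Get -Uri "https://prod-00.westeurope.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke?api-version=2016-10-01&sig=<signature>"
```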

To create the logic app, follow these steps:

Create a new Logic App on your Azure Subscription with the following parameters:

[Image: Logic App creation parameters in the Azure portal]

Open the "Logic App Code View" and copy the contents of the file LogicApp - Refresh.json

Enable the Managed Identity of the Logic App; this automatically creates a service principal for the logic app, which will be the identity used to refresh the dataset:

[Image: enabling the Logic App's system-assigned managed identity]

You also need to authorize this Managed Identity to:

  • Access the workspace, by adding it as a Member or Admin of the Power BI workspace
  • Call the Power BI APIs, by allowing service principals to use Power BI APIs in the tenant settings

More information on Logic Apps Managed Identity is available here.

Finally, if you switch to the Logic App Designer you should be able to see the logic app workflow; change the "Refresh Parameters" action to point to your dataset.

Alan Avelar

Senior Data Analyst @ AstraZeneca | Consultant in Data Culture and BI | Specialist in SQL, Power BI, Data Warehouse and ETL | Microsoft Certified | Green Belt

6 mo

Hi Rui, awesome article. Let me ask you something... What about downloading the pbix? Is it possible? Because I've tried partitioning using Tabular Editor, which uses the XMLA API, and the pbix gets blocked for download after that.

Ravi Ramakrishnappa

Associate Vice President at Xoxoday - Expertise in setting up of BI and Reporting environment right from DWH set up to rolling out of dashboards in Startup landscapes

1 yr

Thanks Rui for the fantastic article. I feel every big fat Power BI dataset needs to be partitioned and used as part of an incremental refresh, where it saves so much processing time and resources.

Olivier Travers

Chief Innovation Officer @ Lumel

2 yr

Thanks for this excellent walkthrough. FYI, your DAX Studio link is broken; it is now: https://daxstudio.org/docs/features/model-metrics/#partitions. I'm working on a near-real-time scenario to recursively ingest writeback data. It works great, but the main bottleneck I see from the end user's perspective is that auto refresh for visuals only works with DirectQuery sources, not Import. So the end user who's been working with a report for a while doesn't realize fresher data is available in the dataset but not currently displayed by the visuals. I'm curious how other people might have dealt with this.


Interesting, deep, and detailed flow.

