Power Platform Random Tip #4                               
Get to grips with cardinality when it comes to Power BI

When you're working with Power BI (in import mode), a fundamental understanding of the VertiPaq engine will give you insight into the factors that affect your data model size. There's a great article here you can read by the legends Marco Russo and Alberto Ferrari that explains what the VertiPaq engine is and the factors affecting compression. I strongly recommend purchasing The Definitive Guide to DAX (1st and 2nd editions) if you want an in-depth understanding of how Power BI works under the hood.

This article will focus on the steps you can take to reduce the size of your data model by reducing or removing cardinality.

Reducing or, even better, removing cardinality will give you the most bang for your buck when it comes to shrinking your data model. The golden rule when working with Power BI is to import only what you need. It sounds simple, but I've seen many data models grind to a halt because someone thought it was a good idea to bring in all the columns on the off chance they're required in the future.

Take a close look at the PBIX files ordered by file size below. Shoutout to Kris Coupe for the idea; it's a great way to visualise the importance and impact of cardinality.

[Image: the PBIX files ordered by file size]

Key takeaways:

  • The PBIX with the most unique records (the highest cardinality) is the biggest PBIX
  • Minimal compression occurred when using the Date Time data type (compare the Excel file to the PBIX)
  • There were big compression gains when converting the Date Time data type to the Date data type
  • The table with 30 million records and the lowest cardinality produces the smallest PBIX

Hopefully the above gave you an insight into the significant impact cardinality has on your data model. I've listed some examples of the steps you can take to reduce cardinality, which will likely result in a smaller data model. In most cases you will also see a ripple effect of performance gains when it comes to refresh times and visual query performance.

I strongly suggest that you test any of the recommendations listed below before deploying to production, to ensure you get the expected output and performance.

1) Remove columns you do not need

  • Remove all columns that are not used to build relationships, create DAX measures, or create visuals.
  • For fact tables, consider replacing physical columns with DAX functions. For example, if you are only using the primary key on a fact table to count the rows of the table with the DISTINCTCOUNT function, you could remove the primary key column and use the COUNTROWS DAX function to get the same output (see the first sketch below).
  • For fact tables, consider pushing columns to dimensions. For example, if you have a sales fact table with a unit price column where the unit price is the same for every product, you could move the unit price to the product dimension table and use a combination of DAX iterator functions (SUMX, AVERAGEX) and the RELATED function to get near like-for-like parity (see the second sketch below).
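
Both ideas in a minimal DAX sketch, assuming a hypothetical Sales fact table with a SalesKey primary key and a Quantity column, related to a Product dimension that now holds the Unit Price column:

    -- First sketch: count rows without the high-cardinality key column.
    -- Before (requires importing SalesKey):
    Sales Count = DISTINCTCOUNT ( Sales[SalesKey] )

    -- After (SalesKey can be removed from the model):
    Sales Count = COUNTROWS ( Sales )

    -- Second sketch: with Unit Price moved to the Product dimension, an
    -- iterator plus RELATED recreates the row-level calculation.
    Sales Amount = SUMX ( Sales, Sales[Quantity] * RELATED ( Product[Unit Price] ) )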

2) Remove rows you do not need

  • For dimension tables, remove rows if you're confident they won't appear in the fact table and the business does not need to know when an event from the fact table did not occur.
  • For fact tables, remove rows that are not required. For example, if you only need to compare last week with the same week last year, you only need the rows that capture events between those two dates (see the sketch below).
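
As a sketch of the filter logic only, here it is written as a DAX calculated table (the table, column, and cut-off date are all hypothetical). Bear in mind that a calculated table on top of an imported table won't shrink the model on its own; apply the same filter in Power Query or in the source query so the unneeded rows are never imported in the first place:

    Sales Trimmed =
        FILTER (
            Sales,
            -- keep only rows from the oldest date the report compares against
            Sales[Order Date] >= DATE ( 2021, 11, 1 )
        )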

3) Pick the right granularity

  • If the fact table in your data model captures data at a daily level, but all the visuals in your report are grouped at a weekly level, summarise the fact table to match the reporting requirements (see the sketch below).
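
As a sketch of the weekly grain, assuming a daily Sales table with hypothetical Date, ProductKey, and Quantity columns. In practice you would do this grouping in Power Query or at the source so the daily rows are never loaded, but the logic looks like this:

    Sales Weekly =
        GROUPBY (
            ADDCOLUMNS (
                Sales,
                -- Monday of the week each row falls in
                "@WeekStart", Sales[Date] - WEEKDAY ( Sales[Date], 2 ) + 1
            ),
            [@WeekStart],
            Sales[ProductKey],
            "Units", SUMX ( CURRENTGROUP (), Sales[Quantity] )
        )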

4) Manipulate/transform your columns

  • Reduce the decimal precision of numbers. Going from 2 decimal places to 0 decimal places increases the chances of reducing cardinality.
  • Split a Date Time data type column into separate date and time columns (or further into hour, minute, and second columns). This increases the number of columns but, more importantly, increases the chances of reducing cardinality in each one (both transformations are sketched below).
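
A sketch of both transformations as DAX calculated columns on a hypothetical Sales table (set the new columns' data types to Date and Time afterwards). Calculated columns are themselves stored, so to actually save space do these transformations in Power Query or at the source and drop the original column:

    -- Round away precision the report doesn't need
    Unit Price Rounded = ROUND ( Sales[Unit Price], 0 )

    -- Split a datetime into a low-cardinality date part and a time-of-day part
    Order Date = INT ( Sales[Order Timestamp] )
    Order Time = Sales[Order Timestamp] - INT ( Sales[Order Timestamp] )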

5) Use DirectQuery

  • When using DirectQuery mode for a given table, the underlying data is not stored in the Power BI data model, so DirectQuery is the easiest way to reduce your data model size, but it does come with its own challenges. DirectQuery is a good option if the underlying data source supports it and is designed for analytics and concurrency. Performance can be a problem when using DirectQuery alone; my recommendation would be to use features that combine the best of both worlds when it comes to import and DirectQuery, such as user-defined aggregations and composite models.

Have you come across any other methods of reducing cardinality?

Kris Coupe

Providing advanced end-to-end analytical solutions. Specialist in Power BI, DAX, data modelling and visualisation.

2y

Nice articles Samuel Woldu-Desta and cheers for the mention. I used to love that exercise showing off the importance of cardinality on model size using extremes. Feels like an age ago!!
