?? DAX: Optimize the Combination of CALCULATE and FILTER ??

?? DAX: Optimize the Combination of CALCULATE and FILTER ??

?? In the world of data analysis, speed and efficiency are paramount. As datasets grow larger and more complex, the ability to quickly and accurately retrieve insights becomes increasingly critical.

?? The combination of CALCULATE and FILTER is commonly used in DAX (Data Analysis Expressions).

?? Let’s explore how these functions work and how they can be used to optimize query speed.


?? 1. Understanding CALCULATE and FILTER

  • CALCULATE modifies the filter context and then performs the aggregation. Think of it as a way to change the environment in which your data is evaluated. This is particularly useful when you want to perform calculations based on specific conditions.
  • FILTER returns a table containing only the rows that meet a specified condition. It is used to apply specific conditions to data.

Sample

CALCULATE(
    SUM(Sales[Amount]),
    FILTER(
               Sales, 
               Sales[Category] = "Food" 
               && Sales[Region] = "Southeast Asia"
               )
)

In this example:

+ FILTER narrows down the Sales table to only include rows where the Category is “Food” and the Region is “Southeast Asia”.
+ CALCULATE then sums the Amount column within this filtered context.        

?? 2. Problem Statement

  • When working with datasets that contain millions of rows, the FILTER function can become a bottleneck.
  • This is because FILTER can iterate over all rows in the table, which can be time-consuming. To mitigate this, we can employ several tactics to optimize the performance of CALCULATE and FILTER.

_val_fb =
CALCULATE(
    DIVIDE([_val_actual]-[_val_fc],[_val_fc]), 
    FILTER(
        fc_output, 
        fc_output[week_offset]<0
        )
    )

=> It takes 3.76 minutes to process with data of 2,000,000 rows        

?? 3. Tactics

? 3.1. Filter Dimension Tables

Instead of filtering a large fact table directly, filter a related dimension table that has a one-to-many or one-to-one relationship with the fact table.

Example: Instead of filtering the fc_output table with 2,000,000 rows, filter the dim_calendar table with 30,000 rows. This significantly reduces the number of iterations from 2,000,000 to 30,000.

_val_fb = 
CALCULATE(
    DIVIDE([_val_actual]-[_val_fc],[_val_fc]), 
    FILTER(
        calendar_master, 
        calendar_master[week_offset]<0
        )
    )

=> It takes 0.03 minutes to process        

? 3.2. Filter and ALL/ALLSELECTED with column

Apply filters directly to the columns you want to filter, rather than the entire table.

Example: Using ALLSELECTED(fc_output[Date]) filters only the Date column, which can be more efficient because FILTER will only iterate the distinct values of the Date column in fc_output table.

_val_fb =
CALCULATE(
    DIVIDE([_val_actual]-[_val_fc],[_val_fc]), 
    FILTER(
        ALLSELECTED(fc_output[week_offset]), 
        fc_output[week_offset]<0
        )
    )

=> It takes 0.02 minutes to process with data of 2,000,000 rows        

?? Optimizing query speed in DAX is crucial for handling large datasets efficiently.

?? If you’re dealing with performance issues in your DAX queries, applying these tactics can make a significant difference - reduce processing time and enhance query performance.

?? Feel free to share with me and community your insights and experiences.


?? Le Ngoc Phuong Trinh./

CALCULATE and FILTER are some of the most important DAX functions in Power BI.

回复
Olivia Sterner

Sustainability Coordinator at Starbucks

3 周

That sounds dope. Power BI tactics can really amp up performance. What’s the most surprising result you’ve gotten using CALCULATE and FILTER?

回复
Manh Duc

Supply Chain at Unilever Vietnam

3 周

Excellent ch? Trinh, will apply this tip for my next project

要查看或添加评论,请登录

社区洞察

其他会员也浏览了