How to Make Power BI DirectQuery Fast and Optimized
Sujeet Singh
Delivering Data Analytics, Gen AI, Business Intelligence and Cloud Solutions | Microsoft Certified Azure Data Architect and Expert - Data Management
DirectQuery reports in Power BI tend to be slower than import mode reports. Complex aggregations or extensive slicing and dicing can slow these reports down even further, potentially frustrating end users. To address this, Microsoft provides an advanced setting for DirectQuery reports and composite data models: "Automatic Aggregations".
Automatic aggregations leverage advanced machine learning (ML) to enhance the performance of Direct-Query semantic models by continuously optimizing them. Built on the foundation of the user-defined aggregations framework introduced with composite models in Power BI, automatic aggregations simplify the process by removing the need for extensive data modeling and query optimization expertise. These aggregations are self-training and self-optimizing, allowing model owners—regardless of their skill level—to improve query performance and deliver faster report visualizations for large data models.
Benefits of Automatic Aggregations:
Supported plans
Automatic aggregations are supported for Power BI Premium per capacity, Premium per user, and Power BI Embedded models.
Permissions
To enable and configure automatic aggregations, you must be the model owner. Workspace admins can take over ownership of a model to configure its automatic aggregations settings.
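If a workspace admin needs to assume ownership before configuring automatic aggregations, the take-over call in the Power BI REST API can be scripted. The sketch below is a minimal Python example; the access token, workspace ID, and dataset ID are placeholders you would supply yourself, and the token needs sufficient dataset permissions.

import requests

# Placeholders - supply your own Azure AD access token, workspace (group) ID,
# and dataset ID.
ACCESS_TOKEN = "<azure-ad-access-token>"
WORKSPACE_ID = "<workspace-guid>"
DATASET_ID = "<dataset-guid>"

# Datasets - Take Over In Group: transfers ownership of the dataset to the caller.
url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
    f"/datasets/{DATASET_ID}/Default.TakeOver"
)
response = requests.post(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()  # HTTP 200 means the calling user is now the owner
print("Dataset ownership taken over; automatic aggregations can now be configured.")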
With DirectQuery, whenever a user opens or interacts with a report, DAX queries are sent to the query engine, which then forwards them as SQL queries to the backend data source. The data source must then process these queries and return the results. Compared to import mode models that store data in memory, this round-trip process in DirectQuery can be both time-consuming and resource-intensive, often leading to slower report performance.
Enabling automatic aggregations in a DirectQuery model can significantly improve performance by reducing the need for constant data source queries. Instead of relying solely on the backend, automatic aggregations use an in-memory cache that stores pre-aggregated results. This cache holds only a small portion of the data from fact and detail tables but still manages to deliver faster query responses and reduce the load on the data source. For queries that aren’t covered by the cache, the system automatically falls back to querying the backend, similar to traditional DirectQuery behavior.
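Conceptually, the engine answers a query from the aggregation cache when the query's shape is covered and otherwise falls back to the data source. The Python sketch below is purely illustrative: the agg_cache dictionary, the answer_query helper, and the query_source fallback are hypothetical stand-ins for the engine's internal behavior, not a real API.

from typing import Any

# Hypothetical in-memory aggregation cache: maps a query "shape"
# (group-by columns plus measure) to pre-aggregated rows loaded at refresh time.
agg_cache: dict[tuple, list[dict[str, Any]]] = {
    (("Date[Year]", "Product[Category]"), "SUM(Sales[Amount])"): [
        {"Year": 2024, "Category": "Bikes", "SalesAmount": 1_250_000.0},
        # ...more pre-aggregated rows
    ],
}

def query_source(group_by: tuple, measure: str) -> list[dict[str, Any]]:
    # Stand-in for the DirectQuery fallback: in reality a SQL query is sent
    # to the backend data source and its results are returned.
    return []

def answer_query(group_by: tuple, measure: str) -> list[dict[str, Any]]:
    key = (group_by, measure)
    if key in agg_cache:
        return agg_cache[key]               # fast path: served from in-memory aggregations
    return query_source(group_by, measure)  # cache miss: DirectQuery round trip

# Covered by the cache -> answered in memory.
print(answer_query(("Date[Year]", "Product[Category]"), "SUM(Sales[Amount])"))
# Not covered -> falls back to the data source.
print(answer_query(("Customer[Country]",), "SUM(Sales[Amount])"))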
Training operations
During the first scheduled refresh of the day or week (based on your chosen frequency), Power BI runs a training operation to analyze query patterns and adjust the in-memory aggregation cache accordingly. This process ensures the cache is optimized as query patterns change.
In this step, Power BI may create, update, or remove aggregation tables, and it also queries the data source to decide what data should be included in the cache. However, the actual aggregation data isn’t loaded during training — it gets loaded in the next refresh.
For example, if your refresh schedule is set for 4:00 AM, 9:00 AM, 2:00 PM, and 7:00 PM, only the 4:00 AM refresh will include both training and data refresh. The remaining refreshes for the day will only update the cache without retraining.
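The schedule itself can be set on the model's settings page or scripted against the Power BI REST API's refresh-schedule endpoint. Below is a minimal sketch matching the example times above, with placeholder token and IDs; which refresh also performs training is decided by the service, not by this call.

import requests

ACCESS_TOKEN = "<azure-ad-access-token>"   # placeholder
WORKSPACE_ID = "<workspace-guid>"          # placeholder
DATASET_ID = "<dataset-guid>"              # placeholder

# Datasets - Update Refresh Schedule In Group: four daily refreshes as in the example.
url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
    f"/datasets/{DATASET_ID}/refreshSchedule"
)
schedule = {
    "value": {
        "enabled": True,
        "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
        "times": ["04:00", "09:00", "14:00", "19:00"],
        "localTimeZoneId": "UTC",
    }
}
response = requests.patch(url, json=schedule,
                          headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()
print("Refresh schedule updated; the first refresh of the cycle will include training.")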
During training, Power BI analyzes past queries from the query log to predict and prepare for future queries. While this method is generally accurate, it doesn’t guarantee that all future queries will be covered by the in-memory cache, especially if they differ from past patterns. Queries not covered by the cache are sent directly to the data source using DirectQuery. However, if these new queries are frequent or important, they may be added to the cache during the next training operation.
The training process has a 60-minute time limit. If it can't finish analyzing the entire query log within this time, a notification appears in the model's Refresh history, and training resumes in the next scheduled session. Once the full query log is processed, the new aggregation data replaces the previous aggregations in the cache.
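To verify that refreshes (and the training they trigger) are completing, you can pull the model's refresh history programmatically as well as in the service UI. Below is a minimal sketch with placeholder IDs; note that training-specific notifications may only surface in the Refresh history pane of the Power BI service.

import requests

ACCESS_TOKEN = "<azure-ad-access-token>"   # placeholder
WORKSPACE_ID = "<workspace-guid>"          # placeholder
DATASET_ID = "<dataset-guid>"              # placeholder

# Get Refresh History In Group: the last ten refresh operations for the model.
url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
    f"/datasets/{DATASET_ID}/refreshes?$top=10"
)
response = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()

for refresh in response.json()["value"]:
    # status is typically "Completed", "Failed", or "Unknown" (still in progress).
    print(refresh["refreshType"], refresh["status"],
          refresh.get("startTime"), refresh.get("endTime"))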
Caution
Training and refresh operations can be demanding on both the Power BI service and data source systems. Increasing the percentage of queries that use aggregations requires more data to be queried and calculated during these processes. This can lead to higher resource consumption and may increase the risk of timeouts.
Microsoft provides more detailed information and a step-by-step guide in a whitepaper. Please refer to the Power BI technical whitepapers and documentation for a deeper dive.
#Microsoft #PowerBI #MicrosoftLearn #GenerativeAI #Snowflake #AIApplications #LLMs (Large Language Models) #MachineLearning #ArtificialIntelligence #DataAnalytics #TechInnovation #DataScience #TechTrends #CloudComputing #EnterpriseAI #OpenAI #Copilot #GoogleAI #Google #NVIDIA