Optimising The Performance Of Power Query Merges In Power BI Table.Join And Other Join Algorithms

Dmitry Imanakov

Data Analyst | Data Scientist

Published: 18 January 2024

As a reminder, the seven join algorithms that can be used with Table.Join are:

JoinAlgorithm.Dynamic
JoinAlgorithm.LeftHash
JoinAlgorithm.LeftIndex
JoinAlgorithm.PairwiseHash
JoinAlgorithm.RightHash
JoinAlgorithm.RightIndex
JoinAlgorithm.SortMerge

The first thing to say is that if you don’t specify a join algorithm in the sixth parameter of Table.Join (it’s an optional parameter), Power Query will try to decide which algorithm to use based on some undocumented heuristics. The same thing also happens if you use JoinAlgorithm.Dynamic in the sixth parameter of Table.Join, or if you use the Table.NestedJoin function instead, which doesn’t allow you to explicitly specify an algorithm.
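To make those three cases concrete, here is a minimal sketch that uses the First and Second queries defined later in this post. The step names and the nested column name "SecondRows" are my own illustrative choices, not something taken from the original tests.

```powerquery
let
    // Sixth parameter omitted: Power Query picks the algorithm itself
    ImplicitDynamic = Table.Join(First, {"A1"}, Second, {"A2"}, JoinKind.Inner),
    // Sixth parameter explicitly set to JoinAlgorithm.Dynamic
    ExplicitDynamic = Table.Join(First, {"A1"}, Second, {"A2"}, JoinKind.Inner, JoinAlgorithm.Dynamic),
    // Table.NestedJoin has no join algorithm parameter at all
    Nested = Table.NestedJoin(First, {"A1"}, Second, {"A2"}, "SecondRows", JoinKind.Inner),
    // Expand the nested table column (name assumed above) to get a flat result
    Expanded = Table.ExpandTableColumn(Nested, "SecondRows", {"A2"})
in
    Expanded
```

Because M evaluates steps lazily, only the steps needed for the final expression are run; change the expression after "in" to ImplicitDynamic or ExplicitDynamic to try the other two forms.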

There are going to be some cases where you can get better performance by explicitly specifying a join algorithm instead of relying on JoinAlgorithm.Dynamic, but you’ll have to do some thorough testing to prove it. From what I’ve seen there are lots of cases where explicitly setting the algorithm will result in worse performance, although there are enough cases where doing so results in better performance to make all that testing worthwhile.

For example, using the same CSV file that I’ve been using in my previous posts, I created two source queries called First and Second that only returned column A and the first 300,000 rows. Here’s the M code for First (the code for Second only differs in that it renames the only column to A2):

let
    Source = Csv.Document(
        File.Contents("C:\Users\chwebb\Documents\NumbersMoreColumns.csv"),
        [Delimiter = ",", Columns = 7, Encoding = 65001, QuoteStyle = QuoteStyle.None]
    ),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    #"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers", {"A"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns", {{"A", "A1"}}),
    #"Kept First Rows" = Table.FirstN(#"Renamed Columns", 300000)
in
    #"Kept First Rows"
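For completeness, Second would presumably look like the following. This is reconstructed from the description above (same file, same steps, column renamed to A2) rather than copied from the original workbook.

```powerquery
let
    Source = Csv.Document(
        File.Contents("C:\Users\chwebb\Documents\NumbersMoreColumns.csv"),
        [Delimiter = ",", Columns = 7, Encoding = 65001, QuoteStyle = QuoteStyle.None]
    ),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    #"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers", {"A"}),
    // The only assumed difference from First: the column is renamed to A2
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns", {{"A", "A2"}}),
    #"Kept First Rows" = Table.FirstN(#"Renamed Columns", 300000)
in
    #"Kept First Rows"
```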

Here’s a query that uses Table.Join and JoinAlgorithm.Dynamic to merge these two queries:

let
    Source = Table.Join(First, {"A1"}, Second, {"A2"}, JoinKind.Inner, JoinAlgorithm.Dynamic)
in
    Source

The average timings for this query on my PC were:

Progress Report End/25 Execute SQL – 2.0 seconds
Progress Report End/17 Read Data – 0.4 seconds

Changing this query to use JoinAlgorithm.LeftHash instead, like so:

let
    Source = Table.Join(First, {"A1"}, Second, {"A2"}, JoinKind.Inner, JoinAlgorithm.LeftHash)
in
    Source

…resulted in the following average timings:

Progress Report End/25 Execute SQL – 0.9 seconds
Progress Report End/17 Read Data – 0.6 seconds

That’s an improvement of almost one second overall (2.4 seconds in total down to 1.5), even though the Read Data phase was slightly slower. I’ve not included here all the other test results for algorithms that performed worse (I had to cancel the query that used JoinAlgorithm.LeftIndex because it was so slow). And just to be clear: I’m not saying that using JoinAlgorithm.LeftHash is always better than JoinAlgorithm.Dynamic, just that it happened to perform better in this case with these queries and this data. With different data and different queries, different algorithms may perform better.
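If you want to repeat this kind of testing on your own data, one way to make swapping algorithms less error-prone is to wrap the join in a small function. This is only a sketch: JoinWithAlgorithm is a hypothetical helper name of my own, and it assumes the First and Second queries shown earlier.

```powerquery
let
    // Hypothetical helper: runs the same inner join with whichever algorithm is passed in
    JoinWithAlgorithm = (algorithm as number) as table =>
        Table.Join(First, {"A1"}, Second, {"A2"}, JoinKind.Inner, algorithm),
    // Change the argument and refresh to time a different algorithm
    Source = JoinWithAlgorithm(JoinAlgorithm.RightHash)
in
    Source
```

Each run still has to be timed separately, and any timings you get will only apply to your own data and queries.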
