Power BI Premium Max Parallelism through Phil's Report
Power BI Refresh optimization can be tricky to visualize, unless we have access to Phil Seamark's "Visualize your power bi refresh" report.
I was investigating overload of a P3 capacity, after narrowing down to one dataset, we captured a trace as mentioned in Phil's blog. I added some additional events such as "Command Begin" and "Command End" along with Notification Events and others.
Phil's blog shows how to save trace as XML file, i traditionally use SQL Server so i used "Save As" --> "Trace Table".
My first challenge was i forgot to capture "Job Graph Events" and realized that i could still use Object Name by Event Subclass chart which shows Execute SQL vs Process timeline. I modified the "Progress Report End" Power Query as follows.
let
? ? Source = Sql.Database("winx", "TestDB", [Query="select ActivityID,EventSubclass,Duration,RequestID,StartTime,TextData,UserObjectID,SessionID,ObjectName,CPUTime,IntegerData,EndTime
from dbo.BaseLine
where EventClass=6 and EventSubclass in (1,17,25,59)"]),
? ? #"Changed Type1" = Table.TransformColumnTypes(Source,{{"StartTime", type datetimezone}, {"EndTime", type datetimezone}, {"Duration", Int64.Type}, {"CPUTime", Int64.Type}}),
? ? #"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Duration", "Duration (ms)"}})
in
? ? #"Renamed Columns"
Since, i didn't had Job Graph Events i removed that part of the logic. Then, we need a custom visual("as timeline", a visual by pragmaticworks) . After making some tweaks, i got the report working, i added some filters to remove object where the durations were less than a second.
How do we interpret the visual highlighted, notice that there are 5 blocks of process grouped at the very top, followed by a stepped layout which shows another blocks starting. If we correlate this event with "Command Begin" shown below, notice the Max Parallelism clause. A maximum of 6 object would be processing parallelly, in above chart we can 5 (one is removed as the duration is less than a second, but i think you got it)
Chris Webb blogged about "Max Parallelism" and have explained in depth.
On a P3 this number can be up to 40 as per the following article, or at least that is how i interpret it. Look at "Model refresh parallelism" in the table with different capacity SKUs.
So, we performed another test using Max Parallelism of 30, here is how the "Command Begin" Event looks like.
领英推荐
Luckily, this time we capture "Job Graph Event", but when i try to save the events as "Trace XML File", and then use that in the report i get following error.
As part of the transformation logic there is a parsed XML step which is failing with error "There are multiple root elements". When looking at the XML output the error kind of make sense, there are two root elements with DirectedGraph. I dumped the data into SQL and used the following query to investigate.
select? RequestID,SessionID, convert(xml,STRING_AGG(cast(TextData as nvarchar(MAX)),''))
from dbo.KLAIncremental?
where EventClass=134 and EventSubclass=2
group by RequestID,SessionID
Since the dataset in question had incremental refresh enabled, an additional command begin event were generated to Analyze Refresh Policy Impact, which includes a Job Graph. I then had to rewrite my SQL as follows to get only the last DirectedGraph. (My T-SQL may not be optimal and in-efficient, this is with my ancient T-SQL knowledge).
select RequestID,SessionID, STRING_AGG(cast(TextData as nvarchar(MAX)),'') TextData
from (
select RequestID,SessionID,?
TextData?
from dbo.Incremental?
where EventClass=134 and EventSubclass=2 and RowNumber >=?
(select max(RowNumber) MaxRowNumber
from ( select RequestID,SessionID,?
TextData, CHARINDEX('<DirectedGraph',TextData) DirectGraph,RowNumber
from dbo.Incremental?
where EventClass=134 and EventSubclass=2
order by RowNumber?
offset 1 row
) T
where DirectGraph=1)
order by RowNumber
offset 0 row
)q?
group by RequestID,SessionID
I then modified my jobgraph power query as follows.
let
? ? Source = Sql.Database("winx", "TestDB", [Query="select RequestID,SessionID, STRING_AGG(cast(TextData as nvarchar(MAX)),'') TextData#(lf)from (#(lf)select RequestID,SessionID, #(lf)TextData #(lf)from dbo.Incremental #(lf)where EventClass=134 and EventSubclass=2 and RowNumber >= #(lf)(select max(RowNumber) MaxRowNumber#(lf)from ( select RequestID,SessionID, #(lf)#(tab)#(tab)TextData, CHARINDEX('<DirectedGraph',TextData) DirectGraph,RowNumber#(lf)#(tab)#(tab)from dbo.Incremental #(lf)#(tab)#(tab)where EventClass=134 and EventSubclass=2#(lf)#(tab)#(tab)order by RowNumber #(lf)#(tab)#(tab)offset 1 row#(lf)#(tab)) T#(lf)where DirectGraph=1)#(lf)order by RowNumber#(lf)offset 0 row#(lf))q #(lf)group by RequestID,SessionID"]),
? ? #"Parsed XML" = Table.TransformColumns(Source,{{"TextData", Xml.Tables}}),
? ? #"Expanded TextData" = Table.ExpandTableColumn(#"Parsed XML", "TextData", {"Nodes"}, {"Nodes"}),
? ? #"Expanded Nodes" = Table.ExpandTableColumn(#"Expanded TextData", "Nodes", {"Node"}, {"Node"}),
? ? #"Expanded Node" = Table.ExpandTableColumn(#"Expanded Nodes", "Node", {"Attribute:Label", "Attribute:Slot"}, {"Attribute:Label", "Attribute:Slot"}),
? ? #"Filtered Rows" = Table.SelectRows(#"Expanded Node", each Text.Contains([#"Attribute:Label"], "Process Partition")),
? ? #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Attribute:Label", Splitter.SplitTextByEachDelimiter({"["}, QuoteStyle.Csv, true), {"Attribute:Label.1", "Attribute:Label.2"}),
? ? #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute:Label.1", type text}, {"Attribute:Label.2", type text}}),
? ? #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "Attribute:Label.2", Splitter.SplitTextByEachDelimiter({"]"}, QuoteStyle.Csv, true), {"Attribute:Label.2.1", "Attribute:Label.2.2"}),
? ? #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Attribute:Label.2.1", type text}, {"Attribute:Label.2.2", type text}}),
? ? #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Attribute:Label.1", "Attribute:Label.2.2"}),
? ? #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Attribute:Label.2.1", "ObjectName"}}),
? ? #"Added Prefix" = Table.TransformColumns(#"Renamed Columns", {{"Attribute:Slot", each "00" & _, type text}}),
? ? #"Cleaned Text" = Table.TransformColumns(#"Added Prefix",{{"Attribute:Slot", Text.Clean, type text}}),
? ? #"Extracted Last Characters" = Table.TransformColumns(#"Cleaned Text", {{"Attribute:Slot", each Text.End(_, 2), type text}})
in
? ? #"Extracted Last Characters"
Then i modified "Progress report End" power query as follows.
let
? ? Source = Sql.Database("winx", "TestDB", [Query="select ActivityID,EventSubClass,Duration,RequestID,StartTime,TextData,UserObjectID,SessionID,ObjectName,CPUTime,IntegerData,EndTime#(lf)from dbo.Incremental#(lf)where EventClass=6 and EventSubclass in (1,17,25,59)#(lf)"]),
? ? #"Changed Type1" = Table.TransformColumnTypes(Source,{{"StartTime", type datetimezone}, {"EndTime", type datetimezone}, {"Duration", Int64.Type}, {"CPUTime", Int64.Type}}),
? ? #"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Duration", "Duration (ms)"}}),
? ? #"Merged Queries" = Table.NestedJoin(#"Renamed Columns", {"SessionID", "ObjectName","RequestID"}, JobGraph, {"SessionID", "ObjectName","RequestID"}, "JobGraph", JoinKind.LeftOuter),
? ? #"Expanded JobGraph" = Table.ExpandTableColumn(#"Merged Queries", "JobGraph", {"Attribute:Slot"}, {"Attribute:Slot"})
in
? ? #"Expanded JobGraph"
Now, i get report where i can see slots and Event Sub class (Execute SQL and Process) timeline view.
Bottom line is you could use Phil's report to visualize max parallelism and take advantage of premium capacity to increase it. Phil even summarizes what we can do in case we see SQL or Process taking time, ensure to read both the references they are gold.
Founder | IT Solutions | Microsoft & Cloud Expert | 20+ Years of Driving Business Transformation | Helping Businesses Work Smarter with Technology
1 年that is really good one ?? Raju Kumar