登录查看更多内容

ASHviz: Can you box that, please?

John Beresniewicz

Architect at Oracle

发布日期: 2019年4月27日

+ 关注

This installment explores the distribution of sampled event latencies from the ASH dump using `geom_boxplot( )`.

ASH fix-up and TIME_WAITED

When ASH samples a session that is waiting, the value of TIME_WAITED is set at 0. When the wait event completes, the exact time spent waiting is recorded and the ASH mechanism updates TIME_WAITED in the last of its samples for this event with this value. We refer to this "update after sampling" feature as the "ASH fix-up." A number of important values may be fixed-up after sampling by this mechanism, but TIME_WAITED is of especial interest and figures in many of the ASHviz investigations.

Events data frame

Given that only fixed-up samples in ASH will have a nonzero value for TIME_WAITED, we can create a data frame of all events sampled thus:

Events <- ashDF %>% filter( TIME_WAITED > 0 )

Box-whisker plots

The box-whisker plot produced by `geom_boxplot( )` in the ggpplot2 package is a high-level visual sketch of the distribution of a given set of data values. The "box" is drawn with upper-lower bounds at the 75th and 25th percentiles, meaning that half the given data values are within this box. The "whiskers" extend 1.5 * the inter-quartile range (range between 25th and 75th percentiles) above and below the box bounds, and "outlier" points beyond these limits are explicitly drawn. Finally, within the box the median value is indicated by a thick horizontal line. So this visual mechanism gives rapid coarse-grained insight into the compactness or spread of the data, range of values, existence of outliers, and skewness of distribution.

We can draw a box-whisker plot over the TIME_WAITED values in the Events data frame as follows:

ggplot(data = Events, aes(x = "ALL WAITS" , y = log10(TIME_WAITED))) + 
geom_boxplot()

Which yields the following plot:

We actually plot over log10(TIME_WAITED) to characterize the distribution on a log scale. Under this scheme the value 3 corresponds to 10**3 microseconds or 1 millisecond, and the value 6 corresponds to 1 second.

Even though our code is about as simple as it gets, the plot is informative about the overall collection of events recorded in our ASH dump. The box hows half the event latencies lie between 10 milliseconds and 200 milliseconds or so. The median splits the box pretty evenly so there is not much skew in the bulk of the values, however there appear to be a strong cadre of outliers on the low-latency side, with more than a few values below 100 microseconds. On the high side notice events above 1 second in duration, important in that individual wait events may have been sampled multiple times.

Wait class differences

Wait events cover a large multitude of serialization and delay mechanisms, supporting many purposes and contexts in an operational Oracle database. Understanding event latencies is an important part of understanding system performance altogether, as well as solving specific issues. So lumping all events together and characterizing the group distribution may not really be that useful in a practical sense.

One natural breakdown of all events is by wait class. We can do this in geom_boxplot by setting the x aesthetic to WAITCLASS. Also, I added red reference lines at 1 millisecond and 1 second:

The result is one boxplot per wait class summarizing all event latencies from that class. There is definitely additional information to be gleaned from this more detailed view:

the bulk of event latencies in all classes are between 1 millisecond and 1 second
"Idle" events were sampled even though this is not supposed to occur
"Network" events have much larger variation than any other class
"Commit" events show tight distribution around 10 milliseconds but with a significant cluster of outliers well above 100 milliseconds

However in truth the view above is not really very satisfying. For one thing, there is no indication of how many events are summarized by each boxplot, and since all plots are given equal visual real estate this may lead to the classic "thinking too hard about too little" issue.

Another problem stems from the fact that there may be many or few event types in any given wait class and their latency characteristics may be very different. So a class-level plot may summarize many event types together, or only one or two, depending. Confusing.

It seems that when looking at event latencies, either looking at them all together or isolating at the event level are the two best options.

Unfortunately, when I changed the x aesthetic to use EVENT_ID instead of WAITCLASS, the unreadable mess in the header of this article was the result. I even went further in the notebook, also coloring the plots by EVENT_ID. Disaster.

Boxes of boxes

We can combine the grouping by wait class with the summarization by event using faceting. The plot below still uses EVENT_ID as the x aesthetic but facets by WAITCLASS, producing a separate collection of individual event boxplots for each class.

Here we see very different numbers of event types have been sampled per class, from a single type in both the "Commit" and "Idle" classes to numerous types in the "Cluster" and "Other" classes. Observe that the broad range of values in the "Network" class is really the union of several smaller ranges. Similarly, the "Application" class combines 3 event types with very different latency characteristics.

This is probably about as much mileage as we will get from this investigation. The boxplot has provided some good high-level insights into sampled event latencies by type and wait class. It's a good start anyway.

references:

ASH Architecture and Advanced Usage last and best presentation on ASH from my days at Oracle

notebook:

github/jberesni/ASHviz/Jupyter/eventEst.ipynb

要查看或添加评论，请登录

John Beresniewicz的更多文章

Estimating OLTP Execution Latencies Using ASH

2021年2月16日

Estimating OLTP Execution Latencies Using ASH

I want to share something super-useful about Active Session History that I came to understand only last week. Examining…

17 条评论
ASHviz: Dark matter 2

2019年5月16日

ASHviz: Dark matter 2

This article extends the discussion of "dark matter" in ASH by exploring a completely new source of data about event…
ASHviz: Fiddling with violins

2019年5月6日

ASHviz: Fiddling with violins

The last ASHviz installment, Densities and dark matter, was a bit of a cognitive burden, but the concepts introduced…
ASHviz: Densities and dark matter

2019年5月1日

ASHviz: Densities and dark matter

This installment gets into some deeper concepts relative to visualizing event latency distributions as well as using…
ASHviz: Issue at the x-axis

2019年4月25日

ASHviz: Issue at the x-axis

Take another look at the plot in header above. This plot aggregates ASH data by STATE_CLASS using SAMPE_TIME as the…
ASHviz: Accidentally good

2019年4月23日

ASHviz: Accidentally good

This is a short blurb about being sensitive to whether a visualization that works well in a specific case will…
ASHviz: Visualizing ASH dumps with Jupyter Notebooks

2019年4月22日

ASHviz: Visualizing ASH dumps with Jupyter Notebooks

This article begins what I hope will be an interesting series focusing on some data visualization research I have been…
Visualizing Performance Benchmarks (4) - Validate, analyze, conclude

2018年11月16日

Visualizing Performance Benchmarks (4) - Validate, analyze, conclude

In this final episode, we VALIDATE our suspicions about the file-based configurations bottlenecking on read I/O…

12 条评论
Simple SQL Injection Vulnerability Testing

2018年11月13日

Simple SQL Injection Vulnerability Testing

According to The Open Web Application Security Project (OWASP), injection remains the number one category of security…

3 条评论
Visualizing Performance Benchmarks (3) - Start Small and Predict

2018年11月12日

Visualizing Performance Benchmarks (3) - Start Small and Predict

So far in this series we've seen some nice visualizations of elapsed time data for loading a large number of 5GB files…

See all articles

ASHviz: Can you box that, please?

John Beresniewicz

Architect at Oracle

ASH fix-up and TIME_WAITED

Events data frame

Box-whisker plots

Wait class differences

Boxes of boxes

references:

notebook:

John Beresniewicz的更多文章

社区洞察

其他会员也浏览了

How I TC: It’s a marathon, not a sprint, to our biggest data event of the year

Data Cleaning: Transforming Raw Data into Gold

"Application of Interquartile Range"

Why You Should Know About Cycle Plots

?? QQ Plots Explained: A Beginner's Guide to Comparing Distributions

"What are Histograms ?"

Related Function in Dax

Context Is King

NYC Fire Data Visualization 2013-2018

March Data Analytics Challenge: Top 5 Finalist

ASH fix-up and TIME_WAITED

Events data frame

Box-whisker plots

Wait class differences

Boxes of boxes

references:

notebook:

John Beresniewicz的更多文章

Estimating OLTP Execution Latencies Using ASH

ASHviz: Dark matter 2

ASHviz: Fiddling with violins

ASHviz: Densities and dark matter

ASHviz: Issue at the x-axis

ASHviz: Accidentally good

ASHviz: Visualizing ASH dumps with Jupyter Notebooks

Visualizing Performance Benchmarks (4) - Validate, analyze, conclude

Simple SQL Injection Vulnerability Testing

Visualizing Performance Benchmarks (3) - Start Small and Predict

社区洞察

其他会员也浏览了

How I TC: It’s a marathon, not a sprint, to our biggest data event of the year

Data Cleaning: Transforming Raw Data into Gold

"Application of Interquartile Range"

Why You Should Know About Cycle Plots

?? QQ Plots Explained: A Beginner's Guide to Comparing Distributions

"What are Histograms ?"

Related Function in Dax

Context Is King

NYC Fire Data Visualization 2013-2018

March Data Analytics Challenge: Top 5 Finalist