ASHviz: Fiddling with violins

The last ASHviz installment, Densities and dark matter, was a bit of a cognitive burden, but the concepts introduced are fundamental to many of the ASHviz investigations. Here we continue the thread with a new visualization of the sampled and estimated latency density functions.

Violin plots: geom_violin

The ggplot2 package includes a "violin" geom that produces a variation on the probability density plot that is both pleasing to the eye and convenient for certain visual comparisons. A so-called "violin plot" is simply the probability density curve reflected about the x-axis and then rotated 90 degrees (a coordinate flip), which makes it (sometimes) look like a violin or cello because of the smoothly curved outlines.

Here is code to create a violin plot of sampled latencies from the Events data frame:

p <- ggplot(data=Events, aes(y=log10(TIME_WAITED)))

p + geom_violin(aes(x="SAMPLE"))

With the resulting plot:

[Figure: violin plot of sampled event latencies, log10(TIME_WAITED)]

Recall the standard density plot and observe how the violin shape is simply a variation on the density using the reflection and coordinate rotation.

[Figure: standard density plot of sampled event latencies]

The violin plot is also solid-filled with white by default, so it has a much more tangible visual impact. The standard line plot is much better for reading off the latencies of features such as peaks, mostly because its latency axis is oriented horizontally. We will soon see the advantages of violin plots for comparisons.
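For readers without the ASH dump at hand, the relationship between the two plots can be tried out on made-up data. The following is a minimal sketch, assuming ggplot2 is installed; the Events data frame here is simulated from a lognormal mixture chosen only for illustration and is not the ASH data behind the figures in this article.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the Events data frame: simulated latencies in
# microseconds from a two-component lognormal mix (NOT real ASH data).
Events <- data.frame(
  TIME_WAITED = c(rlnorm(500, meanlog = log(200), sdlog = 1),
                  rlnorm(500, meanlog = log(5000), sdlog = 0.8))
)

# Violin: the density mirrored into a closed shape, latency on the y-axis.
violin_plot <- ggplot(Events, aes(y = log10(TIME_WAITED))) +
  geom_violin(aes(x = "SAMPLE"))

# Standard density curve of the same values, latency on the x-axis.
density_plot <- ggplot(Events, aes(x = log10(TIME_WAITED))) +
  geom_density()

print(violin_plot)
print(density_plot)
```

Both plots are built from the same column, so the two humps in the density curve should show up as two bulges in the violin.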

Estimate-weighted violin plots

Just as with the standard density, we can plot violin densities over the estimated count of events to get an unbiased view of the distribution of event latencies from those sampled by the ASH dump.

Plotting weighted densities using geom_violin frequently results in the following warning message:

Warning message in density.default(x, weights = w, bw = bw, adjust = adjust, kernel = kernel, :
“sum(weights) != 1  -- will not get true density”

Just as with the standard density plot, the solution is to set weight = EST_COUNT / sum(EST_COUNT) as follows:

p +  geom_violin(aes(x ="EST_COUNT", weight=EST_COUNT/sum(EST_COUNT)))

This produces the following plot without warning messages:

[Figure: violin plot of event latencies weighted by EST_COUNT]

Note that incorrect weighting (with warning messages) does appear to produce the correct plot, so the function seems to do simple scaling automatically. These warnings might be ignored when doing rapid prototyping of visualizations. However, before drawing conclusions or making important observations about a visualization, the program should be made to execute warning-free to ensure plot accuracy.
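One plausible explanation for the unnormalized weights still giving the right picture is that the kernel density estimate is linear in the weights, so raw weights only stretch the curve by a constant factor, and geom_violin rescales each violin's width anyway (its default is scale = "area"). The sketch below checks the first half of that claim using base R's density() directly. The vectors x and w are simulated stand-ins for log10(TIME_WAITED) and EST_COUNT, and the bandwidth is fixed up front so that both calls smooth identically.

```r
set.seed(1)
# Hypothetical stand-ins: x plays log10(TIME_WAITED), w plays EST_COUNT.
x <- rnorm(200, mean = 3, sd = 1)
w <- runif(200, min = 1, max = 50)

# Fix the bandwidth so both density() calls use the same smoothing.
bw <- bw.nrd0(x)

# Raw weights: this call raises the "sum(weights) != 1" warning,
# which is suppressed here on purpose.
d_raw <- suppressWarnings(density(x, weights = w, bw = bw))

# Normalized weights: no warning, and the curve is a true density.
d_norm <- density(x, weights = w / sum(w), bw = bw)

# The raw curve divided by sum(w) should match the normalized curve.
same_shape <- isTRUE(all.equal(d_raw$y / sum(w), d_norm$y))
print(same_shape)
```

If same_shape comes out TRUE, the two curves differ only by the constant sum(w), which would explain why the plot looks right even when the warning fires. Whether the warning appears at all may also depend on the ggplot2 version in use.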

Note again how much of the weighted density plot lies below 100 microseconds. It looks like about half of the area, which means event latencies are just as likely to be below as above 100 microseconds.

Recall the visual difficulties and distinctions that were made plotting the estimated and sampled density charts together. Crossing lines were distracting, and we ended up doing one plot as filled area and one as a line. Comparison of the two densities was a bit of an issue.

Side-by-side with medians

To assist in comparing the two probability densities, we can plot the two violins side-by-side. This really improves the ability to visually compare and contrast features of the densities at different latencies.

[Figure: sampled and EST_COUNT-weighted violins side-by-side, with median lines]

Horizontal lines have also been added at the density medians using the draw_quantiles parameter of geom_violin. Now we see that the actual median of the estimate-weighted density is closer to 1 millisecond than the earlier 100 microsecond guesstimate. We can also roughly quantify the sampler bias in the sense that the median values of the two densities differ by close to 2 orders of magnitude, in other words quite a bit.
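The code for the side-by-side plot is not shown above, so here is one way it might be written, together with a numeric cross-check of the two medians. This is a sketch under stated assumptions: the Events data is simulated, EST_COUNT is an invented weight that grows as latency shrinks, and weighted_median is a hypothetical helper, not a function from ggplot2 or base R.

```r
library(ggplot2)

set.seed(7)
# Simulated stand-in for Events (NOT real ASH data). EST_COUNT is an assumed
# weight that grows as latency shrinks, imitating a sampler-bias correction.
Events <- data.frame(TIME_WAITED = rlnorm(1000, meanlog = log(2000), sdlog = 2))
Events$EST_COUNT <- pmax(1, 1e6 / Events$TIME_WAITED)

# Two violins on one panel; draw_quantiles = 0.5 asks for a line at each median.
side_by_side <- ggplot(Events, aes(y = log10(TIME_WAITED))) +
  geom_violin(aes(x = "SAMPLE"), draw_quantiles = 0.5) +
  geom_violin(aes(x = "EST_COUNT", weight = EST_COUNT / sum(EST_COUNT)),
              draw_quantiles = 0.5)

# Hypothetical helper: smallest value whose cumulative weight share reaches 50%.
weighted_median <- function(x, w) {
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= 0.5)[1]]
}

sample_med   <- median(log10(Events$TIME_WAITED))
weighted_med <- weighted_median(log10(Events$TIME_WAITED), Events$EST_COUNT)
```

Calling print(side_by_side) renders the figure. The lines drawn on the violins come from the smoothed density, so they may sit slightly off the values computed directly from the data. Newer ggplot2 releases may also report draw_quantiles as deprecated.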

Comparing instances

A natural question when looking at system-level data in a RAC environment is "how do instances compare to each other?" Violin plots of the density of sampled latencies split up by instance look like this:

[Figure: violin plots of sampled event latencies, one per instance]

To my eye these look extremely similar, meaning that the statistical properties of the sampled event latencies are almost identical across instances. This would seem to indicate a strong similarity in workload processing with no instance-level aberrations like CPU saturation. The smoothed and symmetric shape of the violins makes comparison for likeness quite direct: they all line up at all the peaks and valleys, and if one didn't it would surely be noticeable.

Count-estimate weighted densities

Plots of the instance latency densities weighted by estimated counts yield similar results:

[Figure: violin plots of event latencies weighted by EST_COUNT, one per instance]

Here again we see a very high level of visual agreement in the number, location, and prominence of the violin features. Comparing four densities against each other for similarity feels not much different than comparing two, so there may be good scalability properties for that use case. The fact that the agreement is strong even after the weighting transformation seems to indicate strong consistency across instances, especially in the lower latencies. At the very lowest latencies some subtle differences can be observed that were of course invisible in the unweighted plot.

Note that we had to compute instance subtotals of EST_COUNT in order to weight the values properly and plot without the warning messages.
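The subtotal step might look something like the following. It is a minimal sketch on simulated data: INSTANCE_NUMBER is an assumed name for the column identifying the RAC instance, and INST_TOTAL and W are hypothetical helper columns introduced only for this example.

```r
library(ggplot2)

set.seed(11)
# Simulated stand-in for Events (NOT real ASH data). INSTANCE_NUMBER is an
# assumed column name for the RAC instance.
Events <- data.frame(
  INSTANCE_NUMBER = factor(sample(1:4, 2000, replace = TRUE)),
  TIME_WAITED     = rlnorm(2000, meanlog = log(2000), sdlog = 2)
)
Events$EST_COUNT <- pmax(1, 1e6 / Events$TIME_WAITED)

# Subtotal of EST_COUNT within each instance, repeated on every row.
Events$INST_TOTAL <- ave(Events$EST_COUNT, Events$INSTANCE_NUMBER, FUN = sum)

# Normalized weight: sums to 1 within each instance.
Events$W <- Events$EST_COUNT / Events$INST_TOTAL

# One weighted violin per instance.
by_instance <- ggplot(Events, aes(x = INSTANCE_NUMBER, y = log10(TIME_WAITED))) +
  geom_violin(aes(weight = W))

# Check: the weights should total 1 for every instance.
weight_totals <- tapply(Events$W, Events$INSTANCE_NUMBER, sum)
print(weight_totals)
```

Dividing by the grand total sum(EST_COUNT) instead would leave each instance's weights summing to less than 1, which is the condition the warning complains about.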

Conclusions

This investigation explored the use of violin plots to visualize probability density functions of ASH dump event latencies both as sampled and weighted using the count estimation technique. Violin plots facilitate visual comparisons for similarity by transforming 1D lines into 2D shapes that are more amenable to direct visual cognition.

notebook:

github/jberesni/ASHviz/Jupyter/eventEst.ipynb
