ASHviz: Issue at the x-axis

ASHviz: Issue at the x-axis

Take another look at the plot in header above.

This plot aggregates ASH data by STATE_CLASS using SAMPE_TIME as the x-axis aesthetic. Take a second to think about what data points are plotted: for every distinct SAMPLE_TIME aggregate (count) all the rows (sessions) in each STATE_CLASS. How many distinct x-axis values will have one or more points plotted on the y-axis? Recall the ASH samplers on the 4 RAC instances start and operate independently, and - as we will see in later articles - they time their activities quite closely. There should thus be no overlapping in SAMPLE_TIME across instances, so each data point will be a within-instance aggregate. So just in the 40 minute period between 11:40 and 12:20 that makes 9600 distinct x-axis times:

  • 60 (seconds/samples) * 40 (minutes) * 4 (instances) = 9600

The plot above suffers significant overplotting in both x and y dimensions, among other issues. It is certainly not devoid of meaning, however, and some color and point size manipulations might help better show that:

  • Instances consistently have 5 or fewer active sessions on CPU
  • User I/O ranges between 10 and 20 active sessions consistently
  • Cluster waits are normally around 8 active sessions but experienced 2 episodes when they spiked on at least one instance

This is consistent with what we know about the workload from earlier investigations, if a bit less crisply evident.

OK, now I have to confess that I began these investigations using SAMPLE_ID for the x-axis rather than SAMPLE_TIME. You see, I've mostly used single instance Oracle databases, and even in RAC usually referenced the local V$ASH. So I'm used to thinking of SAMPLE_ID as a proxy for SAMPLE_TIME and with some nice properties because it is a monotonically and sequentially increasing integer (instead of an unwieldy datetime.) So for quick and dirty analyses it works quite nicely.

Look at the plot below, which simply swaps SAMPLE_ID for SAMPLE_TIME for the x-axis from the earlier plot.

No alt text provided for this image

It's kind of a featureless blob, truly a mess. What is going on? There are still the three bands of activity for CPU, Cluster wait, and User I/O wait. But notice that on the y-axis the range of these bands is about 4x the corresponding range on the earlier plot, and there are exactly 4 instances. So SAMPLE_ID must be common across the 4 independent streams of samples and the values for 4 instances are aggregated together for each. Completely absent from this plot is any sense of one or more incidents or unusual activity, so the picture is almost useless.

In the first plot instance activity was separated on the x-axis because sample times were distinct, in the second activity was aggregated across instances because SAMPLE_ID is common. So a confusing aggregation was introduced by carelessly choosing x-axis and not being fully cognizant of the shared SAMPLE_ID.

In this plot we separate by instance using color but keep using SAMPLE_ID as the x-axis:

No alt text provided for this image

Now some of the familiar structure emerges. Notice there seem to be 4 activity peaks, and these seem to show instances 1 and 2 peaking together twice and instances 3 and 4 peaking together twice. Curious. Since our prior investigation showed the peaks happening together on all instances at the same TIME, does this mean that the same SAMPLE_ID occurs close in time for instances 1 and 2, and likewise for instances 3 and 4? Just what is going on between SAMPLE_ID and SAMPLE_TIME anyway?

To answer this I created a simple yet informative visualization: a line chart plotting SAMPLE_TIME on x-axis and SAMPLE_ID on y-axis, coloring by instance.

No alt text provided for this image

Indeed, instances 1 and 2 form a pair and likewise instances 3 and 4, and the nature of their pairing is just as we surmised: generating the same SAMPLE_ID very closely in time. However, the two groups generate the same SAMPLE_ID some 5 minutes apart from each other. This largely answers why the earlier plot was such a mess: it was aggregating together activity on x-axis that did not happen concurrently. Time-shifted aggregation, meaningless.

So what are the take aways from this excursion? Well, certainly to not use SAMPLE_ID as a proxy for time and especially in RAC contexts where SAMPLE_ID from different instances will be offset in time, perhaps significantly. Some x-axis aggregation should always be done (especially in RAC contexts) to provide a unified time line. Aggregating 60 samples into minute-by-minute views of ASH is a favorite strategy, and the notebook ASH data frame provides the MINIT column for just this purpose.

When using ggplot the first two decisions to make relative to a visualization are:

  1. what data frame to plot?
  2. what variable is the x-axis?

Another lesson here: don't rush into plotting a visualization without being fully mindful of the x-axis.

notebook:

jberesni/ASHviz/Jupyter/stdViz

要查看或添加评论,请登录

John Beresniewicz的更多文章

  • Estimating OLTP Execution Latencies Using ASH

    Estimating OLTP Execution Latencies Using ASH

    I want to share something super-useful about Active Session History that I came to understand only last week. Examining…

    17 条评论
  • ASHviz: Dark matter 2

    ASHviz: Dark matter 2

    This article extends the discussion of "dark matter" in ASH by exploring a completely new source of data about event…

  • ASHviz: Fiddling with violins

    ASHviz: Fiddling with violins

    The last ASHviz installment, Densities and dark matter, was a bit of a cognitive burden, but the concepts introduced…

  • ASHviz: Densities and dark matter

    ASHviz: Densities and dark matter

    This installment gets into some deeper concepts relative to visualizing event latency distributions as well as using…

  • ASHviz: Can you box that, please?

    ASHviz: Can you box that, please?

    This installment explores the distribution of sampled event latencies from the ASH dump using `geom_boxplot( )`. ASH…

  • ASHviz: Accidentally good

    ASHviz: Accidentally good

    This is a short blurb about being sensitive to whether a visualization that works well in a specific case will…

  • ASHviz: Visualizing ASH dumps with Jupyter Notebooks

    ASHviz: Visualizing ASH dumps with Jupyter Notebooks

    This article begins what I hope will be an interesting series focusing on some data visualization research I have been…

  • Visualizing Performance Benchmarks (4) - Validate, analyze, conclude

    Visualizing Performance Benchmarks (4) - Validate, analyze, conclude

    In this final episode, we VALIDATE our suspicions about the file-based configurations bottlenecking on read I/O…

    12 条评论
  • Simple SQL Injection Vulnerability Testing

    Simple SQL Injection Vulnerability Testing

    According to The Open Web Application Security Project (OWASP), injection remains the number one category of security…

    3 条评论
  • Visualizing Performance Benchmarks (3) - Start Small and Predict

    Visualizing Performance Benchmarks (3) - Start Small and Predict

    So far in this series we've seen some nice visualizations of elapsed time data for loading a large number of 5GB files…

社区洞察

其他会员也浏览了