登录查看更多内容

ASHviz: Issue at the x-axis

John Beresniewicz

Architect at Oracle

发布日期: 2019年4月25日

Take another look at the plot in header above.

This plot aggregates ASH data by STATE_CLASS using SAMPE_TIME as the x-axis aesthetic. Take a second to think about what data points are plotted: for every distinct SAMPLE_TIME aggregate (count) all the rows (sessions) in each STATE_CLASS. How many distinct x-axis values will have one or more points plotted on the y-axis? Recall the ASH samplers on the 4 RAC instances start and operate independently, and - as we will see in later articles - they time their activities quite closely. There should thus be no overlapping in SAMPLE_TIME across instances, so each data point will be a within-instance aggregate. So just in the 40 minute period between 11:40 and 12:20 that makes 9600 distinct x-axis times:

60 (seconds/samples) * 40 (minutes) * 4 (instances) = 9600

The plot above suffers significant overplotting in both x and y dimensions, among other issues. It is certainly not devoid of meaning, however, and some color and point size manipulations might help better show that:

Instances consistently have 5 or fewer active sessions on CPU
User I/O ranges between 10 and 20 active sessions consistently
Cluster waits are normally around 8 active sessions but experienced 2 episodes when they spiked on at least one instance

This is consistent with what we know about the workload from earlier investigations, if a bit less crisply evident.

OK, now I have to confess that I began these investigations using SAMPLE_ID for the x-axis rather than SAMPLE_TIME. You see, I've mostly used single instance Oracle databases, and even in RAC usually referenced the local V$ASH. So I'm used to thinking of SAMPLE_ID as a proxy for SAMPLE_TIME and with some nice properties because it is a monotonically and sequentially increasing integer (instead of an unwieldy datetime.) So for quick and dirty analyses it works quite nicely.

Look at the plot below, which simply swaps SAMPLE_ID for SAMPLE_TIME for the x-axis from the earlier plot.

It's kind of a featureless blob, truly a mess. What is going on? There are still the three bands of activity for CPU, Cluster wait, and User I/O wait. But notice that on the y-axis the range of these bands is about 4x the corresponding range on the earlier plot, and there are exactly 4 instances. So SAMPLE_ID must be common across the 4 independent streams of samples and the values for 4 instances are aggregated together for each. Completely absent from this plot is any sense of one or more incidents or unusual activity, so the picture is almost useless.

In the first plot instance activity was separated on the x-axis because sample times were distinct, in the second activity was aggregated across instances because SAMPLE_ID is common. So a confusing aggregation was introduced by carelessly choosing x-axis and not being fully cognizant of the shared SAMPLE_ID.

In this plot we separate by instance using color but keep using SAMPLE_ID as the x-axis:

Now some of the familiar structure emerges. Notice there seem to be 4 activity peaks, and these seem to show instances 1 and 2 peaking together twice and instances 3 and 4 peaking together twice. Curious. Since our prior investigation showed the peaks happening together on all instances at the same TIME, does this mean that the same SAMPLE_ID occurs close in time for instances 1 and 2, and likewise for instances 3 and 4? Just what is going on between SAMPLE_ID and SAMPLE_TIME anyway?

To answer this I created a simple yet informative visualization: a line chart plotting SAMPLE_TIME on x-axis and SAMPLE_ID on y-axis, coloring by instance.

Indeed, instances 1 and 2 form a pair and likewise instances 3 and 4, and the nature of their pairing is just as we surmised: generating the same SAMPLE_ID very closely in time. However, the two groups generate the same SAMPLE_ID some 5 minutes apart from each other. This largely answers why the earlier plot was such a mess: it was aggregating together activity on x-axis that did not happen concurrently. Time-shifted aggregation, meaningless.

So what are the take aways from this excursion? Well, certainly to not use SAMPLE_ID as a proxy for time and especially in RAC contexts where SAMPLE_ID from different instances will be offset in time, perhaps significantly. Some x-axis aggregation should always be done (especially in RAC contexts) to provide a unified time line. Aggregating 60 samples into minute-by-minute views of ASH is a favorite strategy, and the notebook ASH data frame provides the MINIT column for just this purpose.

When using ggplot the first two decisions to make relative to a visualization are:

what data frame to plot?
what variable is the x-axis?

Another lesson here: don't rush into plotting a visualization without being fully mindful of the x-axis.

notebook:

jberesni/ASHviz/Jupyter/stdViz

要查看或添加评论，请登录

John Beresniewicz的更多文章

Estimating OLTP Execution Latencies Using ASH

2021年2月16日

Estimating OLTP Execution Latencies Using ASH

I want to share something super-useful about Active Session History that I came to understand only last week. Examining…

17 条评论
ASHviz: Dark matter 2

2019年5月16日

ASHviz: Dark matter 2

This article extends the discussion of "dark matter" in ASH by exploring a completely new source of data about event…
ASHviz: Fiddling with violins

2019年5月6日

ASHviz: Fiddling with violins

The last ASHviz installment, Densities and dark matter, was a bit of a cognitive burden, but the concepts introduced…
ASHviz: Densities and dark matter

2019年5月1日

ASHviz: Densities and dark matter

This installment gets into some deeper concepts relative to visualizing event latency distributions as well as using…
ASHviz: Can you box that, please?

2019年4月27日

ASHviz: Can you box that, please?

This installment explores the distribution of sampled event latencies from the ASH dump using `geom_boxplot( )`. ASH…
ASHviz: Accidentally good

2019年4月23日

ASHviz: Accidentally good

This is a short blurb about being sensitive to whether a visualization that works well in a specific case will…
ASHviz: Visualizing ASH dumps with Jupyter Notebooks

2019年4月22日

ASHviz: Visualizing ASH dumps with Jupyter Notebooks

This article begins what I hope will be an interesting series focusing on some data visualization research I have been…
Visualizing Performance Benchmarks (4) - Validate, analyze, conclude

2018年11月16日

Visualizing Performance Benchmarks (4) - Validate, analyze, conclude

In this final episode, we VALIDATE our suspicions about the file-based configurations bottlenecking on read I/O…

12 条评论
Simple SQL Injection Vulnerability Testing

2018年11月13日

Simple SQL Injection Vulnerability Testing

According to The Open Web Application Security Project (OWASP), injection remains the number one category of security…

3 条评论
Visualizing Performance Benchmarks (3) - Start Small and Predict

2018年11月12日

Visualizing Performance Benchmarks (3) - Start Small and Predict

So far in this series we've seen some nice visualizations of elapsed time data for loading a large number of 5GB files…

See all articles

ASHviz: Issue at the x-axis

John Beresniewicz

Architect at Oracle

notebook:

John Beresniewicz的更多文章

社区洞察

其他会员也浏览了

Race condition

The Tale of Sir Spring Boot and the Quest for the Large Data Dragon

Bring the Data Up Front

Cache hit ratios at the edge: a performance study

what is the Node.js event loop

Tackling Two Major Challenges in Computer Science: Caching and Naming Conventions

?? Memory Leaks in Maps in Go and How to Avoid Them

gRPC/Protocol buffers : A conversation between two services

Mastering Dynamic Memory Allocation in C/C++: A Comprehensive Guide

Eye opening exp with indexDB

notebook:

John Beresniewicz的更多文章

Estimating OLTP Execution Latencies Using ASH

ASHviz: Dark matter 2

ASHviz: Fiddling with violins

ASHviz: Densities and dark matter

ASHviz: Can you box that, please?

ASHviz: Accidentally good

ASHviz: Visualizing ASH dumps with Jupyter Notebooks

Visualizing Performance Benchmarks (4) - Validate, analyze, conclude

Simple SQL Injection Vulnerability Testing

Visualizing Performance Benchmarks (3) - Start Small and Predict

社区洞察

其他会员也浏览了

Race condition

The Tale of Sir Spring Boot and the Quest for the Large Data Dragon

Bring the Data Up Front

Cache hit ratios at the edge: a performance study

what is the Node.js event loop

Tackling Two Major Challenges in Computer Science: Caching and Naming Conventions

?? Memory Leaks in Maps in Go and How to Avoid Them

gRPC/Protocol buffers : A conversation between two services

Mastering Dynamic Memory Allocation in C/C++: A Comprehensive Guide

Eye opening exp with indexDB