Types of Performance Data
Stephen Townshend
Tech Manager and Reliability, Observability and Resilience Engineer and Advocate
There's more to performance than response time.
In this blog we are going to look at the kinds of data that can tell us about system performance, and where we can find it.
What performance data is there?
There are lots of different measurements that can tell us about software performance. The diagram below summarises some of the key ones:
Here's a brief explanation of each:
Response time is any measurement of how long something takes. We often associate response time with user (or customer) experience. There are different scales of response time to consider (see the sketch after this list):
- User response time, for example, submitting a search on a website.
- Component response time, for example, how long a particular API, line of code, or database query is taking.
- Processing time for longer events such as batch jobs.
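To make that concrete, here's a minimal sketch in Python of measuring component response time by timing a call directly (the wrapped query function in the commented example is hypothetical):

```python
import time

def measure(label, func, *args, **kwargs):
    """Time a single call and print the elapsed wall-clock time."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed * 1000:.1f} ms")
    return result

# Hypothetical usage: timing a database query function.
# customer = measure("customer lookup", run_query, "SELECT * FROM customers WHERE id = ?", 42)
```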
Throughput is the volume of load our system is under within a certain time-frame. Examples of throughput include business transactions per hour, pages per hour, API requests per second, or Mbps of data transferred over a network.
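As a rough illustration, throughput over time can be derived by bucketing request timestamps - the timestamps below are invented purely for the example:

```python
from collections import Counter
from datetime import datetime

# Count how many requests arrived in each one-second bucket.
timestamps = [
    "2024-01-15 09:00:00", "2024-01-15 09:00:00", "2024-01-15 09:00:01",
    "2024-01-15 09:00:01", "2024-01-15 09:00:01", "2024-01-15 09:00:02",
]
per_second = Counter(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
for second, count in sorted(per_second.items()):
    print(f"{second}: {count} requests/second")
```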
Workload is more than just throughput. Concurrency is not about the volume of load but about how it is applied. There are different kinds of concurrency to consider:
- Concurrent user sessions on a system and the memory footprint of each.
- The rate of arrival and whether requests are coming in concurrently or not.
The third aspect of workload, which I haven't included in the diagram, is the nature of the load. What are the specific business activities, APIs, or transactions that our users (or consumers) are completing, and what is the proportion of each?
Errors tell us about the stability of our system. The error messages themselves are often very useful. If you have an error in a log stating the application has run out of heap space, that's useful information.
There's also the rate of errors and when they occur. Is the error rate proportional to the load being applied? Does it change at different times of the day? Do certain errors occur at certain times more than others?
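One way to answer those questions is to count errors per time bucket straight from the logs. A minimal sketch, assuming a log format like "2024-01-15 14:03:22 ERROR OutOfMemoryError ..." (both the format and the file name are assumptions to adjust for your own system):

```python
import re
from collections import Counter

pattern = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} ERROR (\w+)")
errors_per_hour = Counter()

with open("application.log") as log:
    for line in log:
        match = pattern.match(line)
        if match:
            hour, error_type = match.groups()
            errors_per_hour[(hour, error_type)] += 1

for (hour, error_type), count in sorted(errors_per_hour.items()):
    print(f"{hour}:00  {error_type}: {count}")
```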
Server resources are the tip of the iceberg. There are countless resources we can monitor at an operating system and application level, but I suggest starting with the basics. The four key hardware resources are processor, memory, network, and disk.
Lastly, I have mentioned queue length. Think of a software system as a giant sausage machine made up of lots of little sausage machines. Each of these is capable of producing sausages at a certain rate - maybe one sausage per second. If we try to shove meat into the machine faster than it can process it, we'll get a backlog of meat hanging out the back - a queue.
There are intentional queues such as ActiveMQ or the Azure Service Bus, but there are also unintentional queues. For example, say our application is trying to process thousands of transactions but the CPU is 100% saturated. These transactions get queued up waiting for the CPU to be free. If we can monitor either the intentional or unintentional queues it can help us understand the bottlenecks and behaviours of our system.
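Here's a back-of-the-envelope sketch of that idea: when the arrival rate exceeds the service rate, the backlog grows for every second the overload continues (the rates are invented for illustration):

```python
# If work arrives faster than the "machine" can process it, a queue builds up.
arrival_rate = 120   # items arriving per second
service_rate = 100   # items processed per second

backlog = 0
for second in range(1, 6):
    backlog += arrival_rate - service_rate  # net growth each second
    print(f"after {second}s: queue length = {backlog}")
# The queue grows by 20 items every second the overload continues.
```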
Where can we find this data?
Performance data can be found in a lot of different places. Here are some of the most common:
Load testing tools are the most obvious place to look for performance data. Here we get response time and system behaviour metrics, and some of these tools also capture server or application resources. But what if we want to drill down deeper? Or what if we want to look at a production system without running a test?
Server logs are a great source of performance data, which I spoke about at length in my Neotys PAC talk. You'll either need a log analytics tool, or you'll need to get your hands dirty and write some code to parse these logs in order to make sense of them.
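As a rough example of the "write some code" option, here's a sketch that pulls response times out of an access log, assuming each line ends with a timing such as "GET /api/search 200 342ms" (the format and file name are assumptions):

```python
import re
import statistics

pattern = re.compile(r"(\S+) (\S+) (\d{3}) (\d+)ms$")
timings = []

with open("access.log") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            timings.append(int(match.group(4)))

if len(timings) >= 2:
    print(f"requests: {len(timings)}")
    print(f"mean:     {statistics.mean(timings):.0f} ms")
    print(f"95th pct: {statistics.quantiles(timings, n=20)[-1]:.0f} ms")
```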
Server resource monitoring tools are how we capture those basic hardware resources I mentioned earlier (processor, memory, disk, and network). There are hundreds of options, depending on your platform.
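If you just want a quick sample from a script, something like the following works as a sketch, assuming the third-party psutil package is installed (it is not part of the Python standard library):

```python
import psutil  # third-party package: pip install psutil

# Sample the four basic hardware resources once.
# Dedicated monitoring tools do this continuously and keep the history.
cpu = psutil.cpu_percent(interval=1)      # % CPU used over a 1-second sample
mem = psutil.virtual_memory().percent     # % physical memory in use
disk = psutil.disk_io_counters()          # cumulative disk read/write counters
net = psutil.net_io_counters()            # cumulative bytes sent/received

print(f"cpu: {cpu}%  memory: {mem}%")
print(f"disk reads/writes: {disk.read_count}/{disk.write_count}")
print(f"network sent/received: {net.bytes_sent}/{net.bytes_recv} bytes")
```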
I often find a lot of value in querying the application database of the system under test. In most cases I can at least find workload information to help build a more accurate model, but often applications log performance metrics directly to a database - including timings.
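A minimal sketch of that kind of query, using SQLite for illustration - the database file, table, and column names here are entirely hypothetical; in practice you would query whatever tables record business transactions:

```python
import sqlite3

conn = sqlite3.connect("application.db")
rows = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:00', created_at) AS hour,
           transaction_type,
           COUNT(*) AS volume
    FROM transactions
    GROUP BY hour, transaction_type
    ORDER BY hour
    """
).fetchall()

# Hourly transaction volumes by type help build a workload model.
for hour, transaction_type, volume in rows:
    print(f"{hour}  {transaction_type}: {volume}")
conn.close()
```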
If you have access to an application performance monitoring (APM) tool you'll probably be able to get very fine-grained and detailed performance metrics about your system. These tools (generally speaking) have agents which listen in to every line of code or database query run in production or during a performance test.
In Summary
I've deliberately kept this blog simple. In my next blog I'll be talking about some of the considerations we need to make when interpreting this kind of performance data. It's one thing to have the data, quite another to understand how the system is actually behaving.