Types of Performance Data

There's more to performance than response time.

In this blog we'll look at the kinds of data that can tell us about system performance, and where to find them.

What performance data is there?

There are many different measurements that can tell us about software performance. The diagram below summarises some of the key ones:

Here's a brief explanation of each:

Response time is any measurement of how long something takes. We often associate response time with user (or customer) experience. There are different scales of response time to consider:

  • User response time, for example, submitting a search on a website.
  • Component response time, for example, how long a particular API, line of code, or database query is taking.
  • Processing time for longer events such as batch jobs.
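
To make component response time concrete, here's a minimal sketch of timing a single call with Python's `time.perf_counter`. The `timed` helper and the stand-in workload (`sum` over a range) are illustrative, not from the original article:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in for a component call (e.g. an API or database query).
result, elapsed = timed(sum, range(1_000_000))
print(f"component took {elapsed * 1000:.2f} ms")
```

In practice you'd wrap the actual API call or query instead of `sum`, and record the timings somewhere you can aggregate them.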

Throughput is the volume of load our system is under within a certain time-frame. Examples of throughput include business transactions per hour, pages per hour, API requests per second, or Mbps of data transferred over a network.
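
As a sketch of how throughput falls out of raw timestamps, the snippet below (with invented request times) counts requests over the observed window:

```python
from datetime import datetime, timedelta

# Hypothetical request timestamps, as you might extract from an access log.
timestamps = [datetime(2024, 1, 1, 12, 0, 0) + timedelta(seconds=s)
              for s in (0.1, 0.4, 1.2, 1.9, 2.5, 2.8, 3.3, 3.9)]

window = (timestamps[-1] - timestamps[0]).total_seconds()
throughput = len(timestamps) / window
print(f"{throughput:.2f} requests/second")
```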

Workload is more than just throughput. Concurrency is not the volume of load but how it is applied. There are different kinds of concurrency to consider:

  • Concurrent user sessions on a system and the memory footprint of each.
  • The rate of arrival and whether requests are coming in concurrently or not.

The third aspect of workload, which I haven't included in the diagram, is the nature of the load. What are the specific business activities, APIs, or transactions that our users (or consumers) are completing, and what is the proportion of each?
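
A simple way to see the nature of the load is to tally transaction types into a workload mix. The transaction names and counts below are made up for illustration:

```python
from collections import Counter

# Hypothetical business transactions observed over an hour.
transactions = ["search"] * 120 + ["login"] * 40 + ["checkout"] * 40

mix = Counter(transactions)
total = sum(mix.values())
for name, count in mix.most_common():
    print(f"{name}: {count / total:.0%} of workload")
```

A mix like this feeds directly into building a realistic load model.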

Errors tell us about the stability of our system. The error messages themselves are often revealing. If you have an error in a log stating the application has run out of heap space, that's valuable information.

There's also the rate of errors and when they occur. Is the error rate proportional to the load being applied? Does it change at different times of the day? Do certain errors occur at certain times more than others?
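
As a sketch of that kind of analysis, the snippet below (with invented hourly counts) compares error rate against load. A flat rate suggests errors scale with load; a spike at one hour points to something else going wrong then:

```python
# Hypothetical hourly counts of requests and errors.
hourly = {
    "09:00": {"requests": 1000, "errors": 10},
    "12:00": {"requests": 5000, "errors": 50},
    "15:00": {"requests": 2000, "errors": 400},
}

for hour, counts in hourly.items():
    rate = counts["errors"] / counts["requests"]
    print(f"{hour}: error rate {rate:.1%}")
```

Here 09:00 and 12:00 both sit at 1% (proportional to load), while 15:00 jumps to 20%, which warrants investigation.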

Server resources are the tip of the iceberg. There are countless resources we can monitor at an operating system and application level, but I suggest starting with the basics. The four key hardware resources are processor, memory, network, and disk.

Lastly, I have mentioned queue length. Think of a software system as a giant sausage machine made up of lots of little sausage machines. Each of these is capable of producing sausages at a certain rate - maybe one sausage per second. If we try to shove meat into the machine faster than it can process it, we'll get a backlog of meat hanging out the back - a queue.

There are intentional queues such as ActiveMQ or the Azure Service Bus, but there are also unintentional queues. For example, say our application is trying to process thousands of transactions but the CPU is 100% saturated. These transactions get queued up waiting for the CPU to be free. If we can monitor either the intentional or unintentional queues it can help us understand the bottlenecks and behaviours of our system.
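
The sausage-machine backlog can be sketched in a few lines. With made-up arrival and service rates, the queue grows by the excess work each second that arrivals outpace processing:

```python
# Hypothetical rates: work arrives faster than the system can process it.
arrival_rate = 120   # transactions arriving per second
service_rate = 100   # transactions processed per second (e.g. CPU-bound limit)

queue_length = 0
for second in range(1, 6):
    queue_length += arrival_rate - service_rate  # backlog grows by the excess
    print(f"after {second}s: {queue_length} transactions queued")
```

The steady growth is the signature of a saturated resource: the queue never drains while the arrival rate stays above the service rate.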

Where can we find this data?

Performance data can be found in a lot of different places. Here are some of the most common:

Load testing tools are the most obvious place to look for performance data. Here we get response time and system behaviour metrics, and some of these tools also capture server or application resources. But what if we want to drill down deeper? Or what if we want to look at a production system without running a test?

Server logs are a great source of performance data which I spoke about at length in my Neotys PAC talk. You'll either need a log analytics tool or you'll need to get your hands dirty and write some code to parse these logs in order to make sense of them.
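
If you do end up writing your own parser, it needn't be much. The sketch below assumes a made-up log format of `<timestamp> <endpoint> <response_ms>` and aggregates response times per endpoint:

```python
import re

# Hypothetical access-log lines: "<timestamp> <endpoint> <response_ms>"
log_lines = [
    "2024-01-01T12:00:00 /search 182",
    "2024-01-01T12:00:01 /search 240",
    "2024-01-01T12:00:02 /login 95",
]

pattern = re.compile(r"^(\S+) (\S+) (\d+)$")
timings = {}
for line in log_lines:
    match = pattern.match(line)
    if match:
        _, endpoint, ms = match.groups()
        timings.setdefault(endpoint, []).append(int(ms))

for endpoint, values in timings.items():
    avg = sum(values) / len(values)
    print(f"{endpoint}: avg {avg:.0f} ms over {len(values)} requests")
```

Real log formats vary, so the regular expression is the part you'd adapt to your own system.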

Server resource monitoring tools are how we capture those basic hardware resources I mentioned earlier (processor, memory, disk, and network). There are hundreds of options, depending on your platform.

I often find a lot of value in querying the application database of the system under test. In most cases I can at least find workload information to help build a more accurate model, but often applications log performance metrics directly to a database - including timings.
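
As an illustration of mining workload information from a database, here's a sketch using an in-memory SQLite database; the `orders` table and its columns are invented for the example:

```python
import sqlite3

# In-memory stand-in for an application database; the table and
# column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2024-01-01 09:15"), (2, "2024-01-01 09:40"),
                  (3, "2024-01-01 10:05")])

# Workload question: how many orders were created per hour?
rows = conn.execute(
    "SELECT substr(created_at, 12, 2) AS hour, COUNT(*) "
    "FROM orders GROUP BY hour ORDER BY hour"
).fetchall()
for hour, count in rows:
    print(f"{hour}:00 -> {count} orders")
```

The same shape of query against a real application database gives you transaction volumes per hour, which is exactly the input a workload model needs.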

If you have access to an application performance monitoring (APM) tool, you'll probably be able to get very fine-grained and detailed performance metrics about your system. These tools (generally speaking) have agents which listen in on every line of code or database query run in production or during a performance test.

In Summary

I've deliberately kept this blog simple. In my next blog I'll be talking about some of the considerations we need to make when interpreting this kind of performance data. It's one thing to have the data, quite another to understand how the system is actually behaving.
