[Performance] : What does CPU% usage tell us ?

Akshay Deshpande

Architect, Engineering

Published: March 4, 2019

[Edit] -- All my latest articles are now published at https://performanceengineeringin.wordpress.com/

When you come across a misbehaving system, the first metric most of us look at is CPU usage. But do we really understand what a system's CPU usage tells us? In this article, let us try to understand what X% usage of a system really means.

One of the easiest ways to check CPU usage is the "top" command.

The "%Cpu(s)" line in the summary area of top's output is a combination of different components:

us - Time spent running code in user space
sy - Time spent running code in kernel space
ni - Time spent running niced user processes (processes with a user-defined priority)
id - Time spent idle
wa - Time spent waiting for IO on peripherals (e.g. disk)
hi - Time spent handling hardware interrupt routines (whenever a peripheral wants attention from the CPU, it literally pulls a line to signal the CPU to service it)
si - Time spent handling software interrupt routines (a piece of code calls an interrupt routine...)
st - Time stolen from a virtual machine: involuntary waits by the virtual CPU while the hypervisor is servicing another processor
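
To make the breakdown above concrete, here is a minimal sketch of how such percentages can be derived on Linux from the aggregate "cpu" line of /proc/stat, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal. The function names are my own for illustration, and this is not a claim about how top itself is implemented. Because the counters only ever grow, a percentage only makes sense as the difference between two snapshots.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat,
# labelled with the abbreviations top uses in its %Cpu(s) line.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, (int(v) for v in parts[1:9])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Share of the interval spent in each state, given two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval=1.0):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = parse_cpu_line(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.read())
    return cpu_percentages(first, second)
```

On a Linux machine, calling sample(1.0) returns a dict keyed by the same abbreviations as the list above, with values that add up to roughly 100.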

Of all the components above, we usually concentrate on user time (us), system time (sy) and IO wait time (wa). User time is the percentage of time the CPU spends executing application code, and system time is the percentage of time it spends executing kernel code. It is important to note that system time is related to what the application does: if the application performs IO, for example, the kernel executes the code that reads the file from disk on its behalf. Likewise, any time spent waiting on that IO shows up as IO wait time. So us%, sy% and wa% are related.
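
One way to see the link between application code and system time is to measure a single process. The sketch below uses Python's os.times(), which reports the user and system CPU seconds consumed by the current process. The two workloads (compute and write_file) are hypothetical examples of mine, and the exact split will vary by machine. Note that IO wait is a system-wide figure and does not show up in these per-process numbers.

```python
import os
import tempfile


def cpu_time_split(work):
    """Run work() and return (user_seconds, system_seconds) it consumed."""
    start = os.times()
    work()
    end = os.times()
    return end.user - start.user, end.system - start.system


def compute():
    # Pure computation in application code: expected to be mostly user time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def write_file():
    # Many small unbuffered writes: each one is a system call, so a larger
    # share of the CPU time is spent inside the kernel.
    with tempfile.TemporaryFile(buffering=0) as f:
        for _ in range(20_000):
            f.write(b"x" * 512)


if __name__ == "__main__":
    for name, work in (("compute", compute), ("write_file", write_file)):
        user, system = cpu_time_split(work)
        print(f"{name}: user={user:.2f}s system={system:.2f}s")
```

Running it should typically show compute dominated by user time, while write_file accumulates noticeably more system time.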

Now let's see if we understand this correctly as a whole.

My goal as a Performance Engineer is to drive CPU usage as high as possible for as short a time as possible. Does that sound far from "best practice"? Wait, hold that thought.

The first thing to know is that the CPU usage reported by any command is always an average over an interval of time. If an application consumes 30% CPU for 10 minutes, the code can be tuned to consume 60% for 5 minutes: the same amount of CPU work, done in half the time. Do you see what I mean by "driving the CPU as high as possible for as short a time as possible"? This is doubling the performance. Did the CPU usage increase? Yes. But is that a bad thing? No. The CPU is sitting there waiting to be used. Use it and improve the performance. High CPU usage is not always a bad thing. It may just mean that your system is being used to its full potential. A good ROI. However, if your run-queue length keeps increasing, meaning requests are waiting for CPU, then it definitely needs your attention.
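
The arithmetic behind the 30% versus 60% example can be sketched in a few lines. The helper names are hypothetical and the model is deliberately simple: it assumes the amount of CPU work stays the same and only the wall-clock time changes.

```python
def cpu_seconds(avg_utilization_pct, duration_s, cpus=1):
    """CPU time actually consumed at a given average utilization."""
    return avg_utilization_pct * duration_s * cpus / 100.0


def average_utilization(busy_s, interval_s, cpus=1):
    """Average utilization (%) reported over an interval."""
    return 100.0 * busy_s / (interval_s * cpus)


# 30% for 10 minutes and 60% for 5 minutes consume the same CPU time.
before_tuning = cpu_seconds(30, 10 * 60)
after_tuning = cpu_seconds(60, 5 * 60)

# Averaging hides bursts: a CPU that is fully busy for 3 seconds and idle
# for the next 7 seconds is reported as 30% over the 10 second interval.
bursty = average_utilization(busy_s=3, interval_s=10)
```

Both before_tuning and after_tuning come out to the same number of CPU-seconds, which is the point: utilization went up, the work did not, and the job finished in half the time.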

On Linux systems, the threads that are able to run (i.e., not blocked on IO, sleeping, etc.) make up the run-queue. You can check its length by running the "vmstat 1" command. The first number in each line (the "r" column) is the run-queue length.

If the thread count in that column is higher than the number of available CPUs (counting hyper-threads, if enabled), threads are waiting for CPU and performance will be less than optimal. A higher number is OK for a brief period, but if the run-queue length stays high for a significant amount of time, it is an indication that the system is overloaded.
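
The check described above can be automated with a small script. This is a rough sketch with names of my own choosing: it pulls the first column out of vmstat's text output and compares it with the logical CPU count from os.cpu_count(). The 80% threshold for "a significant amount of time" is an arbitrary assumption, so tune it for your own system.

```python
import os
import subprocess


def run_queue_lengths(vmstat_text):
    """Extract the 'r' column (first number on each data line) of vmstat output."""
    lengths = []
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Header lines start with "procs" or "r"; data lines start with a number.
        if parts and parts[0].isdigit():
            lengths.append(int(parts[0]))
    return lengths


def is_overloaded(lengths, cpus, sustained_fraction=0.8):
    """True if the run-queue exceeded the CPU count in most of the samples.

    sustained_fraction is an assumed threshold, not a standard value.
    """
    if not lengths:
        return False
    over = sum(1 for n in lengths if n > cpus)
    return over / len(lengths) >= sustained_fraction


def check_system(samples=10):
    """Run 'vmstat 1 <samples>' and evaluate the result (Linux only)."""
    result = subprocess.run(
        ["vmstat", "1", str(samples)],
        capture_output=True, text=True, check=True,
    )
    cpus = os.cpu_count() or 1  # logical CPUs, hyper-threads included
    return is_overloaded(run_queue_lengths(result.stdout), cpus)
```

A single spike will not trip is_overloaded. Only a run-queue that stays above the CPU count across most of the samples does, which matches the "brief is OK, sustained is not" rule of thumb.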

Conclusion:

High CPU usage is not always a bad sign. The CPU is there to be used. Use it and improve the performance of the running application.
If the run-queue length stays high for a significant amount of time, the system is overloaded and needs optimization.
