Server Performance Tuning: Interrupt & Process affinity (The CPU love affair)

This article is going to be a bit technical and requires basic knowledge of computer system architecture. However, I will try to explain everything in simple language to make it easy for newcomers as well.

The purpose of this article is to explain how CPU affinity works on a Linux machine and how it is handled in a multi-processor system. This will help you optimise CPU load according to your specific requirements and improve performance.

I will start by explaining the terminology used, and then show how we can use it to achieve the desired result.

What is Interrupt?

An interrupt is a signal to the processor, emitted by hardware or software, indicating an event that needs immediate attention. In simple words, whenever a hardware device such as a disk controller or an Ethernet card needs attention from the CPU, it sends a signal known as an interrupt. The interrupt tells the CPU that something has happened and that it should drop what it's doing to handle the event.

What is Interrupt request (IRQ)?

To prevent multiple devices from sending the same interrupts, the IRQ system was established: each device in a computer system is assigned its own IRQ number so that its interrupts are unique. For example, the keyboard, USB controller, network card etc. each have their own IRQ.

What is IRQ Affinity?

IRQ affinity is the ability of Linux to bind certain IRQs to specific processors (or groups of processors), i.e. a particular IRQ will be handled only by a specific set of processors. It allows you to restrict or repartition the workload your server must do so that it can do its job more efficiently.
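
Each IRQ's affinity is exposed as a hexadecimal CPU bitmask under /proc. As a quick illustration (IRQ 0, the timer, exists on most x86 systems; any IRQ from your own /proc/interrupts will do):

# Show the CPU bitmask currently allowed to service IRQ 0
$ cat /proc/irq/0/smp_affinity

On a 4-CPU machine this would typically show f, i.e. binary 1111, meaning any of CPU0-CPU3 may service the IRQ.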

What is IRQ balancer?

irqbalance is a command-line daemon that distributes hardware interrupts across processors to improve system performance. It ships by default with most Linux distributions.
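
One practical note (a sketch, assuming a systemd-based distribution): if irqbalance is running, it may periodically overwrite any affinity masks you set by hand, so check and stop it before pinning IRQs manually.

# Is irqbalance running?
$ systemctl status irqbalance

# Stop it so manual /proc/irq/*/smp_affinity settings persist
$ systemctl stop irqbalance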

What is Processor affinity or CPU pinning?

Similar to IRQ affinity, processor affinity enables the binding and unbinding of a process or a thread to a CPU or a range of CPUs, so that the process or thread will execute only on the designated CPU(s) rather than on any CPU. This can be viewed as a modification of the native central-queue scheduling algorithm in a symmetric multiprocessing (SMP) operating system.

What is taskset?

taskset is a Linux utility used to set or retrieve the CPU affinity of a running process given its PID, or to launch a new command with a given CPU affinity. On NUMA machines, the numactl utility can also be used.
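
A quick illustration (./myapp is a hypothetical binary standing in for your application):

# Launch a new process restricted to CPUs 0 and 2 (-c takes a CPU list)
$ taskset -c 0,2 ./myapp

# Retrieve the affinity of a running process as a CPU list
$ taskset -cp PID

# On a NUMA machine, bind both CPUs and memory to NUMA node 0
$ numactl --cpunodebind=0 --membind=0 ./myapp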

What to do with all this information?

Now that you are familiar with all the characters of this story, let's get started.

Servers (Linux machines) are designed/configured in a generic way to handle all kinds of load in a balanced manner, i.e. they are not tuned for one specific kind of work and hence might not be optimal for your specific workload. For example, you might have an application that does a lot of I/O (disk reads/writes), and by default only a few CPUs will be assigned to I/O work while all the other CPUs sit idle. In that case you are not fully utilizing the computing power available in your system.

Let's figure out what IRQ a device is using. This information is available in the /proc/interrupts file. Here's a sample:

 [root@nishant /proc]# cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       
   0:    4865302    5084964    4917705    5017077    IO-APIC-edge  timer
   1:        132        108        159        113    IO-APIC-edge  keyboard
   2:          0          0          0          0          XT-PIC  cascade
   8:          0          1          0          0    IO-APIC-edge  rtc
  10:          0          0          0          0   IO-APIC-level  usb-ohci
  14:          0          0          1          1    IO-APIC-edge  ide0
  24:      87298      86066      86012      86626   IO-APIC-level  aic7xxx
  31:      93707     106211     107988      93329   IO-APIC-level  eth0
 NMI:          0          0          0          0 
 LOC:   19883500   19883555   19883441   19883424 

There is an excellent article on SMP IRQ affinity that explains it in much greater detail, along with how you can play with it. Please have a look; I'll save myself from explaining the same thing again here :)

https://cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt

Now you know how to allocate a certain set of CPUs to certain IRQs based on your requirements. In the above example, you could reserve more CPUs for I/O operations, which would help boost application performance. Another good use of IRQ affinity is assigning each NIC (network queue) to a single CPU on a multi-NIC system to reduce network latency.
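
Tying this back to the sample output above: eth0 is on IRQ 31 there, so pinning it to CPU0 alone would look like this (run as root; your IRQ number will differ):

# Bitmask 1 selects CPU0 only; eth0 was on IRQ 31 in the sample
echo 1 > /proc/irq/31/smp_affinity

# Verify the new mask
cat /proc/irq/31/smp_affinity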

Similarly, process affinity can be used to achieve the maximum possible performance on a given system.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Reference_Guide/chap-Realtime_Reference_Guide-Affinity.html

Let's take the example of a 24-core machine doing network-intensive work. We can allocate the first 16 cores to the network and the remaining 8 cores to the server application. You can run this command to view the current CPU affinity of your process:

# PID is the process ID of your application
$ taskset -p PID
pid PID's current affinity mask: ffffff

ffffff means this process can use any of the 24 cores (each f is the hex representation of binary 1111, i.e. 4 cores).
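
If you don't want to work hex masks out by hand, shell arithmetic can build them: the mask for a block of n cores starting at core s is ((1 << n) - 1) << s. For example:

# 8 cores starting at core 16 (cores 16-23) -> ff0000
$ printf '%x\n' $(( ((1 << 8) - 1) << 16 ))
ff0000

# 16 cores starting at core 0 (cores 0-15) -> ffff
$ printf '%x\n' $(( (1 << 16) - 1 ))
ffff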

Now let's allocate the last 8 cores to this PID.

$ taskset -p ff0000 PID
pid PID's new affinity mask: ff0000
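
Equivalently, taskset's -c option accepts a readable CPU list instead of a hex mask:

# Same effect as mask ff0000: restrict PID to cores 16-23
$ taskset -cp 16-23 PID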

Let's bind the first 16 cores to the network queues, assuming the network-queue IRQs start at 99.

echo 1 > /proc/irq/99/smp_affinity
echo 2 > /proc/irq/100/smp_affinity
echo 4 > /proc/irq/101/smp_affinity
echo 8 > /proc/irq/102/smp_affinity
echo 10 > /proc/irq/103/smp_affinity
echo 20 > /proc/irq/104/smp_affinity
echo 40 > /proc/irq/105/smp_affinity
echo 80 > /proc/irq/106/smp_affinity
...
...

In this example, the hex masks 1, 2, 4 and 8 select the 1st, 2nd, 3rd and 4th CPU respectively, and 10, 20, 40 and 80 select the 5th, 6th, 7th and 8th CPU. Each bit in the mask corresponds to one CPU, which is why the values are powers of two rather than 1, 2, 3, 4 and so on.
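
Rather than typing each echo by hand, a small loop can pin one queue per core. A sketch, assuming the queues occupy consecutive IRQs starting at 99 as above (run as root):

irq=99                                 # first network-queue IRQ
for cpu in $(seq 0 15); do
    mask=$(printf '%x' $((1 << cpu)))  # single-CPU bitmask
    echo $mask > /proc/irq/$irq/smp_affinity
    irq=$((irq + 1))
done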

You can find the IRQs of the network queues using the cat /proc/interrupts command shown above, or you can use:

$ for i in {0..3}; do echo eth$i; grep eth$i-TxRx /proc/interrupts | awk '{printf "  %s\n", $1}'; done

This will print all the network IRQs. Note that on CentOS 7 and some other distributions, interfaces are no longer named ethN; with predictable interface naming they get names like ens3 or enp0s3 instead. Please refer to your OS documentation.
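
For example, you can discover your actual interface names first and then adapt the grep accordingly (ens0 here is just a hypothetical name):

# List the interface names on this system
$ ip -brief link

# Then substitute your name into the grep
$ grep ens0 /proc/interrupts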

I hope this article helps you understand CPU affinity and get more performance out of your system resources. Feel free to comment :)

Vishal Rahikar

Software Validation Engineer @ Intel Corporation | Networking, Virtualization | Python Automation

3y

Very good article, covers all the aspects I was looking for! But I do have one query. I worked on a real-time SmartNIC project where we did these affinity settings. My question is: how do you decide the core numbers (you mentioned 1, 2, 4, 8, 10, 20, 40, 80, the same values we received in our specification document)? Why not use 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 and so on?

Min A.

Engineer (Platform Team)

5y

Great IRQ smp_affinity article! Just a note: on a VM guest the grep would be ens$i-rxtx, otherwise it wouldn't return anything. Good work though.

Jean Carlos

Analista de infraestrutura e Redes na ClickTurbo Internet

6y

Very interesting article; it's good that you contributed on a topic where many people get lost.

Chandan Prakash

Walmart | Ex-Qubole | NIT Bhopal

6y

Nice article Nishant Kumar, keep it up!!
