Linux Performance Tuning
Reza Bojnordi
Site Reliability Engineer & System Engineer @ BCW Group | Solutions Architect & Cloud Operations
1.1 Linux process management
A process is an instance of execution that runs on a processor.
task_struct -> process descriptor
Life cycle of processes
parent process -> fork() -> child process -> exec() -> child process -> exit() -> zombie process -> parent process
Copy On Write
The kernel only assigns new physical pages to the child process when the child process calls exec(), which copies the new program into the address space of the child process.
The child process is not completely removed until the parent process learns of the termination of its child through the wait() system call.
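A minimal sketch of this life cycle in C (the ls command used here is only an example): the parent forks, the child calls exec(), and the parent reaps the child with wait() so that no zombie is left behind.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();              /* duplicate the parent (copy on write) */
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        /* child: replace the address space with a new program */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");            /* only reached if exec fails */
        _exit(EXIT_FAILURE);
    }
    /* parent: wait() reaps the child so it does not remain a zombie */
    int status;
    waitpid(pid, &status, 0);
    printf("child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
    return 0;
}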
Thread
A thread is an execution unit generated within a single process. It runs in parallel with other threads in the same process.
Thread creation is less expensive than process creation because a thread does not need to copy resources on creation.
Process priority and nice level
Process priority is a number that determines the order in which the process is handled by the CPU and is determined by dynamic priority and static priority.
Linux supports nice levels from 19 (lowest priority) to -20 (highest priority).
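A small sketch of adjusting a process's own nice level from C, assuming a nice value of 10 is acceptable for the workload; setpriority() maps directly onto the nice level described above.

#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    /* Set our own nice value to 10, i.e. lower our scheduling priority. */
    if (setpriority(PRIO_PROCESS, 0, 10) != 0)
        perror("setpriority");

    /* getpriority() can legitimately return -1, so check errno instead. */
    errno = 0;
    int level = getpriority(PRIO_PROCESS, 0);
    if (errno != 0)
        perror("getpriority");
    else
        printf("current nice level: %d\n", level);
    return 0;
}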
Context switching
During process execution, information on the running process is stored in registers on the processor and its cache. The set of data that is loaded to the register for the executing process is called the context.
Interrupt handling
The interrupt handler notifies the Linux kernel of an event. It tells the kernel to interrupt process execution and perform interrupt handling as quickly as possible, because some device requires quick responsiveness.
Interrupts cause context switching.
In a multi-processor environment, interrupts are handled by each processor. Binding interrupts to a single physical processor could improve system performance.
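As an illustration, the CPU mask of an interrupt can be changed by writing to its smp_affinity file under /proc/irq. The IRQ number 19 below is a placeholder and must be looked up in /proc/interrupts first; the write requires root privileges.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Placeholder IRQ number; pick a real one from /proc/interrupts. */
    const char *affinity_file = "/proc/irq/19/smp_affinity";

    FILE *f = fopen(affinity_file, "w");  /* needs root privileges */
    if (f == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    /* CPU mask 1 = bind this interrupt to CPU 0 only */
    if (fprintf(f, "1\n") < 0)
        perror("fprintf");
    fclose(f);
    return 0;
}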
Process state
Every process has its own state that shows what is currently happening in the process.
Zombie processes
It is not possible to kill a zombie process with the kill command, because it is already considered dead. If you cannot get rid of a zombie, you can kill the parent process and then the zombie disappears as well.
Process memory segments
Linux CPU scheduler
O(1) scheduler: https://en.wikipedia.org/wiki/O(1)_scheduler ; Completely Fair Scheduler: https://www.ibm.com/developerworks/library/l-completely-fair-scheduler/
two process priority arrays
As processes are allocated a timeslice by the scheduler, based on their priority and prior blocking rate, they are placed in a list of processes for their priority in the active array. When they expire their timeslice, they are allocated a new timeslice and placed on the expired array.
When all processes in the active array have expired their timeslice, the two arrays are switched, restarting the algorithm.
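A toy sketch of the active/expired array switch; this is not the kernel's actual code, only the idea that the switch is a pointer exchange and therefore takes constant time.

#include <stdio.h>

/* Toy model of the O(1) scheduler's two priority arrays. */
struct prio_array {
    int nr_active;              /* number of runnable tasks in this array    */
    /* the real kernel also keeps one run list per priority level here       */
};

struct runqueue {
    struct prio_array *active;  /* tasks that still own a timeslice          */
    struct prio_array *expired; /* tasks whose timeslice has run out         */
    struct prio_array arrays[2];
};

/* When the active array becomes empty, exchange the two pointers.
   The switch is a constant-time operation, hence "O(1)". */
static void switch_arrays(struct runqueue *rq)
{
    if (rq->active->nr_active == 0) {
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
}

int main(void)
{
    struct runqueue rq;
    rq.arrays[0].nr_active = 0;     /* active array: every task has expired  */
    rq.arrays[1].nr_active = 3;     /* expired array: waiting for a new round */
    rq.active = &rq.arrays[0];
    rq.expired = &rq.arrays[1];

    switch_arrays(&rq);
    printf("active array now holds %d tasks\n", rq.active->nr_active);
    return 0;
}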
1.2 Linux memory architecture
32-bit architectures: 4 GB address space (3 GB user space and 1 GB kernel space). 64-bit architectures: 512 GB or more for both user and kernel space.
Virtual memory manager
Applications do not allocate physical memory directly; they request a memory map of a certain size from the Linux kernel and in exchange receive a map in virtual memory.
This virtual memory does not necessarily have to be backed by physical memory. If your application allocates a large amount of memory, some of it might be mapped to the swap file on the disk subsystem.
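A minimal illustration of this behavior: mmap() only reserves virtual address space, and physical page frames are assigned when the pages are first touched (the 64 MB size is arbitrary).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024;   /* 64 MB of virtual address space */

    /* Ask the kernel for an anonymous mapping: no physical memory yet. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* Touching the pages is what actually allocates page frames. */
    memset(p, 0, len);

    munmap(p, len);
    return 0;
}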
Applications usually do not write directly to the disk subsystem, but into cache or buffers.
Page frame allocation
A page is a group of contiguous linear addresses in physical memory (page frame) or virtual memory.
A page is usually 4K bytes in size.
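The page size can be confirmed at run time; a quick check (4 KB is typical on x86, but not guaranteed on every architecture):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* sysconf(_SC_PAGESIZE) reports the kernel's page size in bytes. */
    long page_size = sysconf(_SC_PAGESIZE);
    printf("page size: %ld bytes\n", page_size);
    return 0;
}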
Buddy system
The Linux kernel maintains its free pages by using a mechanism called a buddy system.
The buddy system maintains free pages and tries to allocate pages for page allocation requests. It tries to keep the memory area contiguous.
When an attempt at page allocation fails, page reclaiming is activated.
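The buddy allocator's view of free memory can be inspected through /proc/buddyinfo, where each column counts free blocks of order 0, 1, 2, and so on (a block of order n is 2^n contiguous pages). A small sketch that simply prints the file:

#include <stdio.h>

int main(void)
{
    /* Each column is the number of free blocks of order 0, 1, 2, ... */
    FILE *f = fopen("/proc/buddyinfo", "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    char line[512];
    while (fgets(line, sizeof(line), f) != NULL)
        fputs(line, stdout);
    fclose(f);
    return 0;
}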
Page frame reclaiming
The kswapd kernel thread and the try_to_free_page() kernel function are responsible for page reclaiming.
kswapd tries to find candidate pages to be taken out of the active pages based on the LRU principle.
Pages are used mainly for two purposes: page cache and process address space. The page cache consists of pages mapped to a file on disk. Pages that belong to a process address space are used for the heap and stack.
swap
If the virtual memory manager in Linux realizes that a memory page has been allocated but not used for a significant amount of time, it moves this memory page to swap space.
The fact that swap space is being used does not indicate a memory bottleneck; instead, it proves how efficiently Linux handles system resources.
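To check how much swap is actually in use, the SwapTotal and SwapFree counters in /proc/meminfo can be read; a minimal sketch:

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Print only the swap-related counters (values are in kB). */
        if (strncmp(line, "SwapTotal:", 10) == 0 ||
            strncmp(line, "SwapFree:", 9) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}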
1.3 Linux file systems
Virtual file system
VFS is an abstraction interface layer that resides between the user process and various types of Linux file system implementations.
Journaling
On a non-journaling file system, fsck checks all of the metadata and recovers consistency at the next reboot. But when the system has a large volume, this takes a long time to complete, and the system is not operational during the process.
A journaling file system writes the data to be changed to an area called the journal area before writing it to the actual file system. The journal area can be placed either inside or outside the file system. The data written to the journal area is called the journal log; it includes the changes to file system metadata and, where supported, the actual file data.
Ext2
The extended 2 file system is the predecessor of the extended 3 file system.
Ext3
Mode of journaling
1.4 Disk I/O subsystem
Before a processor can decode and execute instructions, data must be retrieved all the way from the sectors on a disk platter to the processor and its registers. The results of the execution can be written back to the disk.
I/O subsystem architecture
Cache
Memory hierarchy
L1 cache, L2 cache, L3 cache, RAM and some other caches between the CPU and disk.
The higher the cache hit rate in faster memory, the faster the access to the data.
Locality of reference
Flushing a dirty buffer
When a process changes data, it changes the memory first. At this point the data in memory and the data on disk are not identical, and the data in memory is referred to as a dirty buffer.
The dirty buffer should be synchronized to the data on the disk as soon as possible, or the data in memory could be lost if a sudden crash occurs.
The synchronization process for a dirty buffer is called a flush.
kupdate -- occurs on a regular basis.
/proc/sys/vm/dirty_background_ratio -- the proportion of dirty buffers in memory.
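A minimal sketch that reads the current value of this tunable; writing a new value works the same way but requires root privileges.

#include <stdio.h>

int main(void)
{
    /* The percentage of memory that may be dirty before background
       writeback begins. */
    FILE *f = fopen("/proc/sys/vm/dirty_background_ratio", "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    int ratio;
    if (fscanf(f, "%d", &ratio) == 1)
        printf("dirty_background_ratio = %d%%\n", ratio);
    fclose(f);
    return 0;
}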
Block layer
The block layer handles all the activity related to block device operation.
The?bio?structure is an interface between the file system layer and the block layer.
Block sizes
The block size, the smallest amount of data that can be read from or written to a drive, can have a direct impact on a server's performance.
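One way to see the preferred I/O block size reported for a file system object is the st_blksize field of stat(); the path "/" below is just an example.

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    /* "/" is only an example path; any file or directory works. */
    if (stat("/", &st) != 0) {
        perror("stat");
        return 1;
    }
    /* st_blksize is the preferred block size for efficient file I/O. */
    printf("preferred I/O block size: %ld bytes\n", (long)st.st_blksize);
    return 0;
}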
I/O elevator
I/O device driver
1.4 RAID and storage system
1.5 Network subsystem
Networking implementation
The socket provides an interface for user applications.
Socket buffer
/proc/sys/net/core/rmem_max
/proc/sys/net/core/rmem_default
/proc/sys/net/core/wmem_max
/proc/sys/net/core/wmem_default
/proc/sys/net/ipv4/tcp_mem
/proc/sys/net/ipv4/tcp_rmem
/proc/sys/net/ipv4/tcp_wmem
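These files control the system-wide defaults and maxima; an individual application can also request its own buffer sizes per socket with setsockopt(). The 256 KB request below is arbitrary, and the kernel clamps it to rmem_max / wmem_max.

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    int size = 256 * 1024;            /* requested buffer size in bytes */
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));

    /* Read back what the kernel actually granted; on Linux it may double
       the value for bookkeeping and clamp it to rmem_max / wmem_max. */
    int actual = 0;
    socklen_t len = sizeof(actual);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len);
    printf("receive buffer: %d bytes\n", actual);

    close(fd);
    return 0;
}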
Network API (NAPI)
The standard implementation of the network stack in Linux focuses more on reliability and low latency than on low overhead and high throughput.
Gigabit Ethernet and modern applications can create thousands of packets per second, causing a large number of interrupts and context switches to occur.
For the first packet, NAPI works just like the traditional implementation, issuing an interrupt for it. But after the first packet, the interface goes into a polling mode: as long as there are packets in the DMA ring buffer of the network interface, no new interrupts are raised, effectively reducing context switching and the associated overhead. Once the last packet is processed and the ring buffer is emptied, the interface card falls back into interrupt mode. NAPI also has the advantage of improved multiprocessor scalability by creating soft interrupts that can be handled by multiple processors.
Netfilter
You can manipulate and configure Netfilter using the iptables utility.
Netfilter Connection tracking
TCP/IP
Traffic control
Offload
If the network adapter on your system supports hardware offload functionality, the kernel can offload part of its tasks to the adapter, which can reduce CPU utilization.
Bonding module
1.6 Understanding Linux performance metrics
Processor metrics
Memory metrics
Network interface metrics
Block device metrics