Linux Performance issues troubleshooting
Image credit RedHat website

You are having performance issues on your Linux server (bare metal or EC2). You log in and start looking for the underlying cause. Which commands should be on your checklist so you can find it quickly and efficiently?

Below is my go-to list.

uptime: Check the last 3 numbers, which show the 1, 5, and 15 minute exponentially damped load averages.

18:44:08 up 14:52, 1 user, load average: 32.22, 26.17, 21.20

If the 1 min number is much higher than the 5 and 15 min load averages, then the load is still increasing. If it is lower, the issue might have already subsided, and you have likely missed the bus.
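The comparison above can be scripted as a rough first check. The following is only a sketch: the function name load_trend and the 10% margin are illustrative assumptions, not a standard tool or threshold.

```shell
#!/bin/sh
# load_trend ONE FIVE FIFTEEN   (hypothetical helper, not a standard command)
# Compares the 1-minute load average against the 15-minute one.
# Prints "rising" if the 1-minute value is more than 10% above the 15-minute
# value, "falling" if it is more than 10% below, otherwise "steady".
# The 10% margin is an arbitrary choice for illustration.
load_trend() {
    awk -v one="$1" -v fifteen="$3" 'BEGIN {
        one += 0; fifteen += 0          # force numeric comparison
        if (one > fifteen * 1.10)       print "rising"
        else if (one < fifteen * 0.90)  print "falling"
        else                            print "steady"
    }'
}
```

On a Linux host the three values can be taken from /proc/loadavg, for example: load_trend $(cut -d' ' -f1-3 /proc/loadavg)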

Drill down further into loads on your CPU with the below.

mpstat -P ALL 5: Shows the CPU time breakdown per CPU, refreshed every 5 seconds. Check whether any single CPU is running at 100% consistently while the others are mostly idle. If so, a single-threaded process is probably saturating that one core and causing performance issues for itself. A multi-threaded redesign of that process might be in order.
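To spot a saturated core without reading every row, the mpstat output can be filtered. This is a minimal sketch that assumes sysstat's mpstat prints "Average:" summary rows with %idle as the last column (run it with LC_ALL=C so the label is not localized). The helper name hot_cpus and the default threshold of 5% idle are assumptions for illustration.

```shell
#!/bin/sh
# hot_cpus [THRESHOLD]   (hypothetical helper)
# Reads `mpstat -P ALL` output on stdin and prints the ID of every CPU whose
# average %idle (assumed to be the last column) is below THRESHOLD percent.
# THRESHOLD defaults to 5, an arbitrary illustrative value.
hot_cpus() {
    awk -v limit="${1:-5}" '
        # Keep only per-CPU summary rows: skips the header row and the "all" row.
        $1 == "Average:" && $2 ~ /^[0-9]+$/ && ($NF + 0) < (limit + 0) { print $2 }
    '
}
```

Example usage: LC_ALL=C mpstat -P ALL 5 3 | hot_cpus 5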

dmesg: Shows the kernel message buffer and is an excellent place to check for system errors, e.g., OOM-killer invocations, dropped packets, etc. You can then take action accordingly.
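Kernel logs can be long, so it may help to filter them for a few common trouble signatures first. The pattern list below is an assumption chosen for illustration and is far from exhaustive; the helper name kernel_trouble is hypothetical.

```shell
#!/bin/sh
# kernel_trouble   (hypothetical helper)
# Reads kernel log text on stdin and keeps only the lines that match a short,
# illustrative list of trouble signatures (case-insensitive).
# Returns non-zero when nothing matches, like grep itself.
kernel_trouble() {
    grep -i -E 'out of memory|oom-killer|i/o error|dropping packet|segfault|blocked for more than'
}
```

Example usage: dmesg -T | kernel_trouble (on some systems reading the kernel log requires root).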

Check memory consumption using free and vmstat.

vmstat 5: Virtual memory stats (the 5 means refresh every 5 seconds). The 1st line shows averages since boot, and each following line covers the previous 5-second interval.

free -m: Shows free memory in megabytes. Note especially whether the buffers and cached columns hold decent non-zero numbers; if they are close to zero, more reads have to go to disk, which can lead to iowaits.

             total       used       free     shared    buffers     cached
Mem:        285999      24546     261453         80         62        541
-/+ buffers/cache:      23945     262053
Swap:
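Newer versions of free drop the "-/+ buffers/cache" line and print an "available" column instead. The same figure can be read from /proc/meminfo. The sketch below assumes a kernel that exposes the MemAvailable field; the helper name mem_available_pct is hypothetical.

```shell
#!/bin/sh
# mem_available_pct [FILE]   (hypothetical helper)
# Prints MemAvailable as a percentage of MemTotal, with one decimal place.
# FILE defaults to /proc/meminfo; passing a file makes the function testable.
# Exits non-zero if MemTotal is missing or zero.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) printf "%.1f\n", avail * 100 / total
            else exit 1
        }
    ' "${1:-/proc/meminfo}"
}
```

Example usage: mem_available_pct. A low percentage suggests the page cache has little room left, which fits the iowait concern above.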

If you suspect i/o bottlenecks, then check those with iostat, dd and iotop:

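iostat -xd -k 2 5 or iostat -p: Shows I/O performance per device, including read and write response times. The first form prints extended device stats in kilobytes, 5 reports at 2-second intervals; -p adds per-partition stats.

Use man iostat to find out the various other options.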

iotop -aoP: Shows only the processes that are actually doing disk I/O, with accumulated totals since iotop started, along with other neat stats like the percentage of time each one spends waiting on I/O.

dd:

dd if=/dev/zero of=mytest_write.txt bs=64k count=16k conv=fdatasync --> Creates a 1 GiB file of 0s called mytest_write.txt and flushes it to disk before finishing. The summary line shows the write speed.

dd if=mytest_read.txt of=/dev/null bs=64k count=16k --> Have a massive file (mytest_read.txt) available to read, and this command reports the read speed. If the file was read or written recently, the page cache can inflate the result.
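The write test leaves a large file behind, so a small wrapper that cleans up after itself can be handy. This is a sketch assuming GNU dd (conv=fdatasync is a GNU option); the function name disk_write_test and the temporary file name are hypothetical.

```shell
#!/bin/sh
# disk_write_test [DIR] [COUNT]   (hypothetical helper, assumes GNU dd)
# Writes COUNT blocks of 64k zeros to a temporary file in DIR, prints the
# last line dd reports (its throughput summary), then removes the file.
# DIR defaults to the current directory, COUNT to 16k (about 1 GiB).
# The body runs in a subshell so its variables do not leak to the caller.
disk_write_test() (
    dir="${1:-.}"
    count="${2:-16k}"
    file="$dir/mytest_write.$$"
    # dd prints its statistics on stderr, so merge it into stdout first.
    dd if=/dev/zero of="$file" bs=64k count="$count" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$file"
)
```

Example usage: disk_write_test /data 16k, where /data stands for whichever mount point you want to measure.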

lsof: Lists open files. Useful for finding which files are still open when a disk refuses to unmount. It can also show what is using a given port (use -i TCP:port_number) or all network connections (use the -i flag alone).

Check process level resource consumption using the below.

pidstat 5: Shows a rolling summary of CPU usage by each active PID and refreshes every n (5 in this example) seconds. Add -r for memory or -d for disk I/O. Very useful for finding which processes are consuming the most resources.

htop (or top): Shows per-process consumption and can be sorted on various columns. More details at https://www.maketecheasier.com/power-user-guide-htop/

ps: ps has a large number of options. Combine them to get useful data, e.g., the 2 below, which list all processes sorted by highest CPU usage first.

ps -eF --sort=-%cpu,-%mem

ps -e --format uid,pid,tty,%cpu,rss,cmd --sort=-%cpu,-rss

Check for network bottlenecks.

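sar: Use the -n option for network reports.

sar -n TCP,ETCP 1 --> TCP connection rates and TCP errors such as retransmits, every second.

sar -n DEV 5 --> Per-interface packet and throughput rates, every 5 seconds.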

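netstat -a | more: Lists all sockets, both listening and connected. Use netstat -i for per-interface statistics on incoming and outgoing packets, and netstat -s for per-protocol counters.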
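The per-interface packet counters are also available in /proc/net/dev, which is handy on hosts where netstat is not installed. The sketch below assumes the usual Linux layout of that file (two header lines, then one row per interface with receive drops in the 4th counter and transmit drops in the 12th); the helper name iface_drops is hypothetical.

```shell
#!/bin/sh
# iface_drops [FILE]   (hypothetical helper)
# Prints every interface that has a non-zero receive or transmit drop counter.
# FILE defaults to /proc/net/dev; passing a file makes the function testable.
iface_drops() {
    awk 'NR > 2 {
        sub(/:/, " ")    # split "eth0:" (or "eth0:123") into name and counters
        # After the split: $1 = interface, $5 = rx drops, $13 = tx drops.
        if ($5 + $13 > 0) print $1, "rx_drop=" $5, "tx_drop=" $13
    }' "${1:-/proc/net/dev}"
}
```

Example usage: iface_drops. Counters are cumulative since boot, so run it twice a few seconds apart to see whether drops are still growing.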

iftop: Similar to what top does, but for network usage: it shows bandwidth per connection on an interface.

tcpdump: It needs root/sudo, so it may not be available to most people in enterprise environments. But if you have access (I don't), then along with Wireshark, it's a very powerful tool.

There are a large number of other commands that can be useful. There are also a very large number of OSS tools, which I haven't mentioned here as most of those might not be installed on your enterprise Linux hosts.

Hope the above helps you find performance issues on *nix systems faster in your daily work life.










