Nagios Plugin: Monitor thread count of your service
If a program starts too many threads, we should raise an alarm.
Either something unexpected is happening, or its performance needs tuning.
Enclosed is a Nagios plugin to monitor the thread count of a given process.
Original Article: https://dennyzhang.com/nagois_threadcount.
How to find out the thread count?
Usually we can easily find the process id. Right?
To dump the thread list, we can use pstree. If pstree isn't available on your server, install it with apt-get/yum (it is part of the psmisc package).
root@denny:~# pstree -A -a -p $pid
java,19943 -Xms16384m -Xmx16384m -Djava...
|-{java},19945
|-{java},19946
|-{java},19947
|-{java},19948
|-{java},19949
|-{java},19950
|-{java},19951
|-{java},19952
|-{java},19953
...
...
...
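In that dump, pstree shows each thread with the process name in curly braces. Below is a minimal sketch of counting those entries from a saved `pstree -A -a -p` dump, assuming the ASCII (`-A`) tree prefixes `|-` and `` `- `` shown above. `count_pstree_threads` is a hypothetical helper name. The count excludes the main thread, because that is the root line itself. It also includes threads of any child processes.

```sh
#!/bin/sh
# count_pstree_threads (hypothetical helper): read `pstree -A -a -p PID`
# output on stdin and print the number of thread lines. pstree shows a
# thread as {name},tid, so we count lines whose node, after the ASCII
# tree prefix ("|-" or "`-"), starts with "{".
# Caveats: the root line (the main thread) is not counted, and threads
# of child processes are counted too.
count_pstree_threads() {
    grep -cE '^[[:space:]|`]*[|`]-[{]'
}
```

Usage: `pstree -A -a -p "$pid" | count_pstree_threads`. For a process without children, adding 1 for the main thread should give the same number as the /proc count.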
To get the thread count, we can also query the /proc filesystem: each thread of a process has an entry under /proc/$pid/task.
root@denny:~# ls /proc/$pid/task | wc -l
176
root@denny:~# ls /proc/$pid/task | head
19923
19943
19945
19946
19947
19948
19949
19950
19951
19952
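The same count can be scripted without `ls | wc -l` by globbing the task directory. The kernel also reports the number of threads in the `Threads:` line of `/proc/$pid/status`. Below is a POSIX sh sketch of both approaches. The function names are hypothetical. Threads can start or exit between two reads, so on a busy process the two numbers may briefly differ.

```sh
#!/bin/sh
# thread_count_task PID (hypothetical): one entry per thread (TID) lives
# under /proc/PID/task, so count them with a glob instead of parsing ls.
thread_count_task() {
    [ -d "/proc/$1/task" ] || return 1
    # The glob expands before `set --` runs, so $1 is still the PID here.
    set -- "/proc/$1/task"/*
    echo "$#"
}

# thread_count_status PID (hypothetical): print the "Threads:" field of
# /proc/PID/status; fail if the file or the field is missing.
thread_count_status() {
    awk '/^Threads:/ { print $2; found = 1 } END { exit !found }' \
        "/proc/$1/status" 2>/dev/null
}
```

For example, `thread_count_task "$pid"` should print the same number as the `ls /proc/$pid/task | wc -l` command above.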
Wrap it up as a Nagios plugin. Check it out on GitHub.
- The plugin checks the /proc filesystem to figure out the thread count.
- It raises a warning or critical alert when the count crosses the corresponding threshold.
- The output format complies with nagiosgraph, so we can see the history from the GUI.
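For illustration only, here is a minimal POSIX sh sketch of such a check. It is not the plugin from the GitHub repository: `check_threadcount_sketch`, its positional arguments (PID, warning threshold, critical threshold) and the `threads` label are all assumptions. It follows the standard Nagios plugin conventions: one status line, exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and performance data after `|` in the `label=value;warn;crit;min` form that graphing add-ons such as nagiosgraph read.

```sh
#!/bin/sh
# check_threadcount_sketch PID WARN CRIT (hypothetical name and arguments).
# Exit codes follow the Nagios convention: 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN.
check_threadcount_sketch() {
    pid=$1 warn=$2 crit=$3
    # All three arguments must be non-negative integers.
    for n in "$pid" "$warn" "$crit"; do
        case $n in
            ''|*[!0-9]*) echo "UNKNOWN - usage: PID WARN CRIT (integers)"; return 3 ;;
        esac
    done
    if [ ! -d "/proc/$pid/task" ]; then
        echo "UNKNOWN - process $pid not found"
        return 3
    fi
    # One entry under /proc/PID/task per thread.
    set -- "/proc/$pid/task"/*
    count=$#
    # Performance data (after "|"): label=value;warn;crit;min
    perf="threads=$count;$warn;$crit;0"
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count threads | $perf"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count threads | $perf"
        return 1
    fi
    echo "OK - $count threads | $perf"
    return 0
}
```

A standalone script would end with `check_threadcount_sketch "$@"; exit $?` so that Nagios sees the return code. It could then be run as, for example, `./check_threadcount_sketch.sh 19943 500 1000`, where the thresholds are arbitrary example values.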
More Reading:
- Nagios Plugin: Monitor Service CPU
- Nagios Plugin: Monitor Process FD
- Nagios Plugin: Monitor Service Memory
- Nagios Plugin: Monitor Process Threadcount
Computer Scientist
7 years ago
A more useful approach for Nagios-type monitoring systems would be to collect all of the server's data points (CPU core usage, disk and network IO bandwidth utilization, processor run-queue lengths for running and waiting tasks, and memory utilization) over a window of time, to determine whether the server is thrashing. Then trigger an alert for that server, along with a list of the processes with high utilization of any of those resources, and save a snapshot of the event for root cause analysis.
Computer Scientist
7 years ago
If I recall correctly, max thread counts for Apache, Tomcat, and JBoss are controlled through their respective configuration files, so thread counts are tunable. A more useful alert would tie high CPU utilization to the actual processes and/or threads behind the high usage. Having 3000 threads in use on a server with 20% CPU utilization is not a problem, unless everything is choking on IO requests. Having 3000 threads and/or processes running on a server with 98% CPU utilization may or may not be a problem: if the system has no other programs waiting to run, it is not a problem; if the system is thrashing while juggling a number of programs competing for CPU, it is.
Web servers actually benefit from a high thread count. Since connection times are short, a thread can remain unavailable for a short time while the TCP/IP socket is still closing down the virtual circuit. Apache will destroy and recreate threads upon first use, or keep them for N uses before destroying and recreating them, depending on configuration file settings. So keeping spare threads above expected usage is advised, to avoid thread-creation delays.