Nagios Plugin: Monitor thread count of your service

If a program starts too many threads, we should raise an alarm.

Either something unexpected is happening, or the program needs a performance improvement.

Enclosed is a Nagios plugin to monitor the thread count of a given process.



Original Article: https://dennyzhang.com/nagois_threadcount.




How to find out the thread count?

Usually we can easily find the process id. Right?

To dump the thread list, we can use pstree. If pstree is missing on your server, install it with apt-get or yum (it ships in the psmisc package).

root@denny:~# pstree -A -a -p $pid
java,19943 -Xms16384m -Xmx16384m -Djava...
  |-{java},19945
  |-{java},19946
  |-{java},19947
  |-{java},19948
  |-{java},19949
  |-{java},19950
  |-{java},19951
  |-{java},19952
  |-{java},19953
  ...
  ...
  ...

To get the thread count, we can also query the /proc filesystem. Each thread of a process has its own entry under /proc/$pid/task.

root@denny:~# ls /proc/$pid/task | wc -l
176
root@denny:~# ls /proc/$pid/task | head
19923
19943
19945
19946
19947
19948
19949
19950
19951
19952
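
As a minimal sketch of the /proc approach above, the two helper functions below return the thread count for a given pid. The function names count_threads_task and count_threads_status are made up for illustration. The first counts the entries in /proc/<pid>/task; the second reads the Threads: field that Linux exposes in /proc/<pid>/status. Both assume a Linux /proc filesystem.

```shell
# Illustrative helpers; the function names are hypothetical.

# Count threads by listing the per-thread entries under /proc/<pid>/task.
count_threads_task() {
    pid="$1"
    # Fail if the process does not exist (or /proc is unavailable).
    [ -d "/proc/$pid/task" ] || return 1
    ls "/proc/$pid/task" | wc -l | tr -d ' '
}

# Read the kernel-reported count from the "Threads:" line of /proc/<pid>/status.
count_threads_status() {
    pid="$1"
    [ -r "/proc/$pid/status" ] || return 1
    awk '/^Threads:/ {print $2}' "/proc/$pid/status"
}
```

Calling count_threads_task "$pid" or count_threads_status "$pid" prints a single number; a non-zero return code means the process could not be found.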

Wrap it up as a Nagios plugin. The code is available on GitHub.

  • The plugin checks the /proc filesystem to figure out the thread count.
  • It raises a warning or critical alert when the count crosses the given thresholds.
  • The output format complies with nagiosgraph, so we can see the history in the GUI.
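
The three points above can be sketched as a single shell function. This is a hypothetical illustration, not the plugin published on GitHub: the function name check_threadcount, its argument order, and the message wording are assumptions. It follows the standard Nagios plugin conventions, where the return code is 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN, and where performance data follows a "|" in the form label=value;warn;crit so that graphing tools can parse it.

```shell
# Hypothetical sketch of a thread-count check; not the author's published plugin.
# Usage: check_threadcount <pid> <warn_threshold> <crit_threshold>
check_threadcount() {
    pid="$1"
    warn="$2"
    crit="$3"

    # Without a /proc entry we cannot tell anything about the process.
    if [ ! -d "/proc/$pid/task" ]; then
        echo "UNKNOWN - no process found with pid $pid"
        return 3
    fi

    # One entry per thread under /proc/<pid>/task.
    count=$(ls "/proc/$pid/task" | wc -l | tr -d ' ')

    # Performance data in the label=value;warn;crit form.
    perf="threads=$count;$warn;$crit"

    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - pid $pid runs $count threads | $perf"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - pid $pid runs $count threads | $perf"
        return 1
    fi

    echo "OK - pid $pid runs $count threads | $perf"
    return 0
}
```

A wrapper script would call the function and pass its return code on to Nagios, for example check_threadcount "$pid" 500 1000 followed by exit $?. The thresholds 500 and 1000 are placeholders; pick values that match the normal thread count of your service.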




Kenneth Goodwin

Computer Scientist

7 years ago

A more useful approach for Nagios-type monitoring systems would be to collect all the server data points over a window of time: CPU core usage, I/O bandwidth utilization for disk and networking, processor run-queue lengths (running, waiting), and memory utilization. Use them to determine whether the server is thrashing, then trigger an alert for that server along with a list of the processes that are heavy users of any of those resources. Also save a snapshot of the event for root cause analysis.

Kenneth Goodwin

Computer Scientist

7 years ago

If I recall correctly, max thread counts for Apache, Tomcat, and JBoss are controlled through the configuration files for each, so thread counts are tunable. A more useful alert would be to tie high CPU utilization to the actual processes and/or threads that are behind the high usage. Having 3000 threads in use on a server that has 20% CPU utilization is not a problem unless everything is choking on I/O requests. Having 3000 threads and/or processes running on a server with 98% CPU utilization may or may not be a problem. If the system has no other programs waiting to be run, then it is not a problem. If the system is thrashing while trying to juggle a number of programs competing for CPU, then it is a problem. Web servers actually benefit from a high thread count: since the connection times are short, a thread can remain unavailable for a short time while the TCP/IP socket protocols are still closing down the virtual circuit. Apache will actually destroy and recreate threads upon first use, or keep them for N uses before destroy-create, based on configuration file settings. So having spares above expected usage is advised to avoid delays for thread creation.
