Introducing chkmount: An Effective Tool for Diagnosing Hanging or Unresponsive Filesystems
As an administrator, I often receive calls regarding hanging servers or unresponsive filesystems. Whenever I encounter such issues, I typically run the command "df -h" to check the disk usage, but sometimes this command itself hangs. Through experience, I've discovered that the problem is often related to a hanging or disconnected remote mount point, such as CIFS or NFS. However, when a server has multiple remote mounts, identifying the specific problematic mount point can be challenging.
To address this problem, I developed a tool called chkmount a few years ago, which I believe could be beneficial to others facing similar situations. Allow me to explain how it works.
chkmount scans the /proc/mounts file to locate NFS and CIFS mount points. It then starts a thread that executes a "df -h" command on each of these mount points. If the thread exceeds a set time limit, chkmount reports an error and lists the hung filesystems.
For your convenience, here are the compile instructions for chkmount:
gcc -o chkmount chkmount.c -lpthread
Feel free to give it a try and let me know if you find it useful!
I will add this to my arsenal. I've done this by hand for many years. Thanks for such a valuable tool, as hung mounts have been annoying to diagnose, to say the least. Now to turn this into a Nagios plugin!