Outage Report: When the Server Had a Disk-taster
Joseph Bamisaye
The Sorcerer's Apprentice | Software Engineer | Private FX Trader | BEng in Agricultural & Environmental Engineering
It was a dark and stormy night... or actually, it was just a typical night in the life of a server administrator. But this time, things took a turn for the worse. On 2/4/2023 at 9:30 PM UTC, our trusty web server started experiencing issues, and the website became unavailable to all users.
But don't worry, we've got the details of what went wrong, and how we fixed it, all neatly packaged in this post-mortem report.
The Problem
The root cause of the outage was disk space exhaustion on the web server. Our website's database had been gobbling up disk space like a hungry monster, and before we knew it, the disk was full. With no space left to write to, the Apache service failed and the website became unavailable.
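For readers who want to catch a hungry disk before it finishes the plate, here is a minimal sketch of a usage check in Python. It is not the tooling we actually ran; the function names, the `/` path, and the 80% warning threshold are all assumptions made for illustration.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of disk space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on unusual filesystems
    return 100.0 * usage.used / usage.total


def check_disk(path="/", warn_at=80.0):
    """Return ("WARNING" or "OK", percent used). `warn_at` is an assumed threshold."""
    percent = disk_usage_percent(path)
    status = "WARNING" if percent >= warn_at else "OK"
    return status, percent


if __name__ == "__main__":
    status, percent = check_disk("/")
    print(f"{status}: {percent:.1f}% of disk used")
```

Run on a schedule (from cron, for example), a check like this could raise a flag well before the disk is completely full.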
The Solution
We attacked the problem with a simple yet effective solution: we gave the server a larger plate (i.e., additional disk space). And just like that, the Apache service started up again and the website became available to users.
The Lesson Learned
Here's what we'll do to make sure this doesn't happen again:
The Conclusion
In conclusion, we hope you never have to endure a full-disk disaster like we did. But if you do, just remember to add more disk space and everything will be alright. And now, it's back to our regularly scheduled monitoring and maintenance to keep the servers running smoothly.
Note: The illustration used in this post-mortem is for humor only and may not accurately reflect technical details. To get the technical details, click here.