Outage Report: When the Server Had a Disk-taster

Joseph Bamisaye

The Sorcerer's Apprentice | Software Engineer | Private FX Trader | BEng in Agricultural & Environmental Engineering

Published: February 5, 2023

It was a dark and stormy night... or actually, it was just a typical night in the life of a server administrator. But this time, things took a turn for the worse. On February 4, 2023, at 9:30 PM UTC, our trusty web server ran into trouble, and the website became unavailable for all users.

But don't worry, we've got the details of what went wrong, and how we fixed it, all neatly packaged in this post-mortem report.

The Problem

The root cause of the outage was disk space exhaustion on the web server. Our website's database had been gobbling up disk space like a hungry monster, and before we knew it, the disk was full. This caused the Apache service to fail and the website to become unavailable.
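
For anyone wondering how to catch a hungry monster in the act, here is a minimal Python sketch of one way to see which directories are eating the disk. It is not the tooling used during the incident: the function names `dir_size` and `largest_subdirs` and the `target` path are assumptions made up for illustration.

```python
import os


def dir_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    # onerror swallows unreadable directories instead of raising.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


def largest_subdirs(parent, top=5):
    """Return up to `top` (size_bytes, path) pairs for the immediate
    subdirectories of parent, largest first."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.path))
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Hypothetical target: point this at wherever the database keeps its data.
    target = "."
    for size, path in largest_subdirs(target):
        print(f"{size / 1024**2:10.1f} MiB  {path}")
```

Pointing `target` at the database's data directory would show which subdirectory is doing the gobbling, which is usually the first thing worth knowing before cleaning up or adding space.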

The Solution

We attacked the problem with a simple yet effective solution: we gave the server a larger plate (i.e., additional disk space). And just like that, the Apache service started back up and the website became available to users again.

The Lesson Learned

Here's what we'll do to make sure this doesn't happen again:

Implement a disk utilization monitoring system to keep an eye on disk space and avoid future outages.
Clean up the database regularly to keep disk space utilization under control.
Implement a database backup system to avoid data loss in case of a disk failure.
Monitor the database growth rate and plan for additional disk space before it becomes a problem.
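
The first and last items above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the monitoring we actually deployed: the 80% warning threshold, the function names, and the 500 MiB/day growth figure are all hypothetical, and the forecast assumes growth is roughly linear.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def check_disk(path="/", warn_percent=80.0):
    """Return a warning string if usage meets the threshold, else None."""
    percent = disk_usage_percent(path)
    if percent >= warn_percent:
        return f"WARNING: {path} is {percent:.1f}% full"
    return None


def days_until_full(free_bytes, growth_bytes_per_day):
    """Estimate days until the disk fills, assuming linear growth.

    Returns None when usage is flat or shrinking.
    """
    if growth_bytes_per_day <= 0:
        return None
    return free_bytes / growth_bytes_per_day


if __name__ == "__main__":
    message = check_disk("/")
    print(message or "Disk usage is below the warning threshold")

    # Hypothetical growth rate: in practice, measure it from the
    # database's size on successive days.
    free = shutil.disk_usage("/").free
    eta = days_until_full(free, 500 * 1024**2)
    print(f"About {eta:.0f} days of headroom at 500 MiB/day")
```

Run from a scheduler such as cron, a check like this could send its warning to email or chat well before the plate is empty again.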

The Conclusion

In conclusion, we hope you never have to endure a full-disk disaster like we did. But if you do, just remember to add more disk space and everything will be all right. And now, it's back to our regularly scheduled monitoring and maintenance to keep the servers running smoothly.

Note: The illustration used in this post-mortem is for humor and illustration purposes only and may not accurately reflect technical details. To get the technical details, click here.
