POSTMORTEM

Before we begin, we would like to clarify that the issues we were facing have been solved. There is no dead website out there with our name on it. It has been revived, so let's call this a Revival Summary.

Summary

The issue was a 500 Internal Server Error on the website we created. The problem started on April 26, 2022 at 06:00 AM EAT. The root cause was a spelling error introduced while our team was making adjustments to the document root on our Apache web server. Because of this error, our website was offline for more than 5 hours (06:00 AM to 11:32 AM EAT), and all of our users were affected. The issue was resolved on April 26, 2022 at 11:32 AM EAT.

Timeline

  • The issue was detected when a customer contacted our support center on April 26, 2022 at 08:00 AM EAT. The customer told our staff that the website had been down for half an hour. They were seeing a 500 Internal Server Error and kept refreshing, but the page still would not load.
  • Our support center contacted the main team at 08:20 AM EAT. We did not have to search for long, because we knew which changes had been made that morning and where the issue was likely to be.
  • We combed through the new adjustments and found the spelling error in one of the modified files. The website was up and running again the same day at 11:32 AM EAT.

Root Cause

The root cause was a spelling error. In our wp-settings.php file, class-wp-locale.php was written as class-wp-locale.phpp, so PHP could not find the file it was asked to load. All we had to do afterwards was remove the extra 'p'.
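
A check like the following might have caught the misspelled file name before deployment. It is only a minimal sketch, not something we ran during the incident: the function names `included_paths` and `missing_includes` are hypothetical, and it assumes every require/include line names its file as a single quoted string relative to one base directory.

```python
import re
from pathlib import Path

# Captures the first quoted string on a require/include line, for example:
#   require ABSPATH . WPINC . '/class-wp-locale.php';
INCLUDE_RE = re.compile(
    r"^\s*(?:require|include)(?:_once)?\b[^'\"\n]*['\"]([^'\"]+)['\"]",
    re.MULTILINE,
)


def included_paths(php_source):
    """Return the quoted paths found on require/include lines (hypothetical helper)."""
    return INCLUDE_RE.findall(php_source)


def missing_includes(php_source, base_dir):
    """Return included paths that do not exist as files under base_dir.

    Assumption: all paths are relative to base_dir, which is a
    simplification of how a real PHP application resolves includes.
    """
    base = Path(base_dir)
    return [
        path
        for path in included_paths(php_source)
        if not (base / path.lstrip("/")).is_file()
    ]
```

Run against the contents of wp-settings.php with the directory holding the included files as `base_dir`, a non-empty result such as `['/class-wp-locale.phpp']` would be a reason to stop the deployment.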

Corrective and Preventative Measures

This was a simple mistake, but it had a real impact on us and our clients. A website should not be down for 5 hours because of such a simple issue, and it should not happen again. As a preventive measure, we should all proofread our work before deployment. Spelling mistakes happen very often, and they are hard to spot even when you are actively looking for them. We have also discussed this as a team and decided that we should proofread not only our own work but each other's as well, because a fresh set of eyes catches errors more easily.

We could also adopt a monitoring tool. With monitoring in place, the issue would have been detected by the time this customer first noticed it, rather than 30 minutes later when they contacted us. Other customers may have tried the website even earlier and either given up or not known how to contact our support team. A monitoring tool could have brought the problem to our attention sooner.
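
To give an idea of what such monitoring does, here is a minimal sketch of a health check using only Python's standard library. It is an illustration rather than the tool we would adopt: the names `probe` and `is_down` are hypothetical, and treating any 5xx status or a missing response as "down" is our assumption.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still useful.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar problems.
        return None


def is_down(status):
    """Assumption: no response or any 5xx status counts as an outage."""
    return status is None or status >= 500
```

A scheduler could call `probe` on the site's address every minute and alert the team whenever `is_down` returns True, so a 500 Internal Server Error would be flagged within about a minute instead of waiting for a customer report.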
