How KCS came to the rescue for a production outage... on a weekend!
"Support required for production outage" is not the subject header of an email I look forward to reading on any day of the week. It's even less welcome on a Monday morning, when the customer's email was sent 48 hours earlier on a Saturday. This particular customer did not have a 24x7 support entitlement, which would have given them access to a senior product engineer on a weekend. Several colleagues had received the email and a P1 ticket had been raised, but because the customer contract was for business hours only, no one had responded to the email thread. The outage was severely disrupting the customer's business, yet neither the email nor the ticket contained many technical details that would help with the investigation.
However, there was a surprisingly positive ending to the episode. The first email was followed by a second later that same Saturday: corrective actions had been taken and service had been restored, because the administrator had found a KB article which outlined what needed to be done. On Monday there was a follow-up discussion to see what hotfixes and preventative measures might be available, but this discussion took place during normal business hours and via email. The customer relationship was and continues to be a positive one.
This happened recently for the product I support, but it could just as easily happen at another company. If the customer can find the right KB article, with the symptoms and the resolution clearly given, self-service can and does happen... even on weekends!
As support engineers doing technical work and having busy daily schedules, we often feel that there is never a good time to switch gears and write a KB article. There is always another urgent issue to tackle, a call to join, or a ticket which has been open for too long and needs attention. But as a support engineer, I also benefit when colleagues who have solved a complex issue generalize the incident and record for future reference what happened, what the symptoms were, and what the resolution was.
Sometimes an incident goes on for weeks, months, or even a year. The corrective action to restore service, once an administrator knows what to do, takes far less time. And when the engineer who helped the first customer through that extended period writes the KB article, and the article is updated when other engineers reference it, everyone benefits, including the customer. On a weekend.