Incident reprioritization - a summary of a Facebook conversation
Situation
In December 2022, there was a conversation on the Back2ITSM Facebook Group centered on a question posed by Thorsten Manthey. At the time, Thorsten was in a debate about the right course of action for the SLA clock when an incident's priority changes. He mentioned that there were two camps in this debate: one holding that the clock should keep running and the other that it should restart. What followed was a lively conversation of 54 comments in which the question was approached from several angles.
I did not take part in the conversation, but I came across it several weeks later and decided to write up the whole thing. The conversation was very friendly and very enlightening to read. I hope to give others the opportunity to learn from it as well without having to sift through six pages' worth of text. I have attempted to summarize the conversation accurately; any misrepresentations and mistakes in the following text are entirely my own. There were many contributors, and many approaches, preferences, and ideas were put forward. To improve readability, I have kept the contributors' names out of the text below. A list of all contributors can be found at the bottom of this summary. For those interested, the entire conversation can be found here.
Reference frame
To jump ahead somewhat to an evaluation of the case: there is no single right answer to this question. There are simply too many variables and points of view for one answer to be correct in every situation. The classic reply "It depends" holds true here, as it does in many cases. People naturally have preferences and good experiences with particular approaches, and those preferences have been shaped by the environments they work in. Keep this in mind, and have your own conversation with stakeholders if you find yourself having to make a decision on such issues.
From the individual user's perspective, restarting the clock on an incident they raised comes across as a bureaucratic, not to mention annoying, measure when it pushes the deadline later. If the restart brings the deadline earlier, the user may be pleased, or at least neutral.
IT departments, or their managers, may have a different preference. Consider an IT department made up of several groups, some of which may be outsourced to one or more external contractors. There may be merit in tracking each group's performance within its own circle of influence. Because a group cannot influence how much time has already run off the clock when an incident is routed to it, a clock restart may make sense in this scenario. Incident reprioritization often leads to rerouting as well (for example, functional escalation when the priority increases), which is another argument for restarting the clock. Abuse of this system is a significant risk, though: simply reroute an incident there and back again, and it can be solved within the deadline every time. It also creates a disconnect between the user experience and what is being measured.
Changes in prioritization
A fair number of contributors focused on the act of reprioritization itself. One group argued that a ticket should never need to be reprioritized, provided the initial investigation is done properly. Another group argued that a ticket may well change priority, because new information can emerge after the initial intake that could not reasonably have been known when the ticket was first raised. The contributors were split roughly evenly between these two groups.
Restart the clock?
The question of restarting the clock did not specify which clock. Several clocks may run simultaneously; in many cases, a response clock starts at the same moment as the resolution clock. A single comment stated that the response clock should never be adjusted. After that, the rest of the conversation focused on the resolution clock.
When a ticket's priority changes, the group working on it may face unpleasant consequences. When a ticket is given a higher priority, it may immediately be in breach, and the resolver group suddenly has to play catch-up. This can be hard for suppliers to swallow, especially when breached tickets can lead to (financial) penalties. As a personal note, I think that discussing such situations misses the actual issue. When priority changes lead to breached tickets on a scale large enough to incur penalties, the conversation should not be about the details of the penalty mechanism but about improving the service.
Most participants in the conversation preferred to keep the clock running, partly because restarting it is seen as obscuring the actual quality of the services delivered. In the worst case, restarting the clock may even be used deliberately to hide breached tickets. A separate clock for every priority a ticket has had over its lifetime was also proposed, though it is unclear whether tracking multiple clocks at once is a common feature of ITSM tooling.
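The following minimal Python sketch makes the difference between the two camps concrete. It assumes hypothetical resolution targets of 4, 8 and 24 hours for P1 to P3 and expresses all times as hours since an arbitrary reference point. The names `Incident`, `resolution_deadline`, `is_breached` and `time_per_priority` are for illustration only and are not taken from any particular ITSM tool. The sketch compares a running clock with a restarted one. It also shows the simplest form of the per-priority idea, which is to total how long a ticket spent at each priority.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical resolution targets in hours; real values come from your own SLA.
RESOLUTION_TARGET_HOURS = {"P1": 4, "P2": 8, "P3": 24}


@dataclass
class Incident:
    opened_at: float                          # hours since a reference point
    priority: str                             # current priority
    reprioritized_at: Optional[float] = None  # last priority change, if any


def resolution_deadline(incident: Incident, policy: str) -> float:
    """Deadline (in hours) for the current priority under a clock policy.

    "continue": the clock keeps running, so the current priority's target
                is measured from when the incident was opened.
    "restart":  the clock restarts at the moment of reprioritization.
    """
    target = RESOLUTION_TARGET_HOURS[incident.priority]
    if policy == "continue" or incident.reprioritized_at is None:
        return incident.opened_at + target
    if policy == "restart":
        return incident.reprioritized_at + target
    raise ValueError(f"unknown policy: {policy}")


def is_breached(incident: Incident, now: float, policy: str) -> bool:
    """True if the deadline under the given policy has already passed."""
    return now > resolution_deadline(incident, policy)


def time_per_priority(history: List[Tuple[float, str]], now: float) -> Dict[str, float]:
    """Hypothetical 'one clock per priority': hours spent at each priority.

    history is a chronological list of (start_hour, priority) entries.
    """
    totals: Dict[str, float] = {}
    for i, (start, priority) in enumerate(history):
        end = history[i + 1][0] if i + 1 < len(history) else now
        totals[priority] = totals.get(priority, 0) + (end - start)
    return totals


# An incident opened as P3 at hour 0 and raised to P1 at hour 10.
incident = Incident(opened_at=0, priority="P1", reprioritized_at=10)
print(resolution_deadline(incident, "continue"), is_breached(incident, 10, "continue"))
print(resolution_deadline(incident, "restart"), is_breached(incident, 10, "restart"))
```

Take the example incident, opened as P3 and raised to P1 at hour 10. With the running clock, the deadline is hour 4, so the ticket is already breached at the moment it is reprioritized. Restarting the clock moves the deadline to hour 14. The sketch does not decide which policy is right. It only makes visible the trade-off discussed above.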
Restarting the clock appears to be a solution proposed mainly to protect contractual interests, especially when contracts contain clauses on service credits or similar 'carrot and stick' measures.
Entering the territory of gaming clock mechanics, service credits and so on raises the question of what is actually being managed. These procedures seem to put contractual interests front and center instead of service quality and customer experience.
Zooming out
Looking at the bigger picture, a number of people took issue with this focus on the response to service failure. They argued that it is worth more to define and demonstrate positive value to the business. In the end, the business cares more about the overall user experience than about the exact number of hours and minutes an incident took to resolve. Clocks are easy to measure, but that does not make them the right thing to focus on. Moreover, incident start and end times can be quite arbitrary. Ultimately, a resolution clock is an attempt to translate business impact into hours and minutes, and there are many exceptions to such generalized agreements.
Given enough mutual trust and sufficiently mature organizations, the business is better served by agreements that spell out the level of attention committed at a given level of business impact. Combined with transparent collaboration and genuine resolution efforts, this leads to a far more resilient model. Truly high-impact, high-priority incidents rarely follow the usual rules of engagement anyway, so this approach gives far more of the flexibility needed in such high-profile cases. The approach is rounded out by initiatives to commoditize basic service catalog activity, so that incidents in this area can be handled with standardized or automated resolution tactics.
Personal opinion
It's interesting to see how a fairly simple question can lead to a conversation as lengthy as this one, and it was nice to watch the discussion develop across the different reply threads. As mentioned near the start, the right way to go about this depends on many variables. My personal preference is to focus on customer experience and to be given the freedom (and confidence!) to tailor incident response to business impact. If such an agreement is not possible from the outset, I'd advise setting it as the goal of a continuous improvement program and then working towards it collectively. Data such as resolution times and the effects and frequency of reprioritization can be useful, provided it is used in the right context. Use the data to learn from past experience and to identify areas for improvement.
Contributors
Akshay Anand
Barclay Rae
Chris Evans
Dave van Herpen
Emilio Ramírez MX
James Finister
James Gander
John Baxter
John Custy
Kevin Holland
Kristi Kursinsky Lawrence
Michael Keeling
Mike Turner
Nadya Milenkova
Paul Edwards
Phyllis Drucker
Rob England
Sophie Danby
Stuart Rance
Thorsten Manthey
The fairest way to measure this, to me, would be that the clock continues if the incident goes down in priority, but if it goes up in priority, the incident gets cancelled and a new incident at the higher priority is opened. Cancelled incidents are not included in SLA calculations. It seems to be the cleanest and fairest way we have found to report it. And if processes are clear and people are trained properly, it very rarely happens. If the issue keeps arising, then updated processes and/or user training are needed.
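The cancel-and-reopen rule described in the comment above can be sketched as follows. This is a minimal Python illustration under assumed names and values: `Ticket`, `reprioritize`, `sla_compliance`, the priority ranking and the hour targets are all hypothetical. Raising the priority cancels the ticket and opens a replacement with a fresh start time. Lowering it keeps the same ticket and lets its clock keep running. Cancelled tickets are left out of the compliance figure.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical values: a lower rank means a higher priority.
RANK = {"P1": 1, "P2": 2, "P3": 3}
TARGET_HOURS = {"P1": 4, "P2": 8, "P3": 24}


@dataclass
class Ticket:
    opened_at: int                     # hours since a reference point
    priority: str
    resolved_at: Optional[int] = None
    cancelled: bool = False


def reprioritize(tickets: List[Ticket], ticket: Ticket, new_priority: str, now: int) -> Ticket:
    """Apply the cancel-and-reopen rule and return the ticket that is now live."""
    if RANK[new_priority] < RANK[ticket.priority]:
        # Priority goes up: cancel the old ticket, open a new one at the higher priority.
        ticket.cancelled = True
        replacement = Ticket(opened_at=now, priority=new_priority)
        tickets.append(replacement)
        return replacement
    # Priority goes down (or stays the same): same ticket, the clock keeps running.
    ticket.priority = new_priority
    return ticket


def sla_compliance(tickets: List[Ticket]) -> Optional[float]:
    """Share of resolved, non-cancelled tickets resolved within their target.

    Returns None when there is nothing to measure.
    """
    counted = [t for t in tickets if not t.cancelled and t.resolved_at is not None]
    if not counted:
        return None
    met = sum(1 for t in counted if t.resolved_at - t.opened_at <= TARGET_HOURS[t.priority])
    return met / len(counted)


# A P3 ticket opened at hour 0 is raised to P1 at hour 10 and resolved at hour 13.
tickets = [Ticket(opened_at=0, priority="P3")]
live = reprioritize(tickets, tickets[0], "P1", now=10)
live.resolved_at = 13
print(sla_compliance(tickets))
```

Under this rule, the escalated ticket in the example is measured only from hour 10, so it meets its 4-hour target. Note that this is the same effect the restart camp gets from restarting the clock. The difference is that the cancellation remains visible in the ticket history.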