Let's Merge Incident and Problem Management
Jo Peacock
Director of Transformational Change and Governance, ITIL Ambassador, ITSM / PMO, ITIL, SIAM, Prince2, Agile, GRC (Risk), PROSCI
I’ve been involved with ITIL® for nearly 20 years, including my time in OGC / HMT and decades of consultancy, training, writing, and blogging. And all this time I’ve preached the benefits of both Incident and Problem Management as practices, and how they should be kept (and managed) separately. These 2 practices are at the heart of efficient and effective service management and customer support, and they serve 2 very distinct purposes.
But now I’m asking the question: is ITIL® wrong? Or rather, is the advice to manage the practices separately wrong?
It’s true that, for most organizations, the default recommendation of keeping these practices separate is still the best, as their objectives, timelines, and workflows are fundamentally different. However, ITIL® is designed to be flexible, isn’t it? There's an increasing demand to combine the practices, so I’ve been using the flexibility of the framework to do exactly that, and with great results.
What is Incident Management?
At its core, Incident Management is about one thing: restoring normal service operation as quickly as possible. It’s reactive: an event has happened that has impacted someone negatively, and Incident Management is designed to minimise disruption to users and business operations by getting things back to “business as usual” (whatever that may look like) as quickly as possible. It’s the sticky tape that holds things together, and it is never meant to be permanent.
Think of Incident Management as the IT organization’s emergency response team. When something breaks or runs slowly, Incident Management’s job is to put out the fire as efficiently as possible.
I use a number of examples when I’m teaching, but let’s use my pothole example. Imagine, if you will, that you’re walking out of your office (or any office for that matter) looking at your phone. We all do it. You don’t see the pothole in the pavement and you trip, breaking your leg. Ouch!
Incident Management will:
In other words, it restores "service" to you, getting you up and walking again as quickly as possible and addressing the immediate disruption.
The focus for Incident Management is on urgency and restoring functionality, not on understanding or fixing the root cause of the problem.
What is Problem Management?
By contrast, Problem Management takes a long-term, and sometimes proactive, view. Its purpose is to prevent further incidents by identifying and resolving the underlying causes of disruptions: the root cause.
Returning to the pothole:
Problem Management will:
Problem Management is about identifying and analysing the root cause, then finding solutions that resolve it. Unlike Incident Management, which works in the present and with a sense of urgency, Problem Management often operates on a slower timeline (when was the last time you tried to get a pothole repaired?), prioritising long-term improvements over immediate results.
Why Are These Practices Typically Separate?
Let’s face it. You wouldn’t want the roadwork crew attending to your broken leg, and you also wouldn’t want the doctor fixing the pothole. The objectives, workflows, and skills needed for Incident and Problem Management are fundamentally different, which is why ITIL® treats them as distinct disciplines:
Different Goals:
Different Timelines:
Different Skillsets:
By keeping these disciplines separate, organisations ensure that urgent issues don’t get delayed by lengthy investigations, while systemic problems receive the attention they deserve.
So When Should We Combine Them?
Whilst keeping Incident and Problem Management separate is often the best practice, there are situations where combining them both is needed:
In small IT teams, resources are often stretched thin. Maintaining separate workflows and skills for incidents and problems can lead to inefficiencies, duplicated effort, and skills shortages. In these environments, combining the practices ensures that both the immediate and long-term needs of the customer are addressed without unnecessary complexity.
When 1st implementing a structured ITSM framework, organizations will invariably select Incident Management as their starting point. This is a natural place to start, but you can’t ignore long-term, permanent fixes in favor of short-term workarounds. Without any Problem Management activities, issues will recur frequently, and so what is implemented isn’t true Incident Management but a merger of Incident and Problem Management practices.
I expand on proactive and reactive Problem Management, and the specialist skills it needs, in my training sessions and on my YouTube channel if you need further explanation of the practice. When incidents occur frequently due to the same underlying issue, proactive Problem Management isn’t always the 1st practice to notice how often they happen. Invariably, Incident Management and the Service Desk are the 1st to identify frequent and widespread service interruptions, and a combined practice can accelerate both resolution and root cause identification. For example, if a specific group of users frequently experiences the dreaded “blue screen of death”, the Service Desk can identify the attributes those users have in common, which forms part of the root cause analysis. That same Service Desk team can then investigate the root cause and implement a fix.
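To make the blue-screen example concrete, here is a minimal sketch of how a Service Desk might spot common attributes across incidents. It assumes each incident record carries fields such as `symptom`, `model`, and `driver`; those field names, the sample data, and the `candidate_problems` helper are hypothetical and not taken from any particular ITSM tool. It simply counts incidents per attribute combination and flags combinations that reach a threshold as candidates for a problem record.

```python
from collections import Counter

# Hypothetical incident records; field names are illustrative only.
incidents = [
    {"id": "INC001", "symptom": "blue screen", "model": "Laptop-X", "driver": "gpu-2.1"},
    {"id": "INC002", "symptom": "blue screen", "model": "Laptop-X", "driver": "gpu-2.1"},
    {"id": "INC003", "symptom": "blue screen", "model": "Laptop-Y", "driver": "gpu-2.0"},
    {"id": "INC004", "symptom": "blue screen", "model": "Laptop-X", "driver": "gpu-2.1"},
    {"id": "INC005", "symptom": "slow login", "model": "Laptop-Y", "driver": "gpu-2.0"},
]


def candidate_problems(records, attributes, threshold=3):
    """Return attribute combinations shared by at least `threshold` incidents."""
    # Build one tuple per incident from the chosen attributes, then count them.
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return {combo: n for combo, n in counts.items() if n >= threshold}


print(candidate_problems(incidents, ["symptom", "model", "driver"]))
# {('blue screen', 'Laptop-X', 'gpu-2.1'): 3}
```

A recurring combination is only a lead for root cause analysis, not the root cause itself; which attributes to group by is a judgement the team would make.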
Some organizations aim to streamline their workflows, eliminating handoffs and reducing bureaucracy. Combining incident and problem management allows them to work more efficiently, particularly in fast-moving environments like startups.
Most ITSM tools support the integration of incident and problem records, allowing for seamless tracking and collaboration. This makes it easier to combine practices without losing visibility into either and is especially important when 1st implementing an ITSM framework.
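As an illustration of how linked records keep both views visible, the sketch below models incidents and problems as separate record types joined by a two-way link. It is an assumption-laden simplification, not the data model of any real ITSM tool: the class names, status values, and helper functions (`link`, `record_workaround`, `restore`, `resolve`) are hypothetical, and treating a documented workaround as what moves a problem to "known_error" is a simplification of the ITIL® known-error idea. The point it shows is that restoring an incident does not close the problem.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    id: str
    description: str
    status: str = "open"              # open -> known_error -> resolved
    workaround: Optional[str] = None
    incident_ids: List[str] = field(default_factory=list)


@dataclass
class Incident:
    id: str
    summary: str
    status: str = "open"              # open -> restored
    problem_id: Optional[str] = None


def link(incident: Incident, problem: Problem) -> None:
    """Link both records so either can be reached from the other."""
    incident.problem_id = problem.id
    if incident.id not in problem.incident_ids:
        problem.incident_ids.append(incident.id)


def record_workaround(problem: Problem, workaround: str) -> None:
    """In this sketch, a documented workaround makes the problem a known error."""
    problem.workaround = workaround
    problem.status = "known_error"


def restore(incident: Incident) -> None:
    """Incident work ends when service is restored; the problem is untouched."""
    incident.status = "restored"


def resolve(problem: Problem) -> None:
    """Problem work ends only when the underlying cause is removed."""
    problem.status = "resolved"


# Usage: two incidents share one underlying problem.
prb = Problem("PRB001", "Blue screen on one laptop model")
inc1, inc2 = Incident("INC001", "Blue screen"), Incident("INC002", "Blue screen")
for inc in (inc1, inc2):
    link(inc, prb)
record_workaround(prb, "Roll back display driver")
restore(inc1)
restore(inc2)
print(prb.status, prb.incident_ids)  # known_error ['INC001', 'INC002']
```

Keeping two record types, even in one combined queue, is one way to preserve the Incident/Problem distinction while sharing a single workflow.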
What a Combined Practice Looks Like
If you choose to combine Incident and Problem Management, it could look like this:
1: Centralized Logging
2: Immediate Response
3: Root Cause Investigation (Parallel Workstreams)
4: Preventative Measures
5: Knowledge Sharing
6: Continual Improvement
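The first few steps above (centralized logging, immediate response, triggering root cause investigation, and knowledge sharing) can be sketched as a single intake function. This is a hypothetical illustration only: the `KNOWN_ERRORS` store, the `log_incident` function, and the returned fields are invented for the example. When a logged symptom matches shared knowledge, the incident gets the documented workaround and a link to the existing problem; otherwise it is flagged for the parallel root-cause workstream.

```python
from typing import Dict, Optional, Tuple

# Hypothetical known-error store: normalised symptom -> (problem id, workaround).
KNOWN_ERRORS: Dict[str, Tuple[str, str]] = {
    "blue screen": ("PRB001", "Roll back display driver"),
}


def log_incident(incident_id: str, symptom: str,
                 known_errors: Dict[str, Tuple[str, str]] = KNOWN_ERRORS) -> dict:
    """Log an incident in one place and reuse shared knowledge where it exists."""
    match: Optional[Tuple[str, str]] = known_errors.get(symptom.strip().lower())
    if match is not None:
        problem_id, workaround = match
        # Known symptom: attach the documented workaround for the immediate
        # response and link the incident to the existing problem record.
        return {"id": incident_id, "problem_id": problem_id,
                "workaround": workaround, "investigate": False}
    # No match: no shared workaround yet, so flag the incident for the
    # root-cause workstream that runs alongside service restoration.
    return {"id": incident_id, "problem_id": None,
            "workaround": None, "investigate": True}


print(log_incident("INC010", "Blue Screen"))
print(log_incident("INC011", "Printer offline"))
```

Matching on an exact, normalised symptom string is deliberately naive; a real service desk would rely on its tool's categorisation and search rather than a lookup like this.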
Potential Challenges
Combining Incident and Problem Management practices isn’t without risks:
It’s therefore important that a clear governance structure is in place, with defined policies, processes, skills, roles, and tools to address these challenges and support efficient and effective issue management. It should also be noted that the distinction between an Incident and a Problem must be maintained even when the practices are combined.
Final Thoughts
Incident and Problem Management are distinct disciplines for good reason, but there are times when a combined approach can bring value, particularly for smaller teams or high-volume environments. By carefully designing a framework that balances immediate resolution with long-term prevention, you can achieve the best of both worlds.
Jo Peacock is a visionary leader in IT governance and organizational change, empowering teams through strategic innovation and best-practice guidance.
Author, IT and Business | "Critical Thinking, Limitless Possibilities." Global Program Manager @ BT | Cross-Functional Team Leadership| ITSM Expert| #ITSMForBusiness #ITSMKMBusiness #ITSM4AI #humaninfluenceintechnology
2 months ago
Incidents and Problems are not the same. I’ve seen organizations fail miserably trying to merge them into one process for efficiency reasons. The result: SLA achievement decreased by 27% in the first 30 days. Not wanting to give up, one client invested a year, thinking it was a temporary dip and that the benefits would be seen in the long term. Nope. SLA performance didn’t improve, nor did the performance of the new consolidated Incident and Problem process. My client reverted to two separate processes integrated to work together, and within 3 months SLA performance was back on track and performing well.
IT Support Manager, MIET
2 months ago
The fun part is when you do it without even realizing it! Sometimes we get so invested in the how, why, and what that, by the end of the day, we look back and see we’ve seamlessly combined two critical practices and got the job done. Flexibility doesn’t mean fragility! So why not?
Award Winning ITSM Consultant | ITIL Author | ITIL 4 Master | Trainer
2 months ago
The only consideration in a shared approach is that, under stress, the behaviours and characteristics of incident management will inevitably dominate and PM will get less focus. This of course can be anticipated and managed. To ensure balance and a strong PM focus, one suggestion is to set a top-level team KPI that drives reward/recognition for the combined team based on ‘prevention and reduction of incidents’. This shows that the team’s success is not biased and skewed towards incident resolution.
ITIL® 4 Master, Managing Professional, Practice Manager, & Strategic Leader | ITSM Coach, Consultant, & Trainer | Husband, Father, Papa, Brother
2 months ago
P.S. Much of the discussion, including my comments, is on IM and PM from a reactive perspective, but let us not forget the value of proactive PM, especially when assisted by the Monitoring and Event Management practice! At least one person should be designated for this work in a domain that all agree is the most critical and valuable to the organization.
ITIL® 4 Master, Managing Professional, Practice Manager, & Strategic Leader | ITSM Coach, Consultant, & Trainer | Husband, Father, Papa, Brother
2 months ago
Then plz color me a purist too. Before #ITIL was 'invented', when managing small to large IT support operations with high volume, multiple categories [multiple hw & sw domains], and multiple support levels, we kept them separate. We even had two KE DBs, one proprietary for 'public' consumption. In small IT startup scenarios, it makes sense to initially mingle the two practices, but the vision should be to mature the capability of each so they are highly integrated but separate practices. The timeline of each practice is very different, with IM needing to be as short as possible and Problem Management using workarounds to buy time. The most common time both practices would be running concurrently would be when a newly discovered incident is registered. IM would trigger the opening of a new problem record and auto-link the new incident record to it for review by PM, to decide what action, if any, to take. Another common scenario would be when IM creates a new workaround for any incident to meet IM's purpose, and then also triggers the PM practice to officially 'bless' the WA as valid for use in other occurrences and/or improve the WA. Eventually, all incidents would be auto-linked to existing P/KE records. #ITSM #Incident #Problem