Modern IT Ops, Incident Mgmt Workflow based on complexity requires focus on only 3 disciplines

Mario Schlangenotto

IT Executive, CIO-Level Leader and astute business driver, delivering exceptional customer experience

Published: November 1, 2021

What if issues were fixed automatically, without any negative impact on the IT consumer? What if IT Operations could solve issues quickly, without an often too lengthy Tier Support Model? I would call that fantastic Modern IT Operations!

To make this dream come true, a combination of different approaches may be necessary. The following describes a high-level concept based on Shift-Left and Swarming. It uses Cynefin to classify different categories of work based on the level of complexity.

In short, what are Shift-Left, Swarming and Cynefin about?

Shift-Left: The aim is to ensure that work currently done by experts can be done by less skilled people or even by automation. So you “shift the work” from experts to non-experts and automation. The great advantage is that the experts can focus on what matters (improving the usage experience and product quality), while the IT consumer's issue is avoided or solved automatically without negative impact.

Swarming: An approach in which one person picks up a topic, takes end-to-end responsibility, and leads it to a solution, drawing directly on the experience of technical experts where necessary. It is a collaborative approach that encourages co-solutioning and requires a disciplined way of working, ideally with an agile mindset. It therefore also helps to break down IT silos. In our context, the advantage of this approach is that the incident is resolved as quickly as possible.

Cynefin: Describes 5 different levels of complexity called domains (Obvious, Complicated, Complex, Chaotic and Disorder) and explains how to approach them. It is a conceptual framework that was originally designed to aid decision making.

All this theory is fine, but let’s combine it to optimize the Incident Management Workflow with the aim of avoiding or resolving issues quickly and reducing the Service Support workload.

Have you noticed that it only takes 3 disciplines to achieve a Modern IT Operations Incident Management Workflow? Automation, Knowledge and Swarming. Shift-Left is to be considered a general core principle. (Please note that I am not encouraging you to implement all 3 disciplines at once. Focus on the most urgent one first, according to your context and needs.)

Speaking of Automation, there are two different areas that we need to consider. The first is to proactively prevent issues from having a negative impact. This often requires AI and machine learning capabilities that can search for patterns in detected events and are able to define an automatic correction. But hey, start small: automatically executing the operating system patch deployment, for example, also avoids negative effects. The second area is called self-healing. This means that when an error occurs, it is automatically detected and fixed (cleaning profiles, freeing up disk space, CPU or RAM, … start small).
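To make the self-healing idea a bit more tangible, here is a minimal Python sketch of a "detect, fix, re-check" loop. Everything in it is an assumption made for illustration: the `Remediation` type, the `free_disk_space` fix, the 90% threshold and the `host` dictionary with its `disk_used_pct` and `reclaimable_pct` keys are hypothetical, and a real setup would read metrics from your monitoring tool and call real cleanup jobs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Remediation:
    """One self-healing rule: how to detect an error and how to fix it."""
    name: str
    is_triggered: Callable[[Dict[str, float]], bool]
    fix: Callable[[Dict[str, float]], None]


def free_disk_space(host: Dict[str, float]) -> None:
    # Hypothetical fix: simulate cleaning profiles and temp files by
    # releasing whatever share of the disk is marked as reclaimable.
    host["disk_used_pct"] -= host.get("reclaimable_pct", 0)
    host["reclaimable_pct"] = 0


# Assumed rule set; the 90% threshold is an example value only.
REMEDIATIONS = [
    Remediation("free disk space",
                lambda h: h["disk_used_pct"] >= 90,
                free_disk_space),
]


def self_heal(host: Dict[str, float],
              remediations: List[Remediation] = REMEDIATIONS
              ) -> Tuple[List[str], List[str]]:
    """Apply each triggered fix once, then re-check the condition.

    Returns (healed, escalated): rules whose fix cleared the error, and
    rules that are still triggered and need a human.
    """
    healed, escalated = [], []
    for rule in remediations:
        if not rule.is_triggered(host):
            continue
        rule.fix(host)
        (escalated if rule.is_triggered(host) else healed).append(rule.name)
    return healed, escalated
```

The re-check after the fix is the important design choice: anything the automation cannot clear is handed over to people instead of being silently retried, which keeps the "start small" approach safe.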

Speaking of Knowledge, this should already be one of the core disciplines of IT. However, knowledge must be made available to the various stakeholders (Self-Help articles for IT consumers and KBAs for the Service Desk) in a language that they can understand and act on. The hardest part is likely improving the findability of relevant knowledge articles. Remember, knowledge only helps if the right article is found quickly and describes in a simple and actionable manner how the incident can be resolved.

Speaking of Swarming, the biggest problem is scaling. It sounds great that one person takes up the incident and ends up being responsible for fixing it. However, if you have too many incidents at the same time, you will not be able to resolve them quickly with limited resources. One trick is to assign the work based on the level of complexity. Incidents where the cause-and-effect relationship is clear and well documented should be shifted to self-help or resolved directly by the Service Desk (obvious work). All other incidents are recorded in a backlog. Swarms can take up work directly from this backlog. There are many different types of swarms described in various articles on the internet, but I will only focus on the following three.

A Backlog Swarm is basically a group of experienced, knowledgeable people who each take on different incidents and try to resolve them as quickly as possible. Sometimes a swarm member needs help from a technical expert, whom they can pull in directly (make sure that technical experts have enough time). While a Backlog Swarm focuses on complicated issues, a Dispatch Swarm takes complex incidents. Dispatch Swarms meet frequently to review work that has not yet been completed. A Swarm Leader is appointed and has access to several cross-functional technical experts. The last type of Swarm I want to briefly describe is a Drop-in Swarm. This Swarm reviews the backlog frequently and “drops in” if incidents with a high complexity are discovered or at the request of the Product Owner (e.g. in case of Major Outages). A Swarm Leader is appointed, usually a Support Analyst who has access to Sub-Swarms of domain technical experts.
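The routing by complexity described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Incident` fields and the `route` function are hypothetical names, and mapping the Chaotic domain to the Drop-in Swarm is my reading of "high complexity", not a rule from any framework. The Disorder domain is left out on purpose, because such an incident has not been classified yet.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    """Cynefin domains used for routing (Disorder is omitted, see above)."""
    OBVIOUS = "obvious"
    COMPLICATED = "complicated"
    COMPLEX = "complex"
    CHAOTIC = "chaotic"


@dataclass
class Incident:
    title: str
    domain: Domain
    has_self_help_article: bool = False  # hypothetical flag
    major_outage: bool = False           # e.g. raised by the Product Owner


def route(incident: Incident) -> str:
    """Return who should pick up the incident."""
    # A Major Outage is pulled in by the Drop-in Swarm regardless of domain.
    if incident.major_outage:
        return "Drop-in Swarm"
    # Obvious work is shifted left: self-help first, otherwise Service Desk.
    if incident.domain is Domain.OBVIOUS:
        return "Self-Help" if incident.has_self_help_article else "Service Desk"
    # Everything else goes to the backlog and is picked up by a swarm.
    if incident.domain is Domain.COMPLICATED:
        return "Backlog Swarm"
    if incident.domain is Domain.COMPLEX:
        return "Dispatch Swarm"
    # Assumption: highly complex (chaotic) incidents go to the Drop-in Swarm.
    return "Drop-in Swarm"
```

For example, `route(Incident("Password reset", Domain.OBVIOUS, has_self_help_article=True))` yields `"Self-Help"`, while the same call with `Domain.COMPLEX` would end up with the Dispatch Swarm.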

A Swarm always strives to “shift work left”. Once a swarm has resolved the issue, it needs to think about how to prevent the same issue from happening again or which additional knowledge is required. Of course, swarming cannot replace the Tier Support Model overnight. But you can start small, e.g. with a Backlog Swarm, and scale it up and/or introduce different swarm types once you can demonstrate success (so don’t forget to think about how you define and measure success).

One final, independent note: you should regularly focus on reducing waste. Proper Automation, Knowledge Management and Swarming first require getting rid of complicated processes, IT silo thinking, and outdated or poorly written knowledge articles.

Michael Gerlach

Head of End User Experience

3 years ago

Interesting read...Thank You!

Emmanuelle O'Donovan

Customer Experience | Relationship Management | Business Development | Change Management

3 years ago

Really interesting read. It'd be interesting to have view resources / budget should be allocated in avoidance vs. occurred
