Building a "root cause" mindset

Jay Alphey

Senior leader and coach experienced in scaling technology organisations.

Published: 21 January 2025

In a fast-moving or scaling environment, our processes cannot be static. Learning and continuous improvement must be at the heart of how we work, how we design our processes and how leaders work with teams. We need to build scalability into how we work.

Learning becomes a key skill. Creating a learning mindset is important. There are key steps which we can take as leaders to help:

We must ensure teams have the right environment and culture with the psychological safety which allows them to ask questions.
There must be learning opportunities such as retrospectives to give the time and opportunity to think about what we have learned.
We must approach failure as a learning experience and consider how best to move forwards.

Culture is important, and the role of the leader as coach in building it is critical. But this goes hand-in-hand with good techniques for learning. For example, retrospectives are a key activity, but doing good retrospectives is a skill which needs to be taught and practised.

In this Play, we look at one of the key skills needed to make learning successful - Root Cause Analysis.

Responding to symptoms

Ask your team about something which has gone wrong. What is their immediate response? They will tell you what happened. What went wrong. In general this will be an observed symptom. It is very easy to react directly and try and address that symptom.

Let's look at an example scenario with four players.

Steve is the leader with overall accountability for the product's delivery
Jon is the engineering lead and his team is coding the product
Mary is the team's quality engineer and responsible for testing the code after it is completed
Ranjit is responsible for managing the code within the deployed environment

Steve is a directive manager who prides himself on keeping tight control of his team. Mary has raised a concern with Steve that handovers from Jon often seem to be much later than predicted. She is finding it difficult to get the code tested in time to have it released to Ranjit on the planned schedule.

How is Steve going to respond?

There is clearly an inefficiency in the system that needs to be looked at. A naïve response would be that if a symptom is observed, we must push back directly against the measured parameter. Steve's immediate assumption is that this is a failure on Jon's part. If Jon's delivery is late, Jon must be at fault and must make the deliveries earlier.

As a directive manager following Scientific Management approaches, Steve assumes that Jon must be under-skilled or under-motivated. Since he sees motivation as extrinsic (Theory X), Steve will aim to punish Jon. And since he sees work as predictable and reductionist, Steve will step in and take control, probably initiating a performance improvement plan (PIP) for Jon.

Read: Scientific Management - https://agileplays.co.uk/why-should-we-move-away-from-scientific-management/

Read: Reductionism and planning - https://agileplays.co.uk/agile-for-pms-are-there-any-plans/

Looking for causes

Let's step back a little and replay the scenario with a different leader.

Anne is a more cautious leader who is data-led and collaborative. She is aware that Mary's observation is valid, but is a symptom of the problem, not the problem itself. She cannot address the situation until she understands the underlying problem which causes the issue observed by Mary.

The purpose of Root Cause Analysis is to look beyond the symptoms and try and assess what the underlying (or "root") causes of the problem may be. If we address the cause, we are far more likely to deal with the problem than if we try and fix the immediate symptoms.

So let's look beyond that first symptom. We know that Jon's deliveries to Mary are late. The next step is to ask "why?". Mary only knows the symptom, so Anne needs to ask why the deliveries are late. It's time to call a retrospective to look at this problem with the whole team.

Note that we need a certain level of psychological safety to do this as a group exercise. It will work much better that way, but if there is a blame culture, Anne may need to talk to Jon individually first and collect people's opinions in private. Fortunately Anne has spent a while working on building a positive culture.

Why are Jon's deliveries late? Jon says that he is spending a lot of time working on bug fixes which Mary and Ranjit have sent him.
Why are bug fixes taking so much time? There is a high volume of defects being found in testing or on the deployed system.
Why are Mary and Ranjit finding so many defects? It is clear that feature quality is poor and this is being discovered very late.
Why is the quality of new features poor? Jon is struggling with the underlying level of technical debt in the codebase which is making development slow and prone to error.
Why is technical debt high? New features are being requested at an increasing rate and prioritised over addressing the underlying problems.

By probing into the symptom and asking "why" questions, Anne is now discovering underlying problems which she can address. Indeed it appears that Anne's own prioritisation of new features may be the underlying cause of the delays that Mary is seeing!

Read: Psychological safety - https://agileplays.co.uk/what-do-we-mean-by-psychological-safety/

Read: Why do retrospectives fail? - https://agileplays.co.uk/are-your-retrospectives-failing/

The "5 Whys" approach

In the example above, I was using a popular approach from Lean called "5 Whys". This involves repeatedly asking the question "Why" to look at the underlying cause of the issue identified in the last step. Each time we probe deeper into the problem, looking for a "root cause".

Problem: Code deliveries are late
Why? Jon is spending his time on rework
Why? Features are rejected by Mary and Ranjit
Why? Feature quality is poor (possible technical root cause)
Why? Increasing technical debt in codebase
Why? New features are pushed in and prioritised (possible systemic root cause)

As we see, after questioning five times, we are reaching a real understanding on which we can act.

There is nothing "magical" about the number five. It emphasises that the approach is not simple and that it is necessary to keep pushing to get beyond symptoms to problems. Often we find that three "why" questions might get a technical explanation (here identifying feature quality) but that more probing is needed to understand a systemic root cause - what part of the process has caused the issue.
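As a rough illustration, the chain above can be recorded as a small data structure so that the technical and systemic candidates stay visible after the retrospective. This is only a sketch: the `Why` class, the `root_cause_kind` field and both helper functions are hypothetical names invented for this example, not part of any standard "5 whys" tooling.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Why:
    """One answer to a "why?" question (hypothetical structure)."""
    answer: str
    # "technical", "systemic", or None if this step is not a candidate root cause
    root_cause_kind: Optional[str] = None


def five_whys_report(problem: str, chain: List[Why]) -> List[str]:
    """Render the problem and each "why" step as plain text lines."""
    lines = [f"Problem: {problem}"]
    for step in chain:
        tag = ""
        if step.root_cause_kind:
            tag = f" (possible {step.root_cause_kind} root cause)"
        lines.append(f"Why? {step.answer}{tag}")
    return lines


def candidate_root_causes(chain: List[Why]) -> List[Tuple[int, str]]:
    """Return (depth, answer) for every step flagged as a possible root cause."""
    return [
        (depth, step.answer)
        for depth, step in enumerate(chain, start=1)
        if step.root_cause_kind
    ]


# The chain from the example in this article.
chain = [
    Why("Jon is spending his time on rework"),
    Why("Features are rejected by Mary and Ranjit"),
    Why("Feature quality is poor", "technical"),
    Why("Increasing technical debt in codebase"),
    Why("New features are pushed in and prioritised", "systemic"),
]

for line in five_whys_report("Code deliveries are late", chain):
    print(line)
```

Recording the depth of each candidate makes the point in the text concrete: in this example the technical explanation appears at the third "why", while the systemic one only appears at the fifth.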

"5 whys" is a very effective technique and one which is relatively simple to understand and apply. It does, however, need some practice. Asking "why" repeatedly can seem artificial and disruptive. Personally I've found that once you get over the slightly stilted repetition, it can be a hugely effective way to find out what the underlying root causes are.

Often "5 whys" can take you down a path which discovers something unexpected and a root cause which is not obvious from the symptoms. The example above shows this, but a classic example was an examination of why the Lincoln Memorial in Washington was deteriorating.

Problem: The Lincoln Memorial is deteriorating
Why? Chemicals are being used to clean the monument
Why? The monument is covered in pigeon droppings
Why? Pigeons are attracted by the large number of spiders at the monument
Why? Spiders are attracted by the large number of midges at the monument
Why? Midges are attracted by the fact that the monument is first to be lit at night.
Solution: Turn on the lights one hour later.

In this example (which is often quoted but is probably apocryphal) the solution to damaged stonework proves to be to adjust the lighting to attract fewer dusk-flying midges. This shows some of the possible power of the technique to propose solutions which are very different from directly responding to the symptoms.

Read: Effective retrospectives - https://agileplays.co.uk/effective-retrospectives-in-agile-development/

Read: Risks of local optimisation - https://agileplays.co.uk/the-risks-of-local-optimisation/

Read: Waste in Lean software - https://agileplays.co.uk/what-is-waste-muda-in-lean/

Extending "5 whys"

Although the "5 whys" approach is often advertised as the sole answer to root cause analysis, it does have limitations which are inherent in its simplicity. In our example, we identified that the root cause of the problem of delays was new features being prioritised. But we didn't identify that having Mary test the code after development isn't a good practice, and that quality could be improved by better develop/test integration rather than only testing at the end.

A problem rarely has a single root; there may be multiple underlying causes. Lean uses a technique called an Ishikawa diagram (also known as a "fishbone diagram" because of its shape). This is a tree-like structure, with a single outcome but multiple paths to root causes. The left-hand part is split into categories and each individual root cause is "hung" from one of these categories.

The Ishikawa diagram is more complex to draw and requires more extensive analysis, but it emphasises how multiple factors may play into a particular incident. Standardised categories may also make it easier to identify root causes. I'd recommend an approach like this for a formal analysis after an incident, but "5 whys" makes a great starting point for assessing a situation.
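To make the structure concrete, here is a minimal sketch of a fishbone as a mapping from categories to the causes hung from them, with a single outcome at the head. The `Fishbone` class and the category names ("Prioritisation", "Process", "Codebase") are assumptions chosen for this illustration and are not a standardised category set. The causes themselves are the ones discussed earlier for the late code deliveries.

```python
from collections import defaultdict


class Fishbone:
    """A single outcome with causes grouped by category (hypothetical structure)."""

    def __init__(self, outcome: str) -> None:
        self.outcome = outcome
        # category name -> list of causes hung from that category
        self.causes = defaultdict(list)

    def add(self, category: str, cause: str) -> None:
        """Hang a cause from the given category."""
        self.causes[category].append(cause)

    def render(self) -> str:
        """Return an indented text outline of the diagram."""
        lines = [f"Outcome: {self.outcome}"]
        for category in sorted(self.causes):
            lines.append(f"  {category}")
            for cause in self.causes[category]:
                lines.append(f"    - {cause}")
        return "\n".join(lines)


# Categories are illustrative assumptions; causes come from the example above.
diagram = Fishbone("Code deliveries are late")
diagram.add("Prioritisation", "New features prioritised over addressing technical debt")
diagram.add("Process", "Testing happens only after development is complete")
diagram.add("Codebase", "High technical debt makes development slow and error-prone")

print(diagram.render())
```

Unlike the single "5 whys" chain, nothing here forces one answer: each category can hold several causes, and the team can then judge which ones to address.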

To look at why multiple root causes may be important, consider this analysis from a hospital (from "Sensemaking of patient safety risks and hazards" - Battles et al). An incident with wrong medication being given to a patient is assessed as being due to poor product design on the patient wristband.

Incident: Wrong patient medication error
Why? Wristband not checked
Why? Wristband missing
Why? Wristband printer on the unit was broken
Why? Label jam
Why? Poor product design

As the authors point out, this has correctly identified a factor in the problem but there could be multiple causes forming a complex tree as below. Many of these branches have a root in organisational culture, which is probably a greater factor than the wristband design. By just focussing on the first identified root cause, we may not be addressing the most important.

Good practices

As an Agile leader, you want to avoid making rapid decisions based on observed symptoms. This is the traditional "go with the gut" management, and can often lead you astray. Instead, you should endeavour to be analytical and to look beyond the immediate symptoms of an issue to find the underlying root causes.

For this to be effective, you will need to develop a culture of psychological safety which allows the teams to discuss issues openly and to understand the causes without blame or defensiveness. You will also need to work on your own skills to ensure that you develop the patience to find underlying problems rather than leaping at immediate solutions.

A great starting point is the "5 whys" technique which probes into the problem to find an underlying root cause. With some practice you can use this regularly as part of your normal approach to understand what the cause is behind symptoms which you observe.

A limitation of "5 whys" is the focus on a single root cause. For more important cases you may need to develop a more formal process, such as using Ishikawa diagrams to identify multiple root causes and judge which you need to address.

Remember of course that identification is not enough. As with a retrospective, you need to plan improvement activity to address the underlying issue and make sure it is resourced and tracked to prevent the issue recurring.
