Another lesson on High Reliability Organisations - and why they are failing to manage themselves correctly

You get to a certain stage in life, and after a while you just stop reading the newspapers. All you get is the same stories coming around time after time, with little if any evidence that the people who are meant to know what they are doing have any idea of what it is they are supposed to be doing. Whether it is the desire to turn our national energy security over to Chinese government agencies, the return of the casino culture that led to the global financial collapse, or just the England football team once again failing to live up to their super-star (and super-star salary) status, you either shout at the newspaper, which upsets the people at the next table in the coffee shop, or you bypass the news and go straight to the crossword.

The story in yesterday’s The Times, ‘Whistleblower exposes nuclear safety fears at Sellafield’, proves once again that nothing is ever learned, and that we are seemingly doomed to repeat never-ending cycles of fail – recover – forget the lessons – fail again.

The story (this time – I am sure there are others, both at Sellafield and at every other similar critical national infrastructure facility) concerns inadequate staffing levels and the use of plastic bottles to store highly radioactive material. According to the report, more than two thousand bottles, which were originally intended for temporary storage, have started to degrade. The response from the Sellafield authorities? Sellafield later said that plutonium and uranium samples were ‘kept securely’, and that ‘to imply that such material is inappropriately managed is simply not true’.

Much of my recent work has been around the concept of High Reliability Organisations (HROs): those organisations whose functions are critical, and for whom operational failure – or even disruption – could have catastrophic consequences. They are organisations that, to quote one source, ‘are not only foolproof, but damned foolproof’.

The fundamental starting point for the development of an HRO is the overarching culture within which the organisation operates: a culture in which the possibility of failure is simply not allowed, and within which every decision and action is measured against the overall objective of becoming a zero-fail operation. Examples of HROs include nuclear power stations, aircraft carriers, national air traffic management systems, and oil rigs. As can be immediately seen, the common factor uniting all of these sectors is their operational criticality allied to an extremely high level of technical complexity.

For anyone involved in or responsible for the development of high-level risk management programmes, the study of HROs, and particularly of HROs that suffer major failure, is a major source of insight into what can go wrong, and why. Examples of HROs that have failed spectacularly include NASA with the Space Shuttles Challenger and Columbia, BP with the Deepwater Horizon oil spill in the Gulf of Mexico, the Three Mile Island nuclear reactor in the US, and the Fukushima Daiichi reactor run by Tepco in Japan, which failed as a result of the 2011 tsunami. And yet, what also unites all of these examples is that the root cause of the catastrophic failures, which had massive impacts both on the companies themselves and on the environments in which they were operating, was not the failure of technical components, but the long-term erosion of the basic management culture that was supposed to underpin their HRO operations. In simple terms, they broke the first three rules of any business operation - they got stupid, lazy and greedy.

Reading the article on Sellafield, I was reminded of the language used in the report following the 2005 explosion at the Buncefield fuel depot in the UK, which was the fuel storage area for Heathrow airport. The explosion was the largest in Western Europe since the end of the Second World War.

As well as some technical issues that led to the failure of the primary system and the secondary security management programmes, the root cause of the event lay in basic management failures, all of which should have been core issues for everyone involved in the management of the company, the project and the operation, at every stage and at every level of its delivery.

From the Executive Summary:

‘This report does not identify any new learning about major accident prevention. Rather, it serves to reinforce some important process safety management principles that have been known for some time:

There should be a clear understanding of major accident risks and the safety critical equipment and systems designed to control them. This understanding should exist within organisations from the senior management down to the shop floor, and it needs to exist between all organisations involved in supplying, installing, maintaining and operating these controls.

There should be systems and a culture in place to detect signals of failure in safety critical equipment and to respond to them quickly and effectively. In this case, there were clear signs that the equipment was not fit for purpose but no one questioned why, or what should be done about it other than ensure a series of temporary fixes.

Time and resources for process safety should be made available. The pressures on staff and managers should be understood and managed so that they have the capacity to apply procedures and systems essential for safe operation.

Once all the above are in place:

There should be effective auditing systems in place which test the quality of management systems and ensure that these systems are actually being used on the ground and are effective.

At the core of managing a major hazard business should be clear and positive process safety leadership with board-level involvement and competence to ensure that major hazard risks are being properly managed.’

No one can claim that these are esoteric issues that it would be unfair to expect managers to be aware of. In fact, they are precisely the issues associated with any HRO:

Zero-Failure Organisations: Preoccupied with failures rather than success

An Understanding of Complexity: Do not feel the need to simplify interpretations

A Sensitivity to Operations: Understand their challenges and demands

A Commitment to Resilience: A determination that they would not fail

Deference to Expertise: Fluid decision-making based on deep understanding of problems, rather than decision-making based on hierarchy and job descriptions

Recognising the value of ‘near-misses’: Using near-misses to identify potential problems, and to rectify them as a matter of immediate urgency

At the core of the Buncefield disaster, as it was at BP, NASA, and Tepco, and as it will undoubtedly be at Sellafield if the attitude shown by its spokesperson continues, was a wilful ignoring of the clear warning signs that something was wrong.

The foundational value for HROs is a Commitment to ‘Truth’: a disciplined and ruthless self-criticism, and a rejection of anything that could be seen as actual or potential mis-specification, mis-estimation and/or misunderstanding of things.

When I see people responsible for nuclear reactors claiming that everything is alright, when it clearly isn’t – then it’s time for another espresso, and back to the crossword!

For an excellent academic paper on this subject, see:

Weick, K. E., Sutcliffe, K. M., & Obstfeld, D. (2008). Organizing for high reliability: Processes of collective mindfulness. Crisis Management, 3, 81-123. Available here.

The Buncefield Report (‘Why Did It Happen?’) is here.

A two-part series on HROs in Risk UK Journal can be found in the December 2015 edition (pp 50-51) and the January 2016 edition (pp 49-50).
