System failures cause accidents. So how can systems be made more fail-safe?
Adrian Thompson
Snr Safety Leader | Complex systems thinking enabler and PhD chaser
I recently wrote an article with a statement that was both admired and admonished by readers, researchers and advocates of the same cause. The statement was: "Calling it [failure] human error helps nothing; calling it a system failure beckons for action and provides hope for sustainable and impactful change". I personally believe that any mentality and philosophy which carries the hallmarks of a modern safety science agenda must not only support this message but invest in it and apply it.
This article deconstructs and presents, in four parts, a snippet of my motive, in the hope of, if not rallying support for, at least sharing insights into the well-researched and well-applied answer to the human error fallacy that has plagued accident investigations since they began: to build systems that, when they fail, fail safely.
Systems come in all sorts and sizes. Even if we're talking about something as specific as an isolation-permitting sub-system intended to manage the risks involved, that sub-system must still be managed as an element of a greater, continually improving system. However specific any system may seem, recognising its interdependency with other systems is critical, because this is where systems fail and where failsafes will be effective. So I'll focus a little on building failsafes into an organisation's Safety and Health Management System, the system relied on to manage its more intricate and applied sub-systems.
Firstly, we need to define what we expect of any system. Asking whether the system is designed to fail safely seems like the easy answer, but the problem is that failure, like systems, is just not that linear: it is complex. To state the obvious and hopefully demystify an often nebulous concept: the system's design and intent should account for how the system fails if it fails, and designing with this in mind is what enables us to fail safely. No one system is the panacea for all failure, but when its couplings are well understood we can demonstrate a certain level of resilience in how our systems and greater organisations react to failure. A common error is forgetting to apply simple systems theory when designing any system for a particular intent.
NB. Whilst theoretical concepts are only ever as reliable as how we imagine them to be in the abstract construct often referred to as "systems", they're the closest thing to reliable we have.
Being a systems nerd who has studied, applied and worked heavily with them in everything from Safety to Leadership to Quality to Process & Production, I can't help but keep this traditional and remind everybody of W. Edwards Deming's work. Deming produced a lot of great work, but his simplified Plan-Do-Check-Act cycle (with Study as the modern substitute for Check), which has been adopted by numerous ISO standards and legislative documents worldwide, insists we recognise that continuous improvement by design is essential for continued system success.
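To make the shape of that cycle concrete, here is a minimal, purely illustrative sketch in Python. The toy "system", its single performance number and the four stage functions are my own placeholders, not anything Deming or any ISO standard prescribes; the only point it illustrates is that the improvement loop is designed in and keeps running, rather than being bolted on after a failure.

```python
# Purely illustrative Plan-Do-Check/Study-Act loop; all names and numbers are hypothetical.

def plan(system):
    # Plan: set an improvement objective slightly beyond current performance.
    return {"target": system["performance"] + 1.0}

def do(system, objective):
    # Do: trial the change on a small scale (toy assumption: partial success).
    return {"achieved": (system["performance"] + objective["target"]) / 2}

def check(result, objective):
    # Check / Study: compare what actually happened with what was intended.
    return {"gap": objective["target"] - result["achieved"]}

def act(system, findings):
    # Act: standardise what worked, adjust what didn't, then go around again.
    system["performance"] += max(0.0, 1.0 - findings["gap"])
    return system

system = {"performance": 1.0}
for cycle in range(3):  # in practice the loop never ends - improvement is continual
    objective = plan(system)
    result = do(system, objective)
    findings = check(result, objective)
    system = act(system, findings)
    print(f"Cycle {cycle + 1}: performance = {system['performance']:.2f}")
```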
I recently read the nonsensical and dangerous statement that "without failure, you'd never know what to improve"... I'll let that sink in... Imagine if that were your company's motto: "Just keep doing what you're doing until it kills you, then we'll make some changes for your replacement, until that fails too, and so on". It is accurately stated that we must learn when we fail. But to build fail-safes ONLY after failure would be inhumane, not to mention costly to all. We must design and build in elements of resilience that ensure that when systems are failing it is recognised early and remedied with effective mitigative action and management of change - the popular Chronic Unease concept speaks true to this.
Secondly, organisations too often design and rely solely upon mechanisms of a system that merely identify and report on failure. Lag-indicator statistical modelling of success (TRIFR etc.) is the most common constraint in modern safety systems. Designing a system that ignores or fails to recognise success is inadequate - sure, one could argue that it does recognise success: the successful lack of injuries. Then I must simply ask: how were there no injuries? How does your mechanism explore this? What are we doing well? What must we continue to do? Driving an organisation's safety decision-making off lag-centric reporting is like walking backwards, hoping what you see behind you will inform you of what's to come. If we're to fail safely, then we must better identify areas of performance variability (successful and unsuccessful) to predict success and failure, not just react to it.
There is much theory supporting the introduction of lead indicators into safety systems, but I'd like to share just one piece of it that I find interesting. If your organisation's safety system is lag-centric (designed to manage safety by measuring statistics of failure only) and it has a Total Recordable Injury Frequency Rate (TRIFR) of 10 injuries per million hours worked (which is average-ish), then on average you'll have to work 100,000 man-hours before you can identify and rectify unsafe conditions or processes, learn anything and begin to fail safely... Scary.
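As a quick back-of-envelope check on that arithmetic, here is a short sketch, assuming the common convention that TRIFR is normalised per one million hours worked:

```python
# Rough check of the figure above, assuming TRIFR is expressed per 1,000,000 hours worked.
TRIFR = 10                # recordable injuries per million man-hours (the "average-ish" rate)
HOURS_BASIS = 1_000_000   # hours the rate is normalised against

hours_per_recordable = HOURS_BASIS / TRIFR
print(f"One recordable injury roughly every {hours_per_recordable:,.0f} man-hours")
# -> One recordable injury roughly every 100,000 man-hours
```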
Thirdly, we too often fail to recognise complexity in our organisations or, even more often, misdiagnose it as merely complicated and therefore supposedly manageable. For those unfamiliar with the distinction: complexity creates situations of unknowability, where a multitude of internal and external, nonlinear and emerging, converging or diverging factors and elements of a system can affect a successful or unsuccessful outcome. Complicatedness, by contrast, describes the existence of intrinsic and inherent difficulties; typically, those difficulties are foreseeable and knowable, and therefore predictable, measurable and manageable. Complex environments, however, can produce a variety of emergent, unforeseen and unpredictable situations, all from the same initial conditions, factors, actions or inactions.
Complexity creates uncertain results and conditions. It demands multiple layers of hyper-dynamic systems and a different style of leadership to identify and control complexity risk, fail safely and improve continually.
Whilst I believe simple fail-safes are relatively easy to build into systems, the systems in all socio-technical organisations change and evolve. This means that, for an organisation to successfully manage fail-safe systems, it must design them mindfully so that they continually improve based on what is known to be successful, whilst also identifying and managing the risk of complexity and the weak signals of failure.
Lastly, all the above requires something quite unique to drive and support it: a new type of leadership - Adaptive Complexity Leadership. I am honoured to quote a beautifully written section of work by Mary Uhl-Bien & Michael Arena in 2016 to exemplify this:
[Leaders need to] 'apply complexity thinking, where leaders learn to read a system and watch for signs of emergence … those who can apply it know how to use pressures, conflicting, linking up, and timing to anticipate, interact with, and channel emergence’
This statement is a testament to the work modern leadership science is producing and how it excitingly reflects exactly where safety science is also heading. This is how all leaders must apply themselves to prevail over the factors and constraints described above.
I know it's a slight segue, but for more on Adaptive Leadership and its safety connotations, please leave a comment and we'll discuss it. For anything else, please also comment and we'll chat.
Cheers - AT.
Keywords: #safety #leadership #systems #business #success
Creating Value in Major Projects. Engage * Empower * Lead.
5 yr
Hi Adrian. Whilst completing another project in an industry suffering H&S systems chronic fatigue syndrome, awoken intermittently with human factors, BBS, drones and marching bands, I thought I would crudely respond (huge shout out to iPhone technology). To cap your position:
1. Establish a systems objective
2. Systems performance metrics are typically failure based
3. Systems fail to recognise organisational complexity
4. Systems are to be led and maintained by leadership (Adaptive Complexity Leadership)
Sound position in isolation, and quite an articulate article. I do hold a potentially simplistic perspective on a system's objective. From systems I've been exposed to over the years I hold the following position. H&S systems are typically:
1. Based on administrative legal compliance
2. Decades old, recycled and modified shamelessly
3. Do not adequately create visible leadership and effective workforce engagement
4. Risk frameworks do not reflect risk scope because:
- risk frameworks are not implemented - rarely followed from organisational risk assessment workshop through to task-level risk assessment;
- they do not adequately detail leadership activities that demonstrate strong company values and due diligence in managing the risk profile.
Driving Organisational Learning to Improve Safety @ the DLR
6 年"This statement is a testament to work modern leadership science is producing and how it excitingly reflects exactly where safety science is also heading." I'm so glad that someone else is seeing this emerging trend. For me, the most exciting thing about cutting-edge safety science is that its findings aren't just relevant to the safety world, they are more about how modern organisations, communities and cultures can be led and managed in better ways. The fields of Complexity, Systems Thinking, Resilience Engineering and Safety-II (learning from what went well) are just some examples of this.
INDUSTRIAL SAFETY SPECIALIST
6 yr
Hi Adrian. Insightful and valid comments.
We're all learning!
6 yr
Adrian, your thoughtful piece is very helpful in teasing out the problem with simplifying the attribution of "accidents" as (conveniently?) human error causing "perfect" systems to "fail". As you say, this is more a complex system "failure" (as it must include us, the humans that operate them). But system "malfunction" is probably a more accurate description, as "failure" suggests binary system states (black and white?), whereas in real life most intended inter-dependencies between functions in these complex systems are naturally "variable" and most systems work successfully if sub-optimally (as the Deming-Juran quality circles demonstrated). FRAM, similarly, by concentrating on the behaviour of these functions, allows us to explore and understand these "shades of grey" that we all know are the reality. This also allows teams and leaders to become aware of emerging issues, which as you say is what is needed - "intelligent adaptation" of the way we operate the system, not "how can we stop these dumb humans fouling it up?" I agree with you that this is the approach that can add the layer of resilience which you are seeking.
Knows a thing or two in Automotive & Finance | Results3 | Views expressed are my own | Footnote 1: ZB = Zero Bologna
6 yr
Well, this is interesting. Let me try to spell out what I understand your suggestions to be, to see if I got most of it:
- progressively add / build in layers of improvement to systems whose limitations are well identified and handled as such (safe-base incrementalism)
- focus on leading indicators (including weak signals)
- optimise towards least consequence - in fact, a safe condition - in case of malfunction, false detection, deteriorated mode, etc. (easier said than done)
- avoid unnecessary complication in the way we manage things (somewhat similar to applying Ockham's razor); understand the implications of the complexity out there - things that we can neither control nor fully understand, but to which we should have a best way to react - while not taking the shortcut of undue / illegitimate simplification (oversimplification)
Now the real challenge is to translate and solidify this approach / philosophy into concrete actions.