Complexity and Catastrophe. How complexity and leanness drive us to the edge of chaos.
It has long been recognised that human beings are pattern-seeking creatures. We search for certainty and comfort in the face of any event that threatens to overwhelm us. The American satirist H.L. Mencken was correct when he quipped that ‘for every complex problem there is a solution that is clear, simple and, unfortunately, wrong’. And as numerous psychologists have revealed through their studies into cognitive failures, the great majority of our biases are shortcuts, or heuristics, that encourage us to dismiss scale, complexity or inconvenient truths.
Often it is not until the immediate aftermath of catastrophe that we are prepared to acknowledge, and on occasion try to understand, the complexity all around us. We are often told that we should not be surprised when a catastrophe occurs; in truth, we should be in a perpetual state of shock that there are not considerably more of them to contend with.
The Master of Disaster
Charles ‘Chick’ Perrow (1925-2019) earned the moniker of the ‘Undisputed Master of Disaster’ thanks to a lifetime of study into large-scale organisational failures and their impact upon society. Surprisingly, Perrow was a sociologist, not an engineer, and his reputation was built on the study of catastrophic events in high-risk industries (Normal Accidents, published in 1984). His findings changed the understanding of industrial disasters forever.
Perrow’s journey into the field of disaster analysis began when he and his team from Yale University were invited to study the partial meltdown of Reactor Number 2 at the Three Mile Island (TMI) nuclear plant in Pennsylvania, USA. The conclusion they came to was that this event, like most, was not caused by a major, unpredictable issue such as an earthquake, a single decision point, a major build failure or a violent attack, but by a seemingly bizarre synergy of small errors that happened to coincide at a precise moment in time. In the case of TMI: a plumbing error, a stuck valve and an indicator light that the operators could not interpret accurately. It was a scenario that unfolded in just 13 seconds and could have rendered much of the US’s Eastern Seaboard a disaster zone.
Perrow and his team refuted the initial, and largely unchallenged, reports that blamed the site operators, describing those conclusions as lazily categorised and convenient ‘retrospective errors’. Perrow’s alternative explanation was that the event was caused by the connections between parts, rather than the parts themselves.
From here, Perrow and his team devised their famed theory of catastrophe. The theory, beautifully described in the 2018 book Meltdown by Chris Clearfield and Andras Tilcsik, isolated two main conditions that co-exist to make a system vulnerable to large-scale failure.
Condition 1 – Complexity
The first factor concerns how the parts of an organisation or system are connected, asking whether the relationship is ‘linear’ or ‘complex’. An example of a linear system would be a classic assembly line: failure here is usually easy to spot, isolate and solve. An example of a complex system, by contrast, would be a large chemical plant. Here there are hidden, elaborate and asymmetrical relationships between all of the parts. Subsystems operate within elaborate systems, creating an invisible network of hidden and unpredictable relationships. To make sense of these we often rely upon ambiguous indicators, trend analysis and retrospective reporting. At best we are attempting to understand a detailed landscape from incomplete, low-resolution images; we have a jigsaw puzzle with many important pieces missing.
Condition 2 – Tight Coupling
For the second essential factor, Perrow borrowed a term directly from engineering. A vulnerable system will be ‘tightly coupled’; in other words, there will be little ‘slack’ or redundancy within the system. In a tightly coupled environment it is not enough to be ‘mostly right’, and interventions cannot be applied on a ‘near enough’ basis. Once a chain reaction begins, it becomes progressively harder to stop. It is worth noting that the past 40 years of ‘lean’ business practices have most likely encouraged this, especially where the implementation was clumsy or dogmatic.
Of course, apply a ‘black swan’ event like a global pandemic and the impact becomes obvious. It took just a handful of semiconductor producers in South Korea experiencing stalled and reduced production to cripple the entire globe’s auto and computer manufacturing sectors. And then there is the lack of buffering capacity inherent in our ‘Just In Time’ planning models and warehousing. This was laid bare for all to see when the Ever Given container ship unexpectedly ran aground and blocked the Suez Canal for almost a week.
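To make the coupling idea concrete, here is a minimal sketch: a hypothetical three-stage production line (an illustration of the concept only, not anything drawn from Perrow’s own analysis). With zero buffer stock between stages, a single one-period stoppage at the first stage instantly starves every stage downstream; with even a small buffer, the same disruption is absorbed and never spreads.

```python
# Toy model contrasting tight and loose coupling in a three-stage serial line.
# Hypothetical illustration only; not taken from Perrow's work.

def run_line(buffer_size, periods=10, outage_period=3):
    """Each stage draws one unit per period from the buffer in front of it.
    Stage 0 suffers a single one-period outage; we count idle periods per stage."""
    stages = 3
    buffers = [buffer_size] * stages           # stock held between stages
    idle = [0] * stages
    for t in range(periods):
        running = [True] * stages
        running[0] = (t != outage_period)      # the 'small error' at stage 0
        for s in range(1, stages):
            if buffers[s] > 0:
                buffers[s] -= 1                # buffer absorbs the shortfall
            elif not running[s - 1]:
                running[s] = False             # starved: the failure propagates downstream
        for s in range(1, stages):             # upstream output replenishes the buffers
            if running[s - 1] and buffers[s] < buffer_size:
                buffers[s] += 1
        for s in range(stages):
            if not running[s]:
                idle[s] += 1
    return idle

print("Tightly coupled (no buffer):  idle periods per stage =", run_line(0))
print("Loosely coupled (buffer = 2): idle periods per stage =", run_line(2))
```

In the tightly coupled run every stage goes idle; in the buffered run only the stage that actually failed does. The ‘slack’ that lean programmes are so keen to remove is precisely what stops the contagion.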
The Butterfly Effect in Action
In light of Perrow’s renowned analysis, it is impossible not to reflect on Edward Lorenz’s 1961 theory of the Butterfly Effect. Famously, Lorenz explained that in the world of long-range weather modelling a butterfly could, theoretically speaking, flap its wings in Brazil and set off a tornado in Texas.
Lorenz was an exceptional mathematician turned meteorologist who had become fascinated by forecasting within complex systems. Running models from his lab at MIT, he discovered that even the tiniest changes in input data radically changed the longer-term outcomes of any model – the more complex and ‘adaptive’ (think both fluid and tightly coupled) the system, the more these changes were amplified. He quickly applied his theories to commercial and industrial organisations, as well as the wider field of economics, pointing out that both had become radically more complex since the advent of the industrial revolution: they were asymmetrically organised, inevitably and unpredictably drifting into and out of a state of equilibrium, often teetering on the ‘edge of chaos’ (Langton).
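Lorenz’s point about sensitivity to initial conditions is easy to reproduce. The sketch below is a minimal illustration using the textbook Lorenz equations and simple Euler integration (not his original weather model): two runs whose starting points differ by one part in a billion drift apart until, within a few tens of time units, they bear no resemblance to each other.

```python
# Minimal illustration of sensitive dependence on initial conditions,
# using the classic Lorenz system (sigma=10, rho=28, beta=8/3).

def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)

a = (1.0, 1.0, 1.0)            # reference trajectory
b = (1.0 + 1e-9, 1.0, 1.0)     # same start, perturbed by one part in a billion

dt, steps = 0.001, 40000       # simulate 40 time units
for step in range(1, steps + 1):
    a, b = lorenz_step(a, dt), lorenz_step(b, dt)
    if step % 10000 == 0:      # report the separation every 10 time units
        gap = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        print(f"t = {step * dt:5.1f}  separation = {gap:.6f}")
```

The gap grows roughly exponentially before saturating at the size of the attractor itself – the model remains perfectly deterministic, yet its long-range behaviour is effectively unpredictable.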
Organisations and industries that are both complex and tightly coupled are at the greatest risk of Lorenz’s ‘Butterfly Effect’: small errors are inevitable and accidents normal. Symptoms are often baffling, so accurate diagnosis of problems is difficult. There is a higher-than-normal likelihood of failure contagion, with issues spreading across (and escaping) the organisation. Inaction is never an option, yet well-intentioned responses often contribute to the problem rather than provide a solution.
In the age of the so-called ‘fourth industrial revolution’, our most successful technology, financial and industrial organisations are all complex, adaptive systems operating with diminishing redundancy between their various parts. The smallest inputs can, and will, create major changes, with the precise scale of disruption almost impossible to predict. A minor shock applied to, or suffered at, a precise moment can overload an entire system. The major disruption that results is experienced in the form of pollution events, data loss, system outages, reputational damage and loss-of-life accidents, to list just a few.
Faced with this, the twin objectives of Root Cause Analysis have never been more relevant:
1) To understand a problem to the level of detail required
2) To apply solutions that control the specific causes of the problem
What Caused This RCA is used by leading firms in sectors ranging from Audit to Chemical Industries, Manufacturing to Financial Services, and Utilities to Healthcare. If you, your team or your organisation would like to know more about Root Cause Analysis software or training, please get in touch.