Charles Perrow published Normal Accidents in 1984. His mitigation strategies apply to any complex system, regardless of type: #Airtraffic, #SaaS, or #SupplyChain.
- Simplify the system. Reduce the number of components and interactions within it. This makes the system easier to understand and control, and thus reduces the likelihood of accidents.
- Decouple the components. Make each component as independent as possible to prevent cascading failures and to make problems easier to isolate.
- Reduce the variety of systems. Reducing the number of ways a system can fail makes accidents easier to anticipate and prevent.
- Install redundancy. Add backup systems.
- Improve monitoring and communication. Monitor each component and backup.
- Re-evaluate the system design and risk management processes. This is an ongoing process, not a one-time activity.
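
As a rough illustration of the decoupling and redundancy points above, here is a minimal Python sketch. The names `call_with_fallback`, `primary`, and `backup` are hypothetical and exist only for this example; a real system would need timeouts, health checks, and proper monitoring on top of this.

```python
from typing import Callable, Sequence


def call_with_fallback(providers: Sequence[Callable[[], str]]) -> str:
    """Try each redundant provider in order and return the first success."""
    errors = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:
            # Contain the failure here so it does not cascade to the caller,
            # and record it so monitoring can see which component failed.
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary() -> str:
    # Hypothetical primary component that is currently failing.
    raise ConnectionError("primary is down")


def backup() -> str:
    # Hypothetical backup component.
    return "served by backup"


result = call_with_fallback([primary, backup])
print(result)  # prints: served by backup
```

The caller depends only on the list of providers, not on any single one, so a failing component can be swapped or removed without touching the rest.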
Accept that these systems are risky and that perfection is not possible. Don't blame the individuals. Improve the systems to reduce the risk of accidents. Look at the Linux OS for a good design pattern.
Apply these strategies across any complex system to improve #resilience.