Normal Accidents: Lessons from FAA and SouthwestAir outages in 2023
From Gerona.ca article

Charles Perrow wrote Normal Accidents in 1984. His mitigation strategies apply to any extensive system regardless of type: #Airtraffic, #SaaS, or #SupplyChain.

  • Simplify the system. Reduce the number of components and interactions within a system. This makes it easier to understand and control, and thus reduces the likelihood of accidents.
  • Decouple the components. Make each component independent to prevent cascading failures and to make problems easier to isolate.
  • Reduce the variety of systems. Reducing the number of ways a system can fail makes it easier to anticipate and prevent accidents.
  • Install redundancy. Add backup systems.
  • Improve monitoring and communication. Monitor each component and backup.
  • Re-evaluate the system design and risk management processes. This is an ongoing process, not a one-time activity.
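
As a rough illustration of how decoupling, redundancy, and monitoring can work together, here is a minimal Python sketch. It is not taken from Perrow or from either outage: the function call_with_fallback and the two provider functions are hypothetical names invented for this example. Each component's failure is contained and recorded, and an independent backup takes over.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def call_with_fallback(
    providers: Iterable[Tuple[str, Callable[[], T]]],
    failures: List[str],
) -> T:
    """Try each provider in order and return the first successful result.

    Every failure is recorded in `failures` (monitoring) and contained
    inside the loop (decoupling), so one broken component does not take
    down the caller while a backup is still available (redundancy).
    """
    for name, provider in providers:
        try:
            return provider()
        except Exception as exc:  # isolate the failure to this component
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))


def primary_notices() -> str:
    # Hypothetical primary service, simulated here as being down.
    raise ConnectionError("primary database unavailable")


def backup_notices() -> str:
    # Hypothetical backup that shares no state with the primary.
    return "notices served from backup"


failure_log: List[str] = []
result = call_with_fallback(
    [("primary", primary_notices), ("backup", backup_notices)],
    failure_log,
)
print(result)
print(failure_log)
```

The backup only helps if it is truly independent of the primary. If both depend on the same file, database, or network path, the redundancy is only apparent.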

Accept that these systems are risky and that perfection is not possible. Don't blame individuals. Improve the systems to reduce the risk of accidents. Look at the Linux OS for a good design pattern.

Apply these strategies across any complex system to improve #resilience.
