How Complex System Fail 3

Reviewing the work of Richard Cook (https://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf) and applying it to satellite communications.

Catastrophe requires multiple failures; single point failures are not enough.
The array of defenses works. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure. Put another way, there are many more failure opportunities than overt system accidents. Most initial failure trajectories are blocked by designed system safety components. Trajectories that reach the operational level are mostly blocked, usually by practitioners.

I view number three here as building off of number two. A properly engineered system has backups that can handle single point failures, and maybe even multiple point failures. In the end, though, it always comes to a breaking point where the odds stack up against the system and it has a catastrophic failure.
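To put rough numbers on "the odds stack up," here is a minimal sketch in Python. The layer names and probabilities are made up for illustration, and it assumes each layer fails independently of the others, which real systems rarely do. It shows that an alignment that is very unlikely on any given day becomes much more likely over years of operation.

```python
# Hypothetical per-day probabilities that each layer of defense fails.
# These values are illustrative assumptions, not measurements.
daily_failure_probability = {
    "primary fails": 0.05,
    "backup fails": 0.02,
    "operator misses it": 0.10,
}


def chance_all_align(probabilities):
    """Probability that every (assumed independent) failure happens on the same day."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def chance_within(days, daily_probability):
    """Probability of at least one such alignment over `days` independent days."""
    return 1.0 - (1.0 - daily_probability) ** days


per_day = chance_all_align(daily_failure_probability.values())
one_year = chance_within(365, per_day)
ten_years = chance_within(3650, per_day)

print(f"Chance on any one day: {per_day:.4%}")
print(f"Chance within 1 year:  {one_year:.1%}")
print(f"Chance within 10 years: {ten_years:.1%}")
```

The longer the system runs without the small problems being fixed, the more chances the failures get to line up.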

One of these that sticks out in my mind was when we had an uninterruptible power supply catch fire, melt, and drop power to our transmit amplifiers.

So what happened?

Once again, it was multiple failures that each seemed unlikely but, given enough time, eventually happened together.

This catastrophic failure was in a satellite terminal in Balad, Iraq. The shelter was aging and starting to show it. Part of the problem was that a corner of the shelter had started to separate, creating a hole. Normally, with the crazy hot weather in Iraq, this wasn't an issue, but then we hit the rainy season. Management had been notified for months about the maintenance needed and the resources required to complete it. I think management was just waiting for their tour to end, hoping it wouldn't become an issue and could be pushed off to the next guy.

On a gloomy Iraq day it started to rain, a real downpour, and water started getting into the shelter through the hole. The water ran along the wall until it hit the TWTA exhaust vents, pooled on them, and started to drip again. This time it dripped onto the uninterruptible power supplies. The water pooled up, soon got inside the uninterruptible power supplies, and started a fire. The uninterruptible power supplies melted and cut all power going to the TWTA.

The outage alarmed on our monitor and control system. When the technician went to the shelter he was able to put the fire out and reroute power. The outage lasted 45 minutes. The management team was able to get the previously requested resources to fix the hole before the outage was even resolved. By the end of the day the hole was fixed, and this never happened again.

What's your favorite incident of all the pieces aligning to create a catastrophic failure?





That has been my experience and observation in the aviation world as well. Rarely are catastrophic aviation incidents the result of a single point of failure; it's many small failures that align to cause a bigger issue. The "technical" term we used was "the holes in the Swiss cheese lined up" (i.e., the holes by themselves don't cause an issue, but when they come together in the right way, you have a problem).
