Redundancy and reliability from Space missions to whistleblowing
Challenger explosion image from Wikipedia

The Challenger and Columbia space shuttle disasters

Richard P. Feynman gives a thorough account of the 1986 Challenger disaster in "What Do You Care What Other People Think?". The investigation found that the rubber seals, the "O-rings" in the joints of the solid rocket boosters, had stiffened in the cold morning and could not expand fast enough to seal the joint. Hot gas escaped, which led to the failure of the external tank and the destruction of the shuttle.

The story goes that Feynman gave a famous demonstration in front of the press with a glass of cold water: he took a piece of the very same rubber out of the glass to show that it had become stiff. That was probably the major cause of the loss of the space shuttle, consistent with the evidence from the launch video footage. It was a striking case in which he was not supposed to talk to the public about the subject, but once he had the microphone in front of the press he spoke about it anyway.

The problem was that this was not the only problem...

There was a general culture of glossing over problems and not following up on faulty components, risk findings or technology limitations. One major example was the thermal protection covering the outside of the space shuttle: the tiles and panels shielding it from the high temperatures of re-entry could be damaged or detached, and pieces of insulating foam regularly broke off the external tank during launch and struck them. The tile material was perhaps the best available in terms of heat resistance, but the system as a whole was vulnerable, and this was the main cause of the Columbia disaster of 2003, with the death of another seven astronauts. The spacecraft went up to low Earth orbit, and the video footage showed debris striking the wing during the ascent. The engineers assessing the risk asked for the damage to be inspected in orbit, which was not done, and on re-entry the spacecraft broke apart.

The Boeing Starliner and SpaceX Dragon

Many years ago now, NASA decided to open up the market for travel to low Earth orbit to private companies willing to invest in developing a new space launch system, with the goal of having at least two independent vendors and technologies. At the beginning it was hard to believe that such an endeavor suited private companies, but the United States had already demonstrated its ability to open up new markets, going back to the days of the DARPA Grand Challenge, which led to the autonomous driving industry. I still remember a speech by Obama saying, roughly, that if there is one thing the United States does better than anyone else, it is innovation.

In June, a few astronauts were brought to the International Space Station aboard the Boeing Starliner. There were problems with its thrusters, and the return flight was rescheduled multiple times for safety reasons.

A few days ago, NASA decided that the astronauts will come back to Earth in a SpaceX Dragon capsule instead of the Starliner, which is currently docked at the station. The Starliner is now planned to come back to Earth uncrewed in September.

Aviation, and space even more so, is a tough environment in which to build reliable systems. Reliability is achieved through redundancy, but not only: the redundant systems typically need to be built on different technologies while achieving the same results, so that a single design flaw cannot take all of them down at once.
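A toy two-channel model illustrates why identical copies are not enough. Assume, purely for illustration, that each channel fails on its own with probability p, and that identical channels additionally share a hidden design flaw that brings both down at once with probability q, while dissimilar channels do not share it. The values of p and q below are made up for the example:

```latex
P_{\text{identical}} = q + (1 - q)\,p^{2}, \qquad P_{\text{dissimilar}} = p^{2}

p = 10^{-3},\; q = 10^{-4}
\;\Rightarrow\;
P_{\text{identical}} \approx 1.01 \times 10^{-4}, \qquad P_{\text{dissimilar}} = 10^{-6}
```

In this sketch the shared flaw, not the individual failures, dominates the risk of the identical pair, by a factor of about 100.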

The computer system of a plane

A well-known example of redundancy built on different technologies is the critical computer systems of a plane. There are typically two of them, built on different software and hardware stacks, and on top of that the human pilot must be able to bypass the electronic systems if needed. Another famous story from the aviation industry is the very crude but reliable technology used in the original MiG jets: hydraulic controls, either automatic or driven by the pilot, and no electronics whatsoever, because electronics may fail.

All of this is encoded in aviation safety standards, which cover everything from how to overhaul an engine down to the most minute details. Autonomous systems may even require triple redundancy, so that a consensus can be reached by majority vote when one channel disagrees. A major challenge for autonomous cars and drones will be all the safety standards that are yet to be written.
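To make the majority vote concrete, here is a minimal Python sketch of a 2-out-of-3 voter using mid-value selection, one simple voting scheme. It assumes three hypothetical redundant channels that each report a numeric reading; the function name `vote_2oo3` and the `tolerance` parameter are illustrative assumptions, not taken from any real avionics standard.

```python
from typing import Optional, Sequence


def vote_2oo3(readings: Sequence[float], tolerance: float = 0.5) -> Optional[float]:
    """Hypothetical 2-out-of-3 voter using mid-value selection.

    Returns the median reading when at least two of the three channels
    agree within `tolerance`, otherwise None (no majority).
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three redundant readings")
    low, mid, high = sorted(readings)
    # The closest pair of values always includes the median, so a majority
    # exists exactly when the median agrees with one of its neighbours.
    if mid - low <= tolerance or high - mid <= tolerance:
        return mid
    # No two channels agree: a real system would flag the fault and fall
    # back to another mode, e.g. hand control to the human pilot.
    return None


if __name__ == "__main__":
    print(vote_2oo3([10.0, 10.1, 42.0]))  # third channel faulty: prints 10.1
    print(vote_2oo3([1.0, 5.0, 9.0]))     # no consensus: prints None
```

The vote masks one misbehaving channel, but if two channels fail in the same way they outvote the healthy one, which is exactly why the redundant channels should be built on different technologies.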

Testing, testing, testing

I worked for a while at a defense and aerospace contractor, and I discovered that a hell of a lot of the software in space is outdated. Why? Because it was designed, built and sent to space, and only afterwards were the new testing rules put in place, by which point nobody had full control of what was going on.

The new testing rules stifled any new deployment of software in space, and even changes to the existing software became almost impossible, given the effort needed to meet the new testing rules. All good engineering relies on mock testing, component testing, integration testing and end-to-end testing, whether the subject is a piece of software or the whole space shuttle.

A plane has on the order of a million parts, and a large software product can have a comparable number of lines of code. If we equate "one line of code" with "one part", how many ways are there for things to go wrong? Any good engineer will tell you that if something can go wrong, it is only a matter of time before it does.
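A back-of-the-envelope calculation shows why, under the simplifying (and unrealistic) assumption that each of the n parts fails independently with the same small probability p over some period. The value p = 10^{-6} is purely illustrative:

```latex
P(\text{at least one failure}) = 1 - (1 - p)^{n},
\qquad
n = 10^{6},\; p = 10^{-6}
\;\Rightarrow\;
1 - \left(1 - 10^{-6}\right)^{10^{6}} \approx 1 - e^{-1} \approx 0.63
```

Even with a one-in-a-million chance per part, there is roughly a 63% chance that something, somewhere, fails.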

In the case of the Challenger, this meant old software from the 1970s running on new computers from the 1980s; but what if the old software was faulty? All these "what ifs" are extremely frustrating, we all know that, but they are part of the engineering game: things may go wrong.

Accountability and Transparency

OK, now that we have tested the space shuttle and somebody knows that things need to improve, the difficult part kicks in. How committed are we to changing things? What is essential to change, what is important, and what can be delayed? What if an operational issue happens and a risk is looming?

NASA Administrator Bill Nelson stated: "It is trying to turn around the culture that first led to the loss of Challenger and then led to the loss of Columbia, where obvious mistakes were not being brought forth."

Responsible engineering is exactly this, accountability and transparency: if I sign off on the safety of a bridge, I will go to jail if the bridge collapses, because people may die. NASA's decision is a big blow for Boeing, and it will lead to a rethink of the Reaction Control System that manages the thrusters. It was probably a very tough decision for NASA, with a lot of political pressure from the multiple parties involved and a lot of downstream impact from the changes that will have to follow.

It is not about organizational pressure to keep things quiet or not tell the whole story; it is about personal honesty and integrity, and about a corporate culture of speaking up and helping each other, which you see thoroughly betrayed in Feynman's account of the Challenger investigation. Everyone knew that there were limitations in one piece of technology or another, but nobody was in a position or willing to put the puzzle together and speak up to say that change was needed.

Primates and humans

Change is pain, as simple as that, and in our non-democratic, corporate-driven world, speaking up for change is often not well received. Fear kicks in, and so does obedience. Since our behavioral traits sit somewhere between those of baboons and chimpanzees, we are a bit in between the hierarchical cultures of chimpanzees and the totally horizontal ones of baboons, with their hysterical backstabbing, and we see both reflected in our day-to-day corporate world.

To conclude

As usual, reposts, likes and comments are more than welcome. Thanks very much for your interest if you have read down to here ;)

To learn more

A summary of the Challenger investigation, from Feynman.com

How 2 fatal shuttle disasters weighed on NASA's decision, from Space.com

Robert Sapolsky interview: How to escape the rat race, primates and us

More articles by Diego Bragato

  • Enterprise Architecture minimum standard
  • Time series prediction models
  • The Human side of #innovation
  • Reverse engineering my AI companion
  • Robotizing employees
