Redundancy and reliability, from space missions to whistleblowing
The Challenger and Columbia space shuttle disasters
Richard P. Feynman gives a thorough account of the 1986 Challenger disaster in "What Do You Care What Other People Think?". The investigation found that the rubber O-ring seals in the joints of the solid rocket boosters had lost their resilience on the cold launch morning and could not expand fast enough to seal the joints. Hot gas escaped through the gap, and the external tank broke apart.
The story goes that Feynman gave a famous demonstration in front of the press: he dipped a piece of the very same rubber into a glass of cold water, took it out and showed that it had gone stiff. Together with the evidence from the launch footage, this pointed to the cold O-rings as probably the major cause of the explosion. It is a striking example: he was not supposed to talk to the public about it, but he got the microphone in front of the press and spoke up anyway.
The trouble is that this was not the only problem.
There was a general culture of glossing over problems and of not following up on faulty components, risk findings or technology limitations. One major example was the thermal protection on the outside of the space shuttle, the tiles and panels that had to withstand the high temperatures of re-entry. The material was perhaps the best available in terms of temperature resistance, but the protection as assembled was fragile and prone to damage and detachment. This became the major cause of the Columbia disaster of 2003 and of the death of another seven astronauts. The spacecraft reached low orbit, but the launch footage showed material breaking off and striking the orbiter during the ascent. The engineers concerned about the risk asked for an inspection of the damage in orbit, which was never carried out, and on re-entry the spacecraft broke apart.
The Boeing Starliner and SpaceX Dragon
Many years ago, NASA decided to open up travel to low Earth orbit to private companies willing to invest in developing new launch systems. The goal was to have at least two independent vendors and technologies. At first it was hard to believe that such an endeavor was a fit for private companies, but the United States had already demonstrated its capacity to open up new markets, going back to the DARPA Grand Challenge, which led to the autonomous driving industry. I still remember a speech by Obama saying that if there is one thing the United States does better than anyone else, it is innovation.
In June, two astronauts were flown to the International Space Station on the Boeing Starliner. There were problems with the thrusters, and the return flight was rescheduled multiple times over safety concerns.
A few days ago NASA decided that the astronauts will come back to Earth in a SpaceX Dragon capsule instead of the Starliner, which is currently docked at the station. The Starliner is now planned to return to Earth uncrewed in September.
Aviation, and space even more so, is a tough environment in which to build reliable systems. Reliability is achieved with redundancy, but not with redundancy alone: typically the redundant systems need to be built on different technologies while producing the same results.
The computer system of a plane
A well-known example of safety implemented through two different technologies is the critical computer systems of a plane. There are typically two of them, built on two different software and hardware stacks, plus the human pilot, who must be able to bypass the electronic systems if needed. Another famous story from the aviation industry concerns the crude but reliable technology of the original MiG jets: hydraulic controls, automatic or operated by the pilot, and no electronics whatsoever, because electronics may fail.
All of this is encoded in aviation safety standards, which start from how to overhaul an engine and go down to the most minute details. Autonomous systems even call for triple redundancy, so that the system can reach a consensus by majority vote. A major challenge for autonomous cars and drones will be all the safety standards that are yet to be invented.
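To make the majority vote idea concrete, here is a minimal sketch in Python. It is only an illustration and not how any certified avionics system is implemented: the function name majority_vote and the three channel outputs are hypothetical, and I assume each redundant channel reports a simple comparable value.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(readings: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of channels, else None."""
    if not readings:
        return None
    # most_common(1) gives the single most frequent value and its count.
    value, count = Counter(readings).most_common(1)[0]
    # Strict majority: more than half of all channels must agree.
    return value if count > len(readings) / 2 else None


# Three hypothetical, independently built channels computing the same command.
print(majority_vote(["climb", "climb", "descend"]))  # climb
print(majority_vote(["climb", "hold", "descend"]))   # None
```

With three channels, one faulty channel is outvoted by the other two. When no majority exists the function returns None, and it is up to the caller to decide what the safe fallback is, for example handing control back to a human.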
Testing, testing, testing
I worked for a while at a defense and aerospace contractor, and I discovered that a huge amount of the software in space is outdated. Why? Because it was created, built and sent to space; then new testing rules were put in place, and no one had full control of what was going on.
The new testing rules stifled the deployment of any new software in space, and even changes to the existing software became almost impossible, given the need to comply with the new rules. All good engineering is about mock testing, component testing, integration testing and end-to-end testing, whether the subject is a piece of software or the full space shuttle.
A plane has about 1 million parts, and an average software product has about the same number of lines of code. If we approximate "one line of code" = "one part", how many ways are there for things to go wrong? Any good engineer will tell you that if things can go wrong, it is just a matter of time before they do.
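A rough back-of-the-envelope calculation shows why. The sketch below assumes, purely for illustration, that every part fails independently and with the same probability; the one-in-a-million figure is made up and not taken from any real aircraft or software product.

```python
import math


def prob_any_failure(n_parts: int, p_fail: float) -> float:
    """Probability that at least one of n independent parts fails.

    This is 1 - (1 - p_fail) ** n_parts, computed with log1p/expm1 so that
    precision is kept when p_fail is tiny.
    """
    return -math.expm1(n_parts * math.log1p(-p_fail))


# Illustrative numbers only: 1 million parts, each with a one-in-a-million
# chance of failing over some period.
print(round(prob_any_failure(1_000_000, 1e-6), 3))  # 0.632
```

Even with parts that are each "one in a million" reliable, the chance that something, somewhere, fails comes out at roughly 63% under these assumptions.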
In the case of the Challenger disaster this ended up with old software from the 70s running on a new computer from the 80s, but what if the old software was faulty? All these "what ifs" are extremely frustrating, we all know that, but they are part of the engineering game: things may go wrong.
Accountability and Transparency
OK, now that we have tested the space shuttle and somebody knows that things need to improve, the difficult part kicks in. How dedicated are we to changing things? What is essential to change, what is important, and what can be delayed? "What if" an operational issue happens and there is a looming risk?
Mr. Nelson from NASA stated: "It is trying to turn around the culture that first led to the loss of Challenger and then led to the loss of Columbia, where obvious mistakes were not being brought forth."
Responsible engineering is exactly this, accountability and transparency: if I sign off on the safety of a bridge and the bridge collapses, people may die and I will go to jail. NASA's decision is a big blow for Boeing, and it will lead to a rethinking of the Reaction Control System that manages the thrusters. It was probably a very tough decision for NASA, with a lot of political pressure from the multiple parties involved and a lot of downstream impact from the changes that must follow.
It is not about organizational pressure to keep things quiet or to avoid telling the whole story. It is about personal honesty and integrity, and about a corporate culture of speaking up and helping each other, a culture that you see thoroughly betrayed in Feynman's account of the Challenger investigation. Everyone knew that there were limitations in one piece of technology or another, but nobody was there, or willing, to put the puzzle together and say out loud that change was needed.
Primates and humans
Change is pain, it is as simple as that, and in our non-democratic, corporate-driven world speaking up for change is often not well received. Fear kicks in, and so does obedience. We share behavioral traits with both chimpanzees and baboons, so we sit somewhere in between: on one side the hierarchical culture of the chimpanzees, on the other the totally horizontal culture of the baboons with its hysterical backstabbing. We see both reflected in our day-to-day corporate world.
To conclude
As usual, reposts, likes and comments are more than welcome. Thanks very much for your interest if you have read all the way down to here ;)