The Peril of Programming Airplanes
Cliff Berg
Co-Founder and Managing Partner, Agile 2 Academy; Executive level Agile and DevOps advisor and consultant; Lead author of Agile 2: The Next Iteration of Agile
Airplanes are very safe. Even so-called “fly by wire” airplanes are safe, not because the technology is inherently safe (far from it), but because of triple redundancy.
However, not all systems are triple redundant. Why is this redundancy needed?
For pre-digital technology, the need for redundancy was obvious. The original Learjet had a triple redundant hydraulics system, so that if one sprang a leak, there were two others to take over. But why do computers need to be redundant?
For one, a computer can suffer a physical failure just as a hydraulic line can: a circuit board can burn out. For another, a computer can have a kind of failure that a pre-digital system cannot have: it can encounter a logic failure - a programming bug.
A programming bug is a situation that the programmers did not anticipate. How often does this happen? Industry norms are that programs have about one bug per 100-200 lines of code. Microsoft Windows and Mac OS X each contain tens of millions of lines of code, by commonly cited estimates. The flight control software in an airplane is complex and contains millions if not hundreds of millions of lines of code.
And it is controlling all of the critical systems of the airplane you ride in, 40,000 feet up in the air, at 600 mph.
That is why triple redundancy is needed - not just to protect against a hardware failure, but so that if one processor executes a software bug and locks up, there are two other processors - running independently written software.
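To make the idea concrete, here is a minimal sketch of majority voting across three channels. It assumes, as the paragraph above describes, that each channel is an independently written implementation of the same function. All names here (`channel_a`, `channel_b`, `channel_c`, `vote`) and the toy "clamp a pitch command to ±10" task are hypothetical illustrations, not taken from any real avionics system.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# A channel maps an integer input to an integer command (hypothetical toy task:
# clamp a pitch-error value into the range -10..10).
Channel = Callable[[int], int]


def channel_a(pitch_error: int) -> int:
    # First independent implementation: clamp with min/max.
    return max(-10, min(10, pitch_error))


def channel_b(pitch_error: int) -> int:
    # Second independent implementation: clamp with explicit branches.
    if pitch_error > 10:
        return 10
    if pitch_error < -10:
        return -10
    return pitch_error


def channel_c(pitch_error: int) -> int:
    # Third implementation, with a deliberate bug: the programmer did not
    # anticipate an input of zero, so this divides by zero in that case.
    sign = pitch_error // abs(pitch_error)
    return sign * min(10, abs(pitch_error))


def vote(channels: Sequence[Channel], pitch_error: int) -> Optional[int]:
    """Return the value at least two channels agree on, or None if no majority."""
    results = []
    for channel in channels:
        try:
            results.append(channel(pitch_error))
        except Exception:
            # A crashed channel simply contributes no vote.
            pass
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    channels = [channel_a, channel_b, channel_c]
    for x in (25, -5, 0):
        print(x, "->", vote(channels, x))
```

With an input of zero, the third channel fails, but the other two still agree, so the voter returns their answer. If all three channels ran the same code, they would all hit the same bug at once and the vote would not help against it.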
This sounds scary, but it has been made safe through lots of testing. While everyone who owns a smartphone has had it lock up on them, requiring a reboot, smartphone operating systems are generally more reliable than the individual apps that run on them, because the vendors - Apple and Google - test the heck out of those operating systems. It is not that they build the software any differently - they don’t - they just run lots of tests to find the problems. They don’t do that for all their products: I had a first generation Apple TV, and that thing was one of the buggiest devices I have ever had.
You might wonder why it is so hard to write software without bugs. It shouldn’t be. The reasons are two-fold. One is that programming is not treated as an engineering discipline. Rather, it is treated as “coding”, which has been compared to writing a novel. There is little design discipline today in the software industry - it has given way to the desire for rapid releases that kind of work and that get refined over time. The entire culture of the industry favors rapid production of new features over having things be rock solid. Netscape was a pioneer in that approach, when in the mid-90s they started releasing un-ready versions of their browser, effectively using the public as their testers. That became an industry norm, and infused the culture of how things are done.
The second reason that bug-free programming is so hard is that core technology choices made by the industry make it really difficult to write reliable (or secure) software. Simply put, operating systems and programming languages are based on paradigms that do not scale to large systems. To write large systems today, we have to go to unreasonable lengths to fix up the errors. Better choices would have made it possible to eliminate the errors “by design”. In other words, today’s technology foundations are inherently brittle and error-prone.
It is not too late to reverse this, but to do so would require leadership. Tim Berners-Lee is making an effort to undo some of the mess that he created with his World Wide Web protocols (no trust framework, and typeless protocols - amateur mistakes!). However, that is like a single voice in a hurricane. Fixing things would require a level of effort and consensus that does not exist. There are calls to make things more secure and more reliable, but no one seems to understand the root causes of the problem.
One of the foundational tools of information technology is the set of programming languages. Today’s popular programming languages do not lend themselves to creating reliable programs. Indeed, the most popular one - JavaScript (which is unrelated to Java) - is the least robust for reliable programming. JavaScript enables programmers to write a lot of code quickly; productivity is how programmers are measured, so it is no surprise that they favor tools that give them productivity at the expense of reliability. Managers who oversee programmers do not have a clue about these issues, and they generally allow the programmers to choose which languages and tools they prefer. The inmates are running the asylum.
Even those languages that encourage reliable programming have significant flaws. There are quite a few such languages, but most are not widely used. I myself am adding another to the mix.
But fixing the situation will require a national awareness that does not exist. In the US we continue to elect people to office who have little or no technology background, despite the fact that nine out of ten of the most pressing issues today have deep technology roots: jobs, climate change, intellectual property, education, transportation, defense, you name it - today, technology is a large factor. Germany elected a physicist to lead them; we elect lawyers and wheeler-dealers.
Instrumentation Technical Expert at United States Air Force
6 years ago
First, while I’ll defer to experts in the avionics field, safety-critical systems don’t have millions to hundreds of millions of lines of code. The rate of one bug per 100-200 lines of code would be completely unacceptable in the aerospace industry. Triple redundancy addresses multiple issues, including failures of sensors, failures of hardware, or software failures due to hardware effects such as a single event upset from an atmospheric neutron. I’ll defer to my colleagues on airliners, but on the military or research aircraft I’ve worked on, the avionics were all using the same hardware and software. To provide triple redundancy against a software error or hardware bug in a CPU you’d need three different CPU types from different vendors and three different software teams with the associated test and integration labs. Also wow the character limit is low, I'll thread this for part 2.
J2 Aircraft Dynamics - j2 Universal Tool-Kit - Flight Sciences Analytical Software Suite
6 years ago
I think talking about aerospace software and standards in the same article that includes JavaScript, and saying it is the least robust, is unfair to those who code avionics and FCS. The rules, guidelines, and testing processes for airborne software systems are worlds apart from those used in phone apps. The whole article comes out at a time when two aircraft have crashed, possibly due to software issues, and this article attempts to compare it to JavaScript coders. Personally I think it is misleading, and I would like to understand more about the author's experience in writing airborne software and the safety-critical systems upon which he is commenting.
Engineering Forefather
6 years ago
The basic problem is that although society doesn't understand science or the processes of science, it happily (and naively) uses its products. Yet we ask these same people to make decisions relating to science. Our general success leads to complacency; every bridge that didn't fall down when you used it, every car that didn't break down, every app that didn't crash ... builds naive belief in everything scientific. Scientific understanding is only a model, and as Box says, "All models are wrong but some are useful". Science has limits, as do humans; not everything is possible; not all bugs will be minor. Software is not excluded from this. The aphorism "Software can fix anything" also has a flaw... that "fix" has different meanings. My takeaway ... Don't confuse Realists with Pessimists.
Difficult projects are the most enjoyable to tackle
6 years ago
Great comments, Cliff. If SW were treated as an engineering discipline, then rigorous systems engineering would be applied from the beginning.