Software Failure on the Flight Deck
It's late at night, well past quitting time. Maybe you're an early-to-rise, early-to-bed person, and you're thinking about wrapping up and hitting the hay. The phone rings. It's that special ring tone that tells you it's your team leader. This can't be good. Your team leader explains that the airline you work for has been grounded. Dozens of flights around the country cannot take off because something has gone wrong with the mobile app the pilots use to plan and execute their flights. Thousands of passengers have boarded and are now stranded at the gate! You have to get this fixed! Right now!
As you have probably already heard, this actually happened to American Airlines (AAL) on Tuesday, April 28, 2015, at 10:47 pm. Bill Jacaruso, 54, was traveling home to Austin from Dallas/Fort Worth airport. The plane boarded at 8:00 for an 8:20 flight but just sat at the gate. The pilot eventually got on the intercom and explained the delay: his copilot's iPad had gone blank, and 24 minutes later his own iPad went blank too. The plane couldn't go anywhere because a mission-critical software application had failed. An entire airline stopped in its tracks because of a computer glitch!
American Airlines said the problem was caused by a faulty third-party app. In 2013, American became the first airline to have its pilots rely entirely on iPads for flight plans and navigation. Plans are updated constantly, so the company was understandably excited about eliminating lots of excess paper. (It estimates the paperless program saves at least 400,000 gallons of fuel every year. To put that into perspective, 8,000 iPads replaced 24 million pages of documents.) When an entire fleet depends on a single app, that app must be exceptionally "rugged".
Apps should embody the rugged attributes outlined in the graphic above. When a failure like the one at American Airlines occurs, the software must be able to detect it and recover quickly. If the iPad app in question is 100% dependent on functioning external software, there's not much the app can do beyond reporting the failure gracefully to the user. Ideally, that report would be more user-friendly and informative than a blank screen.
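As a minimal sketch of this "fail gracefully" behavior, assume a hypothetical `fetch` callable that retrieves the current flight plan from an external service and raises a hypothetical `ExternalServiceError` when that service is down. Instead of going blank, the app falls back to the last plan it saved and tells the user exactly what happened. The names here are illustrative only; nothing in this sketch reflects how the actual American Airlines app was built.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ExternalServiceError(Exception):
    """Hypothetical error raised when the external flight-plan service fails."""


@dataclass
class PlanResult:
    plan: Optional[str]  # flight plan text, or None if nothing is available
    stale: bool          # True when the plan came from the local cache
    message: str         # human-readable status shown to the pilot


def load_flight_plan(fetch: Callable[[], str], cache: dict) -> PlanResult:
    """Try the external service, fall back to the last good copy, never go blank."""
    try:
        plan = fetch()
        cache["last_good"] = plan  # remember the latest good plan for future failures
        return PlanResult(plan, stale=False, message="Flight plan up to date.")
    except ExternalServiceError as exc:
        cached = cache.get("last_good")
        if cached is not None:
            return PlanResult(cached, stale=True,
                              message=f"Live update failed ({exc}); showing last saved plan.")
        # No cached copy: still report clearly instead of showing an empty screen.
        return PlanResult(None, stale=False,
                          message=f"Flight plan unavailable ({exc}). Contact dispatch.")


if __name__ == "__main__":
    def broken_service() -> str:
        raise ExternalServiceError("service timeout")

    cache: dict = {}
    print(load_flight_plan(lambda: "DFW-AUS v1", cache).message)
    print(load_flight_plan(broken_service, cache).message)
```

The key design choice is that every path returns a message the user can act on; even the worst case (no cached plan at all) produces an explanation rather than a blank screen.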
We will likely learn more about the glitch that grounded an airline in the future. For now, focus on the fact that it happened, and on how dramatically software failures like this can affect our daily lives. This story shows how important it is to weave rugged attributes right into the fabric of digital products from the very beginning of development.
CabForward is deeply committed to building rugged software. We are always examining how ruggedness can be achieved and how to define exactly what its attributes are. Computer software must be Secure, Reliable, and Maintainable. Rugged software is designed to survive predictable and unpredictable real-world conditions. Truly rugged software is self-aware and self-healing. Sound unbelievable? Maybe. But it IS possible. As consumers, though, we're used to software that just barely works, if it works at all. The bar is very low.
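"Self-healing" can sound abstract. One of its simplest forms is a supervisor that notices when a task crashes, records the failure, and restarts the task with exponential backoff. The sketch below assumes the task is safe to re-run; `supervise` and its parameters are hypothetical names, not part of any particular framework, and real systems would typically restart only on errors known to be transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def supervise(task: Callable[[], T], max_restarts: int = 3,
              base_delay: float = 0.1,
              log: Callable[[str], None] = print) -> T:
    """Run task; on a crash, log it and restart with exponential backoff."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception as exc:  # "self-aware": notice and record every failure
            if attempt >= max_restarts:
                log(f"giving up after {attempt} restarts: {exc!r}")
                raise  # escalate instead of failing silently
            delay = base_delay * (2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
            log(f"task failed ({exc!r}); restarting in {delay:.2f}s")
            time.sleep(delay)
            attempt += 1


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient glitch")
        return "ok"

    print(supervise(flaky))
```

Backoff gives a struggling dependency time to recover, and the final re-raise matters: when healing fails, the failure is still surfaced rather than hidden.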
Here are some working definitions for the Rugged Attributes:
The “Security” Family of Rugged Attributes:
Secure: Able to protect data and transactions from unauthorized access.
Defensible: Able to defend itself; aware of and able to mitigate risks and vulnerabilities.
Sensible: Able to detect and report malicious activity; responds with prudence and measured force.
The “Reliability” Family of Rugged Attributes:
Reliable: Able to perform in all circumstances.
Agile: Able to adjust quickly and easily.
Durable: Able to withstand the test of time.
Available: Able to respond to all requests without interruption or delay; scalable.
Recoverable: Able to recover from internal failure or attack; self-healing.
Survivable: Able to survive and perform critical operations when external dependencies fail.
The “Maintainability” Family of Rugged Attributes:
Maintainable: Able to be maintained easily and at a low cost.
Predictable: Able to be easily understood; exhibits predictable behavior; follows convention.
Portable: Able to be migrated between environments with little effort and configuration.
Observable: Able to be inspected and evaluated; has a dashboard.
Controllable: Able to be managed and configured without requiring an update; has settings.
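Several of the Reliability and Maintainability attributes meet in one well-known pattern, the circuit breaker. After repeated failures of an external dependency, the breaker "opens" and fails fast instead of hanging (Survivable); after a cool-down it lets one trial call through and closes again on success (Recoverable); its state can be read for a dashboard (Observable); and its thresholds are plain settings (Controllable). This is a minimal single-threaded sketch; the class and method names are hypothetical, and production libraries add locking, metrics, and more careful half-open handling.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that is known to be failing."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Thresholds are ordinary settings, tunable without a code change.
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def state(self) -> str:
        """Report 'closed', 'open', or 'half-open' for monitoring."""
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state() == "open":
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold or self.state() == "half-open":
                self._opened_at = self._clock()  # (re)open the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Injecting the `clock` keeps the breaker testable: a test can advance a fake clock instead of actually waiting out the cool-down.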
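The Sensible attribute, detecting and reporting malicious activity and responding "with prudence and measured force," can be illustrated with failed-login tracking. In this minimal sketch, which assumes a single process and in-memory state, a burst of failures within a sliding time window triggers a report and a temporary lockout rather than a permanent ban. `FailedLoginMonitor` and its parameters are hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class FailedLoginMonitor:
    """Detects bursts of failed logins and responds with a temporary lockout."""

    def __init__(self, max_failures: int = 5, window: float = 60.0,
                 lockout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic,
                 report: Callable[[str], None] = print) -> None:
        self.max_failures = max_failures
        self.window = window    # seconds of history that count toward a burst
        self.lockout = lockout  # seconds an account stays locked
        self._clock = clock
        self._report = report
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)
        self._locked_until: Dict[str, float] = {}

    def is_locked(self, user: str) -> bool:
        return self._clock() < self._locked_until.get(user, 0.0)

    def record_failure(self, user: str) -> None:
        now = self._clock()
        attempts = self._failures[user]
        attempts.append(now)
        # Forget failures that fall outside the sliding window.
        while attempts and now - attempts[0] > self.window:
            attempts.popleft()
        if len(attempts) >= self.max_failures:
            # Measured response: lock temporarily and tell someone about it.
            self._locked_until[user] = now + self.lockout
            attempts.clear()
            self._report(f"possible brute-force attempt on {user!r}; "
                         f"locked for {self.lockout:.0f}s")

    def record_success(self, user: str) -> None:
        self._failures.pop(user, None)
```

Scattered typos never trip the lockout because old failures age out of the window; only a rapid burst, the signature of an automated attack, does.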
The jury is still out on whether our definitions are complete. What do you think? Is there something we're missing? Doesn't this American Airlines incident show how vulnerable we are to software failures, and how much a bit of software can affect our daily lives? Are there other attributes of rugged software we haven't yet identified? Please share your thoughts in the comments section below.