The New Black Box: Using Small Failures to Prevent Major Catastrophe
Brian Gilmore
Community & Advocacy @ HiveMQ | Former IoT @ InfluxData | Former IoT @ Splunk
The following is an adaptation of a presentation I first gave in 2012. At the time, I was inspired by the impact that big data and investigative platforms could have on improving the efficiency of businesses and the safety of workers. It was this passion that led me to Splunk, and I've been fortunate to work for Splunk over the last 6+ years, helping our customers and partners deliver on this opportunity.
On the evening of July 17, 1996, Trans World Airlines Flight 800 took off from JFK International Airport in New York. Just 12 minutes later, 230 lives tragically ended in one of the deadliest aviation accidents in US history.
The National Transportation Safety Board was notified within the hour and dispatched a go-team that arrived in New York the next morning. The FBI was called in because early eyewitness reports suggested a possible link to terrorism.
Search and recovery efforts started immediately. Using remotely operated underwater vehicles, sonar, divers and even commercial fishing vessels, investigators meticulously pieced together the evidence. At the same time, they began to piece together the plane in a nearby warehouse.
Less than a week into the investigation, US Navy divers recovered the flight data recorder, or "black box," from more than 100 feet below the surface of the Atlantic.
Four years later, we had our explanation. It wasn't terrorism but mechanical failure, most likely caused by a short circuit in wiring of a fuel monitoring system that penetrated the center wing fuel tank. A combination of air, a spark, and 50 gallons of fuel was at fault.
Twelve years after the tragedy, federal officials put in place new regulations requiring the addition of crucial safety devices to airplanes with center wing fuel tanks. Twelve years.
If you think about it, the black box is a fascinating technology. Equally high and low-tech, it was invented in the early 1950s by David Warren, an Australian academic and researcher.
It’s no coincidence that Warren dedicated much of his career to investigating aircraft accidents. He had lost his father in an airplane crash when he was 9.
Warren came up with the idea for the black box after seeing a miniature tape machine at a trade show. He also happened to be investigating crashes of the Comet, the world's first commercial jet airliner. A spark of synthesis occurred.
Warren pondered how valuable it would be to his investigation if someone on the Comet had been using the machine at the time of the accident. Warren remarked in a 2003 interview that if they could find the tape in the wreckage and play it back, “We’d know what caused this.” He knew that data was the key.
His prototype was built within three years and could store four hours of voice recordings and instrument data. It was nicknamed the "black box," not for its color but for its technical wizardry. You didn't need to understand how it worked, but it would do "wonderful things."
According to Warren's 2010 obituary in the Independent, the reaction was just what you might expect. The air force rejected the additional complexity, commenting that "the device would yield more expletives than explanations."
The Australian pilots' union rejected the equipment outright, worried it would be used to spy on the crew. "There will be no takeoffs with big brother listening," they said.
Warren persisted. He approached aviation experts and manufacturers in the US and Great Britain, and within 15 years every civilian passenger plane in the world was required to carry the black box.
It's impossible to estimate how many lives Warren's big idea has saved by enabling active investigation and action: putting the right information in the right hands at the right time.
Warren's vision was innovative and disruptive. It made a significant impact and transcended industries. It was also evolutionary, building not only on the portable dictation machine but also on the nearly century-old event recorders of trains and on centuries of handwritten ships' logs.
In support of that continued evolution and innovation, I believe there is an opportunity for all of us to build our own, new version of a black box. This effort would enable us to responsibly improve the way we do business - produce goods, serve our customers, and protect our employees.
Whether we are trying to improve quality in manufacturing, human safety in the supply chain, or to create better connections with our customers, the approach is the same. Investigate. Monitor. Analyze. Act.
We can, and should, collect all data. Storage is cheap, and bandwidth is high. Data integrity and security are challenging - but not insurmountable.
We do not yet know the questions we will need to ask in the future. We can’t guess today about contributing factors to tomorrow’s problems, and there are a whole host of unknowns.
Think of every piece of data (every metric or log, every image, every document or message) as a vital pixel in the high-definition recording you are building of your operations.
Use this data to detect actual and potential issues, ideally before they become critical. One minute and thirty seconds before the explosion, the captain of Flight 800 remarked that fuel flow readings were “crazy.” Not enough information and not enough time.
Detection can be a manual process - for example, the outcome of root cause analysis done by a subject matter expert. Advances in machine learning, including anomaly detection and behavioral baselining, have enabled more hands-off approaches as well.
By training a model on many examples, or a continuous stream, of healthy behavior with acceptable outcomes, you can then ask that model to let you know when something is different.
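To make the baselining idea concrete, here is a minimal sketch, assuming "healthy behavior" is a single numeric reading that clusters around a stable mean. It learns a mean and standard deviation from healthy readings and flags any new reading more than a chosen number of standard deviations away (a simple z-score test). The class name, the threshold of 3, and the fuel-flow numbers are all hypothetical; real behavioral baselining usually accounts for seasonality, multiple signals, and drift.

```python
from statistics import mean, stdev


class Baseline:
    """A minimal behavioral baseline: learn "normal" from healthy readings,
    then flag new readings that sit too far from it.

    Hypothetical sketch: a simple z-score test, assuming healthy readings
    cluster around a stable mean. Not a production anomaly detector.
    """

    def __init__(self, healthy_readings, threshold=3.0):
        if len(healthy_readings) < 2:
            raise ValueError("need at least two healthy readings to estimate spread")
        self.mu = mean(healthy_readings)      # typical healthy value
        self.sigma = stdev(healthy_readings)  # typical healthy variation
        self.threshold = threshold            # how many sigmas counts as "different"

    def score(self, reading):
        """Distance from normal, measured in standard deviations."""
        if self.sigma == 0:
            return 0.0 if reading == self.mu else float("inf")
        return abs(reading - self.mu) / self.sigma

    def is_anomalous(self, reading):
        return self.score(reading) > self.threshold


# Made-up fuel-flow readings from a hypothetical healthy period (mean 100, stdev 2).
baseline = Baseline([100, 102, 98, 101, 99, 100, 103, 97])
print(baseline.is_anomalous(104))  # False: 2 standard deviations from normal
print(baseline.is_anomalous(130))  # True: 15 standard deviations from normal
```

The model never needs examples of failure; it only needs enough healthy history to say what "normal" looks like.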
These differences, deviations from normal behavior, are small failures in their own right, frequently dismissed as "noise." A crazy reading. Something's "off." Together, these small and often ignored failures can lead to catastrophe.
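One standard way to catch small deviations that add up, even when no single reading looks alarming, is a cumulative sum (CUSUM) test. This is a well-known technique swapped in here for illustration, not something the original presentation prescribes. The sketch below assumes a known target value, a "slack" that tolerates ordinary wobble, and an alarm limit; all three values, and the readings, are hypothetical.

```python
def cusum_alarm(readings, target, slack, limit):
    """One-sided (upper) CUSUM: return the index at which accumulated upward
    drift first exceeds `limit`, or None if it never does.

    Each reading adds its excess over target + slack to a running total;
    the total is floored at zero, so readings below target + slack pull it back down.
    """
    total = 0.0
    for i, x in enumerate(readings):
        total = max(0.0, total + (x - target - slack))
        if total > limit:
            return i
    return None


# Hypothetical readings: no single value is far from the target of 100,
# but a sustained run of slightly high values adds up.
drifting = [100, 101, 99, 104, 104, 104, 104, 104]
steady = [100, 104, 99, 98, 104, 100]
print(cusum_alarm(drifting, target=100, slack=1, limit=10))  # 6
print(cusum_alarm(steady, target=100, slack=1, limit=10))    # None
```

Each 104 on its own is only a few units high, but four in a row accumulate past the limit, which is exactly the kind of "noise" that is easy to ignore one reading at a time.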
If we can get that critical information into the hands of an expert on a timely basis, they can work magic. They can use their experience and intuition to follow the trail and quickly investigate the issue using all of the other information collected and at their fingertips.
How does this relate to that? In what order did these things happen? Has that ever happened before? What happened next? What should we do?
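Those investigative questions map onto simple operations over a recorded timeline. The sketch below is a hypothetical illustration, assuming each data source produces events already sorted by time: it merges the sources into one ordered timeline ("In what order?"), counts earlier occurrences of an event type ("Has that ever happened before?"), and returns what followed a given event ("What happened next?"). The `Event` fields, source names, and event kinds are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str  # hypothetical source name, e.g. "sensor" or "app"
    kind: str    # hypothetical event type, e.g. "fuel_flow_spike"


def timeline(*streams):
    """In what order did these things happen? Merge per-source event lists
    (each already sorted by time) into one time-ordered list."""
    return list(merge(*streams, key=lambda e: e.time))


def prior_occurrences(events, kind, before):
    """Has that ever happened before? Count events of `kind` earlier than `before`."""
    return sum(1 for e in events if e.kind == kind and e.time < before)


def what_happened_next(events, event, n=3):
    """What happened next? Return up to `n` events that follow `event`."""
    i = events.index(event)
    return events[i + 1 : i + 1 + n]


def at(hour, minute):
    """Helper for the example: a timestamp on an arbitrary day."""
    return datetime(2024, 1, 1, hour, minute)


sensors = [Event(at(8, 0), "sensor", "fuel_flow_spike"), Event(at(8, 20), "sensor", "fuel_flow_spike")]
app_logs = [Event(at(8, 10), "app", "pump_restart"), Event(at(8, 25), "app", "pump_restart")]
events = timeline(sensors, app_logs)
print([e.kind for e in events])
# ['fuel_flow_spike', 'pump_restart', 'fuel_flow_spike', 'pump_restart']
print(prior_occurrences(events, "fuel_flow_spike", before=at(8, 20)))  # 1
print([e.kind for e in what_happened_next(events, events[2], n=1)])   # ['pump_restart']
```

The hard part in practice is not these few lines but having collected the data, with trustworthy timestamps, before anyone knew it would be needed.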
Some investigations take 12 years for good reasons. Some take only minutes. We're incredibly fortunate that not every failure and every investigation is a matter of life or death. I sometimes wonder what would happen if we treated them as if they were.
I challenge all of us to think long and hard about how we can improve the quantity and quality of information available to us, our teams, and our customers. Like David Warren, we’ll likely be challenged by skeptics and naysayers and obstructed by regulations.
If we enable ourselves and others to be more data-driven, we’ll see improvements in performance, availability, and security.
If we encourage everyone to be creative, innovative, and disruptive with data, we'll see both evolutionary steps and giant leaps in product capabilities, service opportunities, and customer service. This effort will anchor our competitive advantage and advance our companies and industries as a whole.