Data-led engineering & Why bullet holes make poor datasets
Stéphane (Stef) Malhomme
Agile Prince2 - Senior BA, Project/Product Manager - AI, Data, Cyber, SDLC, IoT, Cloud & SaaS
Data the new oil, data the new illusion
The use of the word “data” has grown exponentially in software. IT professionals use it continuously across organisations, without always driving the change they aim for. This disconnect (particularly relevant to engineering) is not new. It can be useful to explore history to illustrate what sound data-led engineering looks like and what pitfalls to avoid.
The data scientist who saved thousands of air crews
In 1943, the tide of WW2 was turning, but the US Army Air Forces and the Navy still had to deal with a gruesome reality: they were losing too many aircraft to German anti-aircraft fire, particularly bombers. Deadly accurate German flak was driving heroic crews insane with terror. Loss rates ran as high as 1 in 10 sorties, 1 in 5, even 1 in 3 at the worst of times. Bombers shot up in formation would often spin out and collide. Horrifically maimed crew members often bled to death with no help available; so much so that they were sometimes sent out by parachute in the hope that the German army would save them and send them to a POW camp. The situation was dire. Things had to change. Redesigning the armour was the idea gathering consensus.
Deciding how much armour to place, and where, is what you would call a complex problem in Design Thinking: there is no obvious solution, and changing just one parameter can have lasting consequences for the whole. Add too much armour, or put it in the wrong places, and you slow the plane down and cause heavier casualties.
Hence the central question: “How much armour should we add to the bombers, and where?” A tough and very real decision to make: armour enhances an aircraft’s survivability only if applied sparingly, because it also slows the aircraft down.
The SRG, the power of statistics and collective intelligence
Exactly the kind of question the SRG (Statistical Research Group) had been set up for. Operating from 1942 to 1945, the SRG gathered the most brilliant US statisticians and mathematicians, tasked with guiding war-related decisions and strategy. The SRG was somewhat comparable to the Manhattan Project: they worked not towards a physical bomb, but towards equations guiding engineering, strategic and tactical decisions.
In an atmosphere of free thinking, where dissenting views were encouraged, teams worked around the clock, using early Marchant calculators to model and optimise anything from bombing protocols and flight patterns to aerodynamic designs and angles of evasive manoeuvres. Many would go on to lead faculties at MIT, head large industrial groups, found new disciplines (Wiener with cybernetics, Savage with decision theory) or receive Nobel Prizes, as Milton Friedman did. Friedman, it was often quipped without irony, was usually the fourth-smartest guy in a room of four. Much of their research would also inform the engineering excellence of the 60s and 70s (sequential analysis, Six Sigma).
The gruesome problem of armouring bombers is given to Abraham Wald, a Jewish refugee born in Austria-Hungary (in a region later part of Romania), of prodigious skill in mathematics. He starts acquiring the data, going through pictures of hundreds of B-17s, B-29s, Dauntlesses, Marauders, Liberators, some so shot up it seemed to defy logic that they survived. Wald’s beautiful mind, guided by advanced mathematics, starts forming an intuition. He says nothing yet. Painstakingly, he plots all the planes’ bullet holes on a representation of an aircraft. Finally done, he finds this: most holes are on the wing tips, central wings and tail; virtually none on the engines, the cockpit, or the transmission area two-thirds of the way down the fuselage.
“Fantastic!” says the Navy, who in truth had also noticed the pattern. Seeing an opportunity for armour efficiency, they conclude that it is settled: the armour must go where the bullet holes are.
“Absolutely not: you should put the armour exactly where you see no bullet holes,” replies Wald coolly, with the aplomb of a ball bearing landing on a table.
Design Thinking & The power of personal experience to see the invisible
“Where are the missing bullet holes?” asks Wald pointedly.
The room goes silent as the realisation dawns on everyone.
Wald went on to explain in no uncertain terms: the reason you see so few bullet holes in these areas is that hits there are so lethal that most planes struck in them never made it back at all. What you are looking at is not the reality of bombers being hit by AA and fighter fire; it is a specific subset, only those that made it back to base. The data set is profoundly skewed. This would later be coined “survivorship bias”.
The Navy agreed to place the armour where there were no bullet holes, and the survival rate of bomber crews picked up immediately. Wald’s findings were still guiding aircraft engineering in at least two later wars (Korea, Vietnam).
There is something philosophically inspiring about Wald, Wolfowitz and Friedman: all Jews, all ridiculously intelligent, all from families that had fled Europe, all seriously motivated to fight the Axis. Maybe there is something to be said of the Jewish diaspora, persecuted for millennia, that they could see things where others could not. That, habituated to reading between the lines of official propaganda, they saw things on a different level. That, culturally, they could see patterns where others could not.
4 data analysis lessons we can take away from this anecdote
Data is often the beginning, not the end
Data can be a dangerous thing in the sense that it may lull decision makers into a false sense of security. We all make assumptions, and by their very nature, many of the assumptions we make we do not see. Had the Navy put the armour where they saw the bullet holes, the real-life outcome would have been catastrophic.
What data set are we looking at? Is that the whole story? Are we looking at data representing only successful outcomes, or only those that led to an unsuccessful outcome? How can we ensure we are looking at a holistic set? Is the data cleaned and relevant? Are we certain that an appearance or intuition is borne out by fact? Are we mistaking correlation for causation? Etc.
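To make these questions concrete, here is a minimal, purely illustrative Python sketch of survivorship bias. The section names, hit weights and survival odds are invented for the example (they are not historical figures): it simulates sorties, then compares where hits actually land with where hits appear on the aircraft that make it home.

```python
import random
from collections import Counter

# Purely illustrative: hit locations and survival odds are made up,
# not historical figures.
random.seed(42)

SECTIONS = ["wings", "tail", "fuselage", "engines", "cockpit"]
# Probability that a single hit lands on each section (roughly by area).
HIT_WEIGHTS = [0.35, 0.20, 0.25, 0.12, 0.08]
# Probability the aircraft survives a hit on that section.
SURVIVAL = {"wings": 0.95, "tail": 0.90, "fuselage": 0.85,
            "engines": 0.40, "cockpit": 0.30}

all_hits = Counter()        # what actually happened, across every sortie
observed_hits = Counter()   # what the analysts back at base get to see

for _ in range(10_000):     # 10,000 simulated sorties
    hits = random.choices(SECTIONS, weights=HIT_WEIGHTS, k=random.randint(1, 8))
    all_hits.update(hits)
    # Only aircraft that survive every hit are photographed back at base.
    if all(random.random() < SURVIVAL[s] for s in hits):
        observed_hits.update(hits)

print(f"{'section':<10}{'all hits':>10}{'seen on returners':>20}")
for s in SECTIONS:
    print(f"{s:<10}{all_hits[s]:>10}{observed_hits[s]:>20}")
```

Run it and the returners-only column reproduces the Navy’s misleading picture: the engines and cockpit take plenty of hits overall, yet look almost untouched in the data set that survived to be analysed.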
Collective intelligence always beats the HiPPO (Highest Paid Person’s Opinion)
This is something I have previously alluded to here, if you are curious to read more. The HiPPO concept is, to me, one of the most important breakthroughs of the last 20 years when it comes to best-in-class business analysis and engineering. The concept was researched and the term coined by Avinash Kaushik. In short, a team’s collective intelligence is usually higher, captures more value and more ideas, and fleshes them out better than any single individual could, however brilliant he or she may be.
My favourite tool for design and business analysis is running a User Story Mapping exercise: grouping all relevant stakeholders (a software techie, an architecture techie, a business rep, a supplier, the PM, the project owner, a subject matter expert) to look at the same source of truth AT THE SAME TIME. It is a very simple workshop, intuitive and often fun, that allows everyone to discuss live the value we want to capture, what it entails, why, and how we plan on doing it. It surfaces very quickly issues, contingencies internal to the project, and … assumptions. Two hours, once a month, saves for a mid-sized project, I estimate, weeks and weeks of work, friction, and/or running into sharp objects.
It is no accident that the SRG let researchers work without much control, explore surprising ideas and invite opposing views. Culture matters to data too.
You can’t cut corners with data capture and research
Wald’s work on bomber armour placement was not his first foray at the SRG. On a previous assignment he had worked with Jack Wolfowitz on the ideal make-up of fighter-plane ammunition. Fighters could carry up to five different types of ammo (“dumb” metal, incendiary, tracer, armour-piercing, high-explosive), and the SRG was tasked with determining the best mix for the ammo belts.
As it turned out, the ideal make-up for fighter ammo belts was often to mix all five: their combined, complementary effect outweighed the individual merit of any single type.
They could only reach this conclusion by testing profusely and by spending enormous amounts of time and energy on the sampling method itself, where many live-fire test sergeants would have stopped the experiment long before they had gathered the variety and depth of data required.
Data-led decision making in that context was hugely important, and well worth the time and energy spent, even when the answer seemed a foregone conclusion in favour of this ammo type or that. The truth is often less apparent.
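As a hedged illustration of why stopping the test too early is so costly, here is a small Python sketch (the hit probabilities, burst counts and trial counts are invented for the example, not the SRG’s real figures). Two belt mixes differ only slightly in true effectiveness; the sketch estimates how often a short trial crowns the wrong one.

```python
import random

# Illustrative only: invented numbers, not the SRG's actual test data.
# Mix A is genuinely (slightly) better than mix B, but the tester
# does not know that and must decide from live-fire results alone.
random.seed(7)

P_MIX_A, P_MIX_B = 0.32, 0.30   # true per-burst effectiveness
TRIALS = 1_000                  # repeated experiments per sample size

def wrong_pick_rate(bursts_per_mix: int) -> float:
    """Fraction of experiments in which mix B wins or ties after
    firing `bursts_per_mix` test bursts with each mix."""
    wrong = 0
    for _ in range(TRIALS):
        hits_a = sum(random.random() < P_MIX_A for _ in range(bursts_per_mix))
        hits_b = sum(random.random() < P_MIX_B for _ in range(bursts_per_mix))
        if hits_b >= hits_a:
            wrong += 1
    return wrong / TRIALS

for n in (20, 100, 1_000, 5_000):
    print(f"{n:>5} bursts per mix -> wrong or tied pick in "
          f"{wrong_pick_rate(n):.0%} of experiments")
```

With 20 bursts the “winner” is close to a coin flip; the real difference only emerges reliably with far more data than an impatient test sergeant would tolerate, which is exactly the lesson above.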