Surviving the Survivor Bias!
Sandeep Singh Banga
Senior Human Resources Leader with track record of handling Global Stakeholders and creating engaged workforce culture!
A group of statisticians helped turn the tide of World War II. They taught an important lesson about data and survivor bias. Here's their story.
A top-secret Statistical Research Group (SRG) built data models to inform decisions such as troop deployment, the trajectory of planes and the bombs carried on each mission.
The US military reached out to them with data on which areas of its planes were being hit most often by enemy bullets. It wanted to reallocate the planes' armour to give additional protection to vulnerable areas and reduce fatalities.
When the statisticians looked at this data, they found that the engine had the fewest bullet holes and that the fuel tank and fuselage (main body) had the most. They logically recommended that more armour should go to the fuel tank and fuselage.
Legendary mathematician Abraham Wald reviewed their report and, to their surprise, completely disagreed with the recommendations. He said that the data had 'survivor bias'. It was not that the enemy shot the engine less often; rather, the planes whose engines were hit never returned.
The analysts had wrongly assumed that the planes that returned safely were a representative sample of all planes. In reality, the missing data was the most critical: the bullet damage that brought planes down.
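To see how this plays out in numbers, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the three sections, the `LETHALITY` probabilities, the number of hits per plane and the function names are hypothetical, not historical figures. Hits land on every section equally often, yet the planes that make it back show far fewer engine hits.

```python
import random

# Hypothetical chance that a single hit in this section downs the plane.
# These numbers are invented for illustration, not historical data.
LETHALITY = {"engine": 0.6, "fuel_tank": 0.1, "fuselage": 0.05}


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Return hit counts per section for all planes and for survivors only."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits = dict.fromkeys(sections, 0)
    survivor_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane is lost if any one of its hits proves fatal.
        downed = any(rng.random() < LETHALITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if not downed:
                survivor_hits[s] += 1  # only returning planes get inspected
    return all_hits, survivor_hits


def shares(counts):
    """Convert raw hit counts into each section's share of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


all_hits, survivor_hits = simulate()
all_share = shares(all_hits)
surv_share = shares(survivor_hits)
print("share of hits, all planes:     ", all_share)
print("share of hits, returned planes:", surv_share)
```

Under these assumptions the engine takes about a third of all hits, but its share among the returned planes should come out well below that. An analyst who only sees the second line would conclude the engine is rarely hit.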
They went on to reinforce the armour around the engine, a move that reportedly gave them a 5% advantage over the enemy and helped change the course of the war.
Another example is that of Google. After years of tricky hiring practices and amazing interview questions, the human resources teams at Google decided to use some statistical analysis to see just what a great job they'd done. The problem was that they had never hired a candidate who failed the interview. All measures of success were based on the performance of employees who had made it through the interview process. So Google went back and hired some of the failed candidates. It turned out that they performed as well as the successful interviewees. Google discovered that while its hiring process was newsworthy, it did not actually predict success, and that by eliminating failure data from its sample set, it had made the difference between failure and success harder to see.
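The missing comparison group can be sketched in a few lines. This is a toy model, not Google's data or method: the cutoff, the score distributions and the names `simulate_hiring`, `passed` and `failed` are all hypothetical, and job performance is assumed to be independent of the interview score, which is the situation the story describes.

```python
import random
import statistics


def simulate_hiring(n=5000, cutoff=0.7, seed=1):
    """Split simulated candidates into interview passes and fails."""
    rng = random.Random(seed)
    passed, failed = [], []
    for _ in range(n):
        interview = rng.random()         # interview score in [0, 1)
        performance = rng.gauss(50, 10)  # assumed unrelated to the interview
        if interview >= cutoff:
            passed.append(performance)
        else:
            failed.append(performance)
    return passed, failed


passed, failed = simulate_hiring()

# What the company normally sees: only the people who passed.
print("mean performance, hired only:", round(statistics.mean(passed), 1))

# What it can see only if it also hires some who failed.
gap = statistics.mean(passed) - statistics.mean(failed)
print("gap between passed and failed:", round(gap, 1))
```

Looking at the hired group alone, the average performance looks healthy and says nothing about the interview. Only the gap between the two groups shows whether the interview adds anything, and in this toy model it stays close to zero.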
The same happens in HR data. What employees want is calculated from the current set of employees. Maybe those who wanted different things have already left? The candidate experience survey is calculated from the small sample (<10%) that chooses to respond to the survey, mostly those who were selected.
As Ronald Coase said, "If you torture the data long enough, it will confess." However, first check whether the data itself has a sample selection bias, which occurs when a data set considers only "surviving" or existing observations and fails to consider observations that have already ceased to exist.
So next time, don't study just the projects that won; study those that failed as well. You will see something amazing!