How Humans Are Fooled by Numbers
I shut myself off from reading the stats around covid-19 many weeks ago. No more articles, Facebook posts, news reports, or anything citing numbers, because both camps (those who believe this is no worse than the flu and those who believe covid-19 will kill off humanity if not handled correctly) were bolstering their claims with numbers that appeared legitimate on the surface yet were artfully crafted to prove a particular point, one the reporting spin artist had settled on long before the numbers were ever produced. The deeper I dug into the statistics, graphs, charts, and data presented in a given piece of media, the more bamboozled I felt by irrelevant and inaccurate data being manipulated to prove a point it didn't actually prove at all.
Beyond covid-19, the art of predicting the future from past statistical data is pretty common. When people ask me what I think will happen to the New England real estate market after the post-covid-19 reopening, I tell them what my gut says. That gut feeling is based on 20 years of experience with market ups and downs. Some of my competitors predict the future and then substantiate those predictions with pretty charts, graphs, and data. When I dig into that data, I realize their predictions are exactly like mine: a hunch. Economic, real estate, and unemployment data can be moved and manipulated to prove whatever you want to prove. Business leaders, politicians, and the media manipulate data because they get something out of it. They get listeners, crowds, and followers by predicting the future. Their predictions ring of accuracy when they are backed by numbers, because since grade school we have been taught that numbers represent calculated accuracy and truth.
If you are seeking a book that helps you muddle through the confusion of how data is used to predict the future, pick up a copy of Proofiness: How You're Being Fooled by the Numbers by Charles Seife. It's enlightening, to say the least.
Proofiness: The art of using bogus mathematical arguments to prove something that you know in your heart is true - even when it's not.
In this book, Seife offers countless real-world strategies that the media, businesses, and politicians use to dupe us. Here are some:
Potemkin Numbers: These are numerical facades that look like real data. Potemkin numbers aren't meaningful because either they are born out of a nonsensical measurement or they're not tied to a genuine measurement at all. For example, L'Oréal's Extra Volume Collagen Mascara gives lashes "twelve times more impact." Vaseline's new moisturizer "delivers 70% more moisture in every drop." Putting a claim into numerical form makes it sound real, but no meaningful measurement underlies it.
Disestimation: This is the act of taking a number too literally, underestimating or ignoring the uncertainties that surround it. Round numbers send a subliminal signal that their associated measurements have large errors - that you can't trust them because they are crude approximations. Long, ultra-specific numbers send the opposite message - that they come from measurements that are more trustworthy and closer to the absolute truth. If someone wants to make their message more believable, they simply won't round numbers and might even add a decimal, because then you are more likely to believe it.
Cherry-picking: This is the careful selection of data, choosing only the information that supports a particular argument while underplaying or ignoring data that doesn't, even if that additional data would bring one closer to the truth. Cherry-picking is lying by exclusion to make an argument more compelling. Al Gore was accused of cherry-picking data for his 2006 movie An Inconvenient Truth, which visualized what the earth would look like if the ocean rose by 20 feet. While global warming is a real thing, most of the available data does not predict a 20-foot rise in sea level.
Comparing Apples to Oranges: This happens when data seems to be linked but really is not linked at all. The example used in the book was how, in 2008, the New York mayor cited the uptick in math and reading scores for New York City students, "proving" that the investments the city had made in schools were paying off. What the mayor forgot to add was that the city had also made the tests easier than in preceding years. The mayor was comparing apples to oranges as far as the test scores went, but was trying to make constituents believe he was comparing apples to apples.
Apple-polishing: Some grocers have been known to make their produce look fresher by waxing and polishing their fruit, gassing tomatoes, and turning individual pieces to hide their bruises and blemishes. The same is often done when numbers are used to prove a point. The book uses the example of a Quaker Oats graph that the company used in marketing to "prove" how oatmeal consumption lowers cholesterol. Here's the graph they used in advertising:
The problem with this graph is that, normally, the line at the bottom of the chart would represent zero cholesterol. Here, however, the vertical axis starts not at zero but at 195. Organizing the chart this way makes the numbers look far more dramatic. If the vertical axis actually started at zero, it would look more like this graph:
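The arithmetic behind that distortion is easy to check. A minimal sketch, using hypothetical cholesterol numbers (the actual Quaker figures aren't reproduced here), of how much of the chart's height the same drop appears to occupy depending on where the axis starts:

```python
# Hypothetical numbers: suppose cholesterol falls from 210 to 200.
start, end = 210.0, 200.0

def apparent_drop(axis_min, axis_max, start, end):
    """Fraction of the chart's vertical span that the drop appears to cover."""
    return (start - end) / (axis_max - axis_min)

full_axis = apparent_drop(0, 220, start, end)    # axis starts at zero
truncated = apparent_drop(195, 215, start, end)  # axis starts at 195

print(round(full_axis, 3))  # 0.045 -> the drop spans ~4.5% of the chart
print(round(truncated, 3))  # 0.5   -> the same drop spans 50% of the chart
```

Same data, same drop; truncating the axis makes it look roughly eleven times larger.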
Causuistry: Casuistry (without the extra "u") is the art of making a misleading argument through sound principles. Seife coins causuistry as a specialized form of casuistry where the fault in the argument comes from implying that there is a causal relationship between two things when there is no such linkage. An example shared in the book was a 2004 study that showed that Olympic athletes who wear red were more likely to win medals over those who wear blue. As humans, we are programmed to look for patterns. If you draw a line through plotted data on a graph, you will find some order in the random chaos of the dots, but that's only because the line tells us what we should be seeing, even if there's nothing there.
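The "line through plotted data" point can be demonstrated directly: a least-squares fit will hand you a trend line even when the data is pure noise. A small sketch (illustrative only, not from the book), fitting a slope by hand to random scatter:

```python
import random

random.seed(0)

# Pure noise: x values with randomly generated y values, no real relationship.
xs = list(range(20))
ys = [random.gauss(0, 1) for _ in xs]

# Ordinary least-squares slope, computed by hand.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)

print(slope != 0)  # True: the math produces a trend line regardless
```

The fit never refuses to draw a line; deciding whether the line means anything is a separate (and easily skipped) step.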
Regression to the Moon: Regression analysis is a tool that is used to take certain data and then plot that data out into the future based on the current pattern. If a tree has grown by 6 inches per year for the past three years, it's easy to estimate that the tree will grow by 18 inches over the next three years. This would be a very basic form of regression analysis. The problem is when that regression goes on forever, without taking into account important data that may affect that regression model over time. In a 2004 whitepaper, a group of scientists analyzed athletes' Olympic performance on the 100-meter dash over the years and found some patterns. Male sprinters were getting faster over the years, so steadily that you could draw a straight line through the data. They used that data to predict how fast men would run the 100-meter dash far into the future. The problem is that if you regress that data too far out, you have runners who are breaking the sound barrier with their speed, which is simply not possible, even though regression analysis tells a different story.
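A toy version of that runaway extrapolation, with made-up numbers (a steady drop of 0.01 seconds per year from a 10.6-second baseline, chosen only for illustration):

```python
# Naive linear extrapolation: assume the historical trend continues forever.
def predicted_time(year, base_year=1950, base_time=10.6, drop_per_year=0.01):
    """Predicted winning 100m time, extending the straight line indefinitely."""
    return base_time - drop_per_year * (year - base_year)

print(round(predicted_time(2000), 2))  # 10.1 -- plausible
print(round(predicted_time(3000), 2))  # 0.1  -- a 100m dash at roughly Mach 3
print(predicted_time(4000) < 0)        # True: the line predicts negative times
```

The model is fine near the data it was fit to and absurd far from it; nothing in the regression itself warns you where the boundary is.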
Risk Mismanagement: Minor changes in wording can easily make a huge risk seem worth taking or an insignificant risk seem dangerous. In the 1980s two economists proved how much humans can be manipulated by risk-related data. The economists presented test subjects with a scenario where they had to make a difficult choice: Imagine that the US is preparing for the outbreak of an unusual disease which is expected to kill 600 people. Two alternative programs to combat the disease were proposed. The economists presented the exact same choice to two separate groups of subjects, but for the first group of subjects, the wording emphasized saving people from the disease, and for the second, the phrasing dwelled on the victims of the disease rather than the survivors. The two scenarios were identical, yet the results were massively different. When the phrasing emphasized survivors over victims, 72% of the subjects voted for the conservative course of action which would save some patients with some certainty but would let others die. But when the wording spoke of victims rather than survivors, 78% chose the riskier course of action to save a few lives. The test subjects made their decisions based not on logic but upon how an authority presented risk to them. By the way, marketers do this all the time, when it comes to presenting risk to buyers. If they can understate the risk (or hide or shuffle the risk around) there's money to be made.
Margin of Error: Any time data is presented, it comes with a margin of error. This margin reflects the imprecision in the data or poll caused by statistical error. The bigger the sample size, the smaller the margin of error. The margin of error ONLY represents statistical error, the inaccuracy inherent in using a sample to represent the whole. In other words, if you poll people to find out how they are voting, that poll will be more accurate the more people you poll. If you poll 100 people to determine how 1 million people would vote, there's much greater room for error than if you poll 500,000 of them.
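The sample-size effect can be made concrete with the standard approximation for a 95% margin of error on a poll question that splits near 50/50, which works out to about 0.98 divided by the square root of the sample size (the specific polls here are hypothetical):

```python
import math

def margin_of_error(n):
    """Approximate 95% margin of error for a proportion near 50%: 1.96 * sqrt(0.25 / n)."""
    return 0.98 / math.sqrt(n)

print(round(margin_of_error(100), 2))       # 0.1    -> about +/-10 points
print(round(margin_of_error(1000), 2))      # 0.03   -> about +/-3 points
print(round(margin_of_error(500_000), 4))   # 0.0014 -> about +/-0.14 points
```

Note the diminishing returns: going from 100 to 1,000 respondents buys far more precision per person than going from 1,000 to 500,000.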
Systematic Error: Systematic errors are never included in a margin of error, and they do not diminish as the sample size grows. For example, in the 1930s, amidst the Great Depression, Literary Digest polled people about who would win the upcoming presidential election. They collected more than 2 million ballots, which predicted that Republican Alf Landon would beat President Roosevelt. Since the sample of over 2 million responses was enormous, the margin of error was tiny, and so Literary Digest predicted a Landon victory. In fact, Roosevelt won, because of the systematic error embedded in the polling. First, the magazine pulled its mailing list from phone records during the Depression, when in general only wealthier households had telephones, and most of the wealthy were Republicans, so the list was biased toward the Republican candidate. The second bias was that people are typically less likely to respond to a voluntary poll when they are happy with a candidate, just as people are less likely to leave a Yelp review when they are content with their service.
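A small simulation makes the Literary Digest failure vivid. The numbers here are made up for illustration: suppose the true electorate is 60% for Roosevelt, but the mailing list reaches a slice of the population that is only 40% for him. A bigger sample converges precisely on the list's 40%, not on the truth:

```python
import random

random.seed(1)

TRUE_SUPPORT = 0.60  # hypothetical: the actual electorate
LIST_SUPPORT = 0.40  # hypothetical: the biased slice the poll can reach

def poll(n):
    """Poll n people drawn from the biased mailing list; return observed support."""
    hits = sum(random.random() < LIST_SUPPORT for _ in range(n))
    return hits / n

huge = poll(200_000)
# Random error has shrunk to a fraction of a point, but the answer
# sits near 0.40, nowhere near 0.60: the bias survives the big sample.
print(abs(huge - TRUE_SUPPORT) > 0.15)  # True
```

Sampling more people from the same biased list only makes you more precisely wrong.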
If you are anything like me, your head hurts from the amount of data being spewed around during the pandemic. For every data set that "proves" it's perfectly safe to open the country, there's a separate data set that "proves" it's not. Rather than struggle to sort the believable from the pure trash, I recommend checking out Charles Seife's book, Proofiness: How You're Being Fooled by the Numbers, so that at least your BS antenna will be up anytime you hear a marketer, politician, or the media tout data as proof of their argument.