Did Data Die on Election Day?
The day after the election, after a night of little sleep thanks to jet lag and disturbing results, I read that a Republican strategist and Jeb Bush supporter had declared data among the casualties of November 8th. "I've believed in data for 30 years in politics and data died tonight," Mike Murphy tweeted. It is certainly true that virtually no pollsters of any persuasion predicted the Trump victory. But does that mean data and analytics have no value in understanding political phenomena? And does the failure of data-based predictions in this election reflect on analytics-based decisions in business?
There are many different ways to predict elections. Polling data—asking citizens for whom they will vote in the future—is the most common approach, but hardly the only one. One can also create models of who will win based on the economy (which wasn’t really growing fast enough to drive another Democratic win), demographic changes (which are influential, but ultimately rely on citizens with certain demographic attributes actually voting, and voting for the predicted candidates), and the party of the incumbent. It’s quite rare for a party to hold onto the presidency for three consecutive terms—it happened under Franklin Roosevelt and Ronald Reagan/George H.W. Bush—and so that simple model alone would have predicted a Republican victory in 2016.
But for both good and questionable reasons, most of the predictions for the 2016 election were based on polling data. The primary good reason is that polling-based models were relatively effective in predicting outcomes in the last two presidential elections; Fivethirtyeight.com’s Nate Silver, for example, rode those predictions to analytics stardom.
The questionable reasons for polling's extensive use are simply that so much polling data is available and that, because many polls are commissioned by media outlets, they get a lot of visibility. Numerous polls are typically produced each week as the election approaches—some national, some state-level, some right-leaning, some left-leaning. Sites like FiveThirtyEight have gotten much better at aggregating these polls, adjusting for various sampling biases, reconciling state and national results, and so forth.
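To make the aggregation idea concrete, here is a minimal sketch of a weighted poll average in Python. The Poll fields, the recency half-life, and the house-effect adjustment are illustrative assumptions, not a description of how FiveThirtyEight or any other aggregator actually builds its model:

```python
from dataclasses import dataclass

@dataclass
class Poll:
    candidate_share: float      # reported share for one candidate, e.g. 0.47
    sample_size: int            # number of respondents
    days_before_election: int   # how old the poll is
    house_effect: float         # estimated partisan lean of the pollster, in share points

def aggregate(polls: list[Poll], recency_half_life: float = 14.0) -> float:
    """Weighted average: larger and more recent polls count more, and each
    poll is shifted by its pollster's estimated house effect."""
    weighted_sum, total_weight = 0.0, 0.0
    for p in polls:
        recency_weight = 0.5 ** (p.days_before_election / recency_half_life)
        weight = p.sample_size * recency_weight
        weighted_sum += weight * (p.candidate_share - p.house_effect)
        total_weight += weight
    return weighted_sum / total_weight

polls = [
    Poll(0.47, 1200, 3, 0.01),    # recent, large sample, leans slightly toward the candidate
    Poll(0.44, 800, 10, -0.02),   # older, smaller sample, leans the other way
]
print(f"Aggregated estimate: {aggregate(polls):.3f}")
```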
But if they've gotten so good, why were they almost completely wrong? Whatever the polling approach, poll-based predictions rest on a few key assumptions that appear to have been violated in this election cycle. One is that the people being polled will actually vote—plenty of research suggests that self-reported voting intentions are often inflated. Another is that respondents are telling the truth about their voting intentions, which may have been questionable for some Trump supporters.
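The first of those assumptions is what "likely voter" adjustments try to address: down-weighting stated intentions by some estimate of whether a respondent will really show up. A rough sketch of that idea, with hypothetical respondent fields and weights chosen purely for illustration, might look like this:

```python
# Hypothetical survey rows; the field names and weights are illustrative only.
respondents = [
    {"intends_to_vote": True,  "candidate": "A", "voted_last_election": True},
    {"intends_to_vote": True,  "candidate": "B", "voted_last_election": False},
    {"intends_to_vote": False, "candidate": "A", "voted_last_election": True},
]

def turnout_weight(r: dict) -> float:
    """Down-weight stated intentions: past voters are more likely to show up
    than first-time 'intenders', whose self-reports tend to be inflated."""
    if not r["intends_to_vote"]:
        return 0.1
    return 1.0 if r["voted_last_election"] else 0.6

def weighted_share(candidate: str) -> float:
    total = sum(turnout_weight(r) for r in respondents)
    support = sum(turnout_weight(r) for r in respondents if r["candidate"] == candidate)
    return support / total

print(f"Likely-voter share for A: {weighted_share('A'):.2f}")
```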
In short, it's a risky business to base predictions on people's reports of what they will do in the future. Most companies wouldn't dream of making that the primary focus of their predictive analytics. Focus groups, which like polls are based on asking people about their future intentions ("Would you buy this innovative new product?"), have been widely discredited as an accurate basis for marketing research.
Instead, companies rely on actual customer behavior. If a lot of customers have bought a product at a particular price, the next customer will probably buy it at that price too. This is the basis of price optimization algorithms. If many of your customers who bought one product also bought another, the next customer to come along is likely to be interested in the second product as well—the basis of "collaborative filtering" models. And since we can often gather a variety of attributes of the customers who exhibited a particular behavior, we can predict fairly accurately what women, Hispanics, or customers over 60 will buy in the future.
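For readers who want to see the mechanics, here is a bare-bones sketch of item-based collaborative filtering built from co-purchase counts. The baskets and product names are invented for illustration; real systems use far richer similarity measures:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: one set of products per customer.
baskets = [
    {"laptop", "mouse", "dock"},
    {"laptop", "mouse"},
    {"laptop", "dock"},
    {"phone", "case"},
]

# Count how often each pair of products is bought together.
co_counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def recommend(product: str, top_n: int = 3) -> list[str]:
    """Recommend the products most frequently co-purchased with `product`."""
    ranked = sorted(co_counts[product].items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]

print(recommend("laptop"))  # customers who bought a laptop also bought ...
```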
It would be difficult if not impossible to gather actual behavior information about voting. It is an essentially private and anonymous activity, and many states even prohibit taking a selfie while in a voting booth. The rise of early voting might be a good source of data about which candidates or issues citizens actually voted for, but we still face the issue of ensuring that they are telling the truth.
While future poll-based political predictions should certainly be carefully examined after this election’s results, to say that data and analytics have been generally discredited is to over-generalize. Companies that do a good job of continually examining the assumptions behind their data and analytical models will continue to succeed in making accurate predictions. Those who ignore evidence that the world has changed, however, will suffer the same fate as this election’s pollsters and poll-based prognosticators.
Written by: Tom Davenport, President's Distinguished Professor in Management and Information Technology at Babson College and cofounder of the International Institute for Analytics. He is also a fellow of the MIT Initiative on the Digital Economy and a senior advisor to Deloitte Analytics. He has written more than a dozen management books; his latest is Only Humans Need Apply: Winners and Losers in the Age of Smart Machines.
Passionate servant leader in Analytics, proud father and grandpa, musician, driven to help others thrive.
Yup, methinks we missed that "change is constant" variable this time around. For example, when have we ever seen presidential candidates execute their campaign "platform" and tee off on each other via Twitter? I'd be interested in the percentage of Facebook posts that contained political debate content or sentiment. Time for more context to wrap around poll-based data?
Project Manager with 44+ years of ERP, IT, business process, systems integration & architecture experience with a track record of delivering global solutions that make a material difference to the bottom line.
The predictions were wrong because those doing the predicting were deliberately skewing the results because they were in Hillary's back pocket, not because of bad analytics. I think it should be illegal to project winners during voting periods; it influences voter turnout and can be used to throw elections. Thank God people did not believe them and voted anyway. This question is one of ethics in using big data. Big data is fine when you own the data; gathering big data is an invasion of people's privacy for the sake of profit or to push an agenda. We need laws in this new area to protect people from those who think they have a God-given right to tinker with people's lives and interests. You should not have to sign away your rights to companies for using your information just because you buy something from them, which seems to be a favorite trick of big business these days. If you want my information, then you should be paying us all big bucks for it!
Investigative Data Scientist / Operations Researcher / Mindset Challenger - PhD / MBA / CAP
Pollsters have to efficiently and quickly collect enough data to produce a modest statistical margin of error. In order to do so, they make the implicit assumption of representativeness (flawed) and do not probe deeply using differential survey methods in order to avoid survey fatigue. Attribution of sinister motives is dubious. Nor is it a failure of data per se. All it means is that more sophisticated, less naive methods are needed when contests are very close.