Data and analytics in the 2024 U.S. Presidential election: how predictive are the polls and models? (Part 2)

Hopefully you had a chance to read part 1 of my every-four-years analysis of U.S. Presidential polling accuracy. If not, you can get caught up here to see where polling and predictive models stood the night before the 2024 election, along with a look at how poor polling accuracy was back in 2020.

Well, let's get right to it.

Final polls before election day

Figure 1 is where things stood in the polls and prediction markets the night before the election:

The data at left in Figure 1 shows the polling differences between Donald Trump and Kamala Harris as of the night before the election. Each number in the table is the support for Donald Trump in that particular poll minus the support for Kamala Harris. Negative numbers (shaded blue) indicate a polling lead for Harris; positive numbers (shaded red) indicate a polling lead for Trump. I've included two polling sources, RealClearPolitics' (RCP) average of polls and Trafalgar, and two simulation-based forecasts, Silver Bulletin and FiveThirtyEight.

Figure 1: At left, current status of polling and forecast data. At right, a selection of prediction market win probabilities for Donald Trump. Data collected Nov. 4, 2024.

At a glance, the data at left mostly shows a prediction of President-elect Trump carrying five of the seven key battleground states, while Vice President Harris is slightly favored in the national popular vote.

At right are win probabilities from a few selected prediction markets. I expressed these numbers as the probability that Donald Trump prevails, so values greater than 50% are shaded red, indicating a higher probability for Trump, and values less than 50% are shaded blue, indicating a higher probability for Harris. We see the same basic trend among the battleground states: Trump is predicted to win five of the seven, as well as the national popular vote. However, most of the probabilities are fairly close to 50%.

Okay, so what actually happened?

2024 results as compared to the polls

Figure 2 below compares the pre-election polling from Figure 1 with the actual 2024 election results. At left in Figure 2 is the same pre-election polling data from Figure 1. At right are the 2024 election results, expressed the same way. Each number in the "results" column is calculated by subtracting Vice President Harris' vote share (in percent) from President-elect Trump's vote share (in percent). So, just like the polling data, positive numbers (red) indicate a Trump win; negative numbers (blue) indicate a Harris win. As we know, President-elect Trump prevailed in all seven battleground states and, as of this writing, in the popular vote by 3.5 percentage points.

But look at the error. The average polling error for the popular vote (excluding Trafalgar) was greater than 4 percentage points. Some of the state polling was closer, but the thing to note is that just like in 2016 and 2020, all of the error is correlated and in the same direction. As you can see by the blue in the right-most column, all of the pre-election polling error was skewed in the direction of Vice President Harris.

Figure 2: At left, final pre-election polling as of Nov 4/5. At right, 2024 election results (as of Nov 8), and the average error between the actual results and pre-election polls (excluding Trafalgar).

Altogether, the average polling error across states is as high as 3 percentage points, with the national polling error at 4 percentage points or higher. We're still waiting for final vote tallies to settle in some states, but this ranks alongside 2020 as some of the worst pre-election polling performance in modern American history.
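For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of the arithmetic described above: a signed margin (Trump minus Harris) and an average polling error that can leave out a source such as Trafalgar. The function names, the source labels, and every number below are hypothetical placeholders, not the values from Figures 1 and 2. The sign convention for the error (poll margin minus actual margin, so negative means the polls leaned toward Harris) is my assumption about how the figure is laid out.

```python
from statistics import mean


def margin(trump_pct, harris_pct):
    """Signed margin in percentage points: positive favors Trump, negative favors Harris."""
    return trump_pct - harris_pct


def average_poll_error(result_margin, poll_margins, exclude=()):
    """Mean of (poll margin - actual margin) over the included sources.

    A negative value means the included polls leaned toward Harris
    relative to the actual result (assumed sign convention).
    """
    errors = [
        poll_margin - result_margin
        for source, poll_margin in poll_margins.items()
        if source not in exclude
    ]
    return mean(errors)


# Hypothetical numbers for illustration only (not the article's data).
polls = {"Source A": -1.0, "Source B": 0.5, "Source C": 2.0}
result = margin(50.0, 48.0)  # actual margin: Trump +2.0
err = average_poll_error(result, polls, exclude={"Source C"})
print(f"actual margin: {result:+.2f}, average poll error: {err:+.2f}")
```

With these placeholder inputs, both included sources sit below the actual margin, so the average error comes out negative, which is the same one-directional pattern the right-most column of Figure 2 shows.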

Key Takeaways

After three election cycles of taking a look at how well (or poorly) we do at effectively measuring voter behavior in advance, I've got several key takeaways for you ... some of them admittedly controversial!

1. Polling continues to get it wrong

As clearly seen in the analysis above, the polls contained a significant amount of correlated, non-stochastic bias, all leaning in the same direction: that of the Democratic candidate. After a third consecutive election with President-elect Trump as a candidate, polling firms (with the exception of small firms like Trafalgar, which was much closer) have not appreciably improved their voter models or sampling strategies, or gotten better at correcting for social desirability bias. Polling accuracy is for the most part just as poor as it was in 2016.

2. Prediction markets continue to (mostly) get it right

They don't have a perfect track record, but market-based prediction methods continue to perform well, at least at predicting the overall winner. They're not good at predicting vote share, which is what polls are trying to do, but they are good at giving a solid probabilistic estimate of which candidate is likely to win and how certain we should be about that conclusion. Figure 3 below shows the 2024 RCP trend of prediction market probabilities including where they stood on election day (the red probability trend line represents President-elect Trump; the lavender trend line represents Vice President Harris).


Figure 3: RCP prediction market averages, March 2024 through election day (red: Trump; lavender: Harris)


3. Advanced simulations and methods are (at the moment) a waste of time and energy

This is very hard for me to write. I absolutely love Nate Silver's and FiveThirtyEight's Monte Carlo stochastic simulation methods. I think there are no finer approaches anywhere around for aggregating uncertain data like this into overall probability estimates. Their stuff is flat-out awesome. But as I wrote after the 2016 and 2020 elections, the input data (polling) has non-stochastic (correlated) biases similar to or larger than the margins of victory in the elections, and we saw it again this week in the 2024 election. There's just no way an outstanding modeling or simulation method can overcome such poor input data.

I have worked in data, analytics and AI for more than 20 years, and the first thing you learn is that the analytic method you choose must be the right fit for the data. You might have a method that's great, but it's a terrible fit for the data you have. Much as I love these advanced methods, I no longer believe the data is high enough quality to justify them. It's curious that the headline of Nate Silver's post-mortem tries to claim that their model predicted the final map... but if you read the article, the model missed two of the seven swing states, and had an overall prediction of a Harris win. Anyway, as I wrote in part 1, it could be that these polling biases are a temporary artifact of President-elect Trump, and future elections might have more accurate polling. But right now, these advanced methods - which I love - are a waste of time at best, and create unreasonable expectations in the electorate at worst.
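To make the point about correlated bias concrete, here is a toy Python sketch. It is emphatically not the Silver Bulletin or FiveThirtyEight model. It assumes each poll equals the true margin, plus a bias shared by every poll, plus independent random noise, and then counts how often the average of many polls points to the wrong winner. All names and parameter values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def simulate_poll_average(true_margin, shared_bias, noise_sd, n_polls, rng):
    """Average of n_polls polls; each is true margin + shared bias + independent noise."""
    return mean(
        true_margin + shared_bias + rng.gauss(0.0, noise_sd)
        for _ in range(n_polls)
    )


def wrong_call_rate(true_margin, shared_bias, noise_sd, n_polls, n_sims=2000, seed=0):
    """Fraction of simulated elections where the poll average picks the wrong winner."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wrong = 0
    for _ in range(n_sims):
        avg = simulate_poll_average(true_margin, shared_bias, noise_sd, n_polls, rng)
        if (avg > 0) != (true_margin > 0):
            wrong += 1
    return wrong / n_sims


# Hypothetical scenario: the true margin is +2 points and there are 20 polls.
unbiased = wrong_call_rate(true_margin=2.0, shared_bias=0.0, noise_sd=3.0, n_polls=20)
biased = wrong_call_rate(true_margin=2.0, shared_bias=-3.0, noise_sd=3.0, n_polls=20)
print(f"wrong-winner rate, no shared bias: {unbiased:.3f}")
print(f"wrong-winner rate, 3-point shared bias: {biased:.3f}")
```

In this sketch, averaging 20 polls washes out most of the independent noise, so the unbiased case almost never calls the wrong winner. A shared bias larger than the true margin survives the averaging untouched, and the poll average usually points the wrong way no matter how many polls are added.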

4. We should have a conversation about whether public pre-election Presidential polling is helpful or harmful to the country

If you think about it, the concept of "who's winning" or "who's losing" prior to the election doesn't make a whole lot of sense. But we've gotten to the point where we treat these elections like sporting events that are an ongoing competition. The campaigns and other groups will always need internal polling to help them make decisions. Where should we have our rallies? Where should we invest in advertising? Where should we hire staff to knock on doors and get out the vote? But those activities are discrete decisions, for which internal polling serves as decision support. What decision support or benefits do we derive from pre-election public polling? As we've shown, polling is less accurate than it has been in more than 40 years, and these inaccuracies can create at best a sense of unmet expectations in the electorate, or at worst an erosion of trust in the American election system when outcomes don't align with the polls. That's a lot of potential downside, with little upside aside from entertainment value and giving the 24-hour news cycle something to talk about.

Well, that's a wrap for my 2024 election analysis. It's interesting to see that we have the same enduring correlated bias in the polling data, favoring the Democratic nominee, in three elections in a row. What will 2028 bring?

Steve Bennett, Ph.D.