Polls Apart: The 2019 Polling Failure
Jim Reed looks at the reasons behind the failure of the published polls to accurately predict the 2019 Federal election result. In busting some of the myths about what went wrong, he points to the areas the industry most needs to address.
The Result & the Impact
Unless you have been living under a rock you cannot have failed to notice the amount of attention being paid to the poor showing of published polling at the recent Federal election.
Building on a consistent trend across the term, the final polls widely predicted a Labor victory at around 52% of the two-party-preferred vote, but it was the Coalition that eventually reached that winning benchmark and retained government after a swing in its favour.
But if there is only ‘one poll that counts’, why so much attention on the published polls?
The Surprise
First, in Australia we are blessed with a track record of polling accuracy as long as our run of economic growth, partly as a result of a stable of reputable polling companies that have historically done the right thing, and partly as a product of compulsory voting and preferencing, which make results easier to measure.
That all major polling companies got this result wrong was a surprise to them, to the media outlets publishing them and their commentators, to the public at large, and even to many of the political parties contesting the election.
But we should have seen this coming.
Inaccurate polling has long been noted in jurisdictions with trickier voting systems, such as the UK and US. In Australia, most polls got the same-sex marriage result wrong and predicted the latest Victorian and NSW elections to be closer than they turned out to be. The surprising part is that polling here remained accurate for so long.
The Importance
And second, many may not regard polling as important, but the consequences of incorrect polling are significant.
Perhaps the greatest potential impact is on the election result itself, with the tone and focus of media often influenced by the polls, and an expectation of a win for one side providing freedom for swinging voters to vote in other directions to send a message.
But think also of the political and policy decisions based on polling feedback, the business plans for this term and the personal choices that are made on the strength of expected outcomes.
And spare a thought for the research industry itself. Political polling is not a significant part of it in dollar terms, but it is an important one because it is the part that is public-facing and where accuracy is most easily judged against reality. If clients and respondents are to trust the industry’s work and be encouraged to take part in research, upholding its reputation is critical.
The Inquiry
The repercussions of the polls getting it wrong are such that one of the research industry’s peak bodies – AMSRO – is currently undertaking a thorough review. It will take a close look at the methods used to administer the polls to determine where the failing occurred, essentially replicating the process of the British Polling Council in the UK.
This should be welcomed by all in the industry as its findings will have ramifications well beyond published political polling.
This article does not seek to pre-empt the findings of the AMSRO review, only to put forward my views on where the systemic inaccuracies are most likely to have occurred.
Why Poll?
One thing that I doubt the inquiry will cover in its terms of reference is why we publish political polling at all, or at least why so many polling companies are needed.
Political parties regularly undertake their own private research to better understand the electorate and its views on politics and policy. This is a very different beast from published polling: voting intention is but one of a myriad of quantitative and qualitative lines of inquiry allowed by the parties’ larger budgets. That research is necessary and will happen regardless.
Published polling tends to be less frequent, uses different methods and focuses almost exclusively on voting intention and leadership ratings. One might argue that this provides a legitimate measure of public mood and a healthy feedback mechanism to politicians on the public’s judgment of them, that it informs political commentary in the media, and that it entertains.
Conversely, some would argue that it is a distraction, that voting intention is a blunt tool for measuring opinion (when many may dislike a leader or policy but still vote for the party habitually) or electoral prospects (where marginal seats matter most), that it could influence outcomes if used during campaigns, and that it ignores qualitative insights.
I am sympathetic to the former position as I believe more information is generally a good thing – I am a researcher after all – but if we are to produce, publish and receive information it must be of a quality that can be relied upon.
It must also seek to achieve the purposes for which it is designed, which may require rethinking whether we poll during campaigns, whether we use qualitative research techniques, what questions are asked and how results are presented.
Busting Myths
Before looking at what is likely to have gone awry it’s important to correct the record on a number of theories that have been put forward.
The Polls got it Right Really
First, the polls did get it wrong. This may seem an obvious statement, but some have assumed that the polls were only ‘out’ within their stated margin of error. The fact is that most of the final polls used samples of between n=1,500 and n=3,000, and their predictions fell outside the corresponding margin of error.
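For readers who want to see the arithmetic, here is a minimal sketch of the textbook margin of error for samples of that size. It assumes a simple random sample and ignores any design effect from weighting or non-random recruitment, so real-world margins are wider still.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p from a sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1500, 3000):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
# n=1500: +/- 2.5 points
# n=3000: +/- 1.8 points
```

Even on this generous reading, a miss of two or more points on the major-party figures sits at or beyond the edge of what sampling error alone can explain.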
Second, some have queried whether the system used to allocate preferences was at fault. Most polls apply the preference flows from the previous election, and this election’s flows to the major parties were quite similar to 2016’s. Though some pollsters also report self-allocated preferences or mix in flows from recent state elections, it was the primary votes that were inaccurate (often by 2-5 points for the major parties), not the application of flows.
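To make the mechanics concrete, the sketch below shows how a two-party-preferred estimate is typically derived from primary votes and preference flows. The primary votes and flow rates here are invented for illustration, not the actual 2016 or 2019 figures; the point is that any error in the major-party primaries carries straight through to the TPP, whichever flows are applied.

```python
# Illustrative only: these primary votes and flow rates are assumptions
# for the sketch, not the actual 2016 or 2019 figures.
primaries = {"Coalition": 0.38, "Labor": 0.37, "Greens": 0.10, "Others": 0.15}

# Share of each minor party's preferences assumed to flow to the Coalition
# (the remainder flows to Labor), typically taken from the previous election.
flow_to_coalition = {"Greens": 0.18, "Others": 0.60}

coalition_tpp = primaries["Coalition"]
labor_tpp = primaries["Labor"]
for party, share in primaries.items():
    if party in flow_to_coalition:
        coalition_tpp += share * flow_to_coalition[party]
        labor_tpp += share * (1 - flow_to_coalition[party])

print(f"Coalition TPP: {coalition_tpp:.1%}, Labor TPP: {labor_tpp:.1%}")
# Coalition TPP: 48.8%, Labor TPP: 51.2%
```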
Third, some pollsters have asserted that seat-by-seat polls were accurate where the national polls were not. They may have been closer to the mark at the time early votes were being cast, and certainly they picked the local winner more often because there was more than one result to look at. But an analysis of the ten seat polls conducted by Newspoll, for example, shows that their TPP predictions were 4 points out on average – a larger error than the national Newspoll – and up to 9 points out in Queensland seats.
The Pollsters are Cheating
Fourth, it is unlikely that there was any ‘herding’ effect, with pollsters working towards the same result. They did predict much the same TPP outcome, but if there had been any conscious attempt to replicate one another the primary votes would not have differed as they did. For example, Ipsos got close to Labor’s primary vote but was out on the Coalition and the Greens, whereas Essential both underestimated the Coalition’s primary vote and overestimated Labor’s.
It was the Voters’ Fault
Fifth, the idea of a late swing to the Coalition is very unlikely, at least in the sense of people converting from other parties at the last minute. The exit poll conducted by YouGov-Galaxy on the day actually showed Labor’s lead strengthening, a post-election poll by JWS showed no significant change in voting intention by the date the vote was cast, and at least one party’s private polling is believed to have shown a change well in advance of polling day.
Finally, the idea that Australia has suddenly imported a ‘shy Tory’ effect from the UK, leading to fewer respondents admitting to voting for the Coalition, seems illogical. The concept is questioned even in the UK, it has never been seen in Australia before, and our voters (many of whom are interviewed by computer, with no person to be shy in front of) have no issue admitting to voting for other right-of-centre minor parties.
Methodological Explanations
So what did go wrong? Where should we look for the solution?
Good research is about asking the right people the right things in the right way, so here we must look at sample source and sampling, the interview methods employed, and the way that questions are asked. In particular, we should be looking at what’s changed.
Questions Asked
The questions asked by pollsters are probably the area to have changed least over the years, simply because our voting system has remained constant. Each polling company has its own ‘house style’ when asking about voting intention, but in truth they are all quite similar in their approach to asking for people’s first-preference vote on that day.
In doing so they are attempting to replicate, as far as possible, the real-world behaviour of casting a ballot on that day, because (as I have written previously) the way in which something is asked and recorded influences the result. Arguably, though, they do not get close enough.
For example, the actual ballot paper lists a series of candidates and their parties in a particular order specific to an electorate, voters are asked to number every square in order of their preference, and they are obliged to do so lest their vote becomes ‘informal’.
Most polls fail to emulate this process. They name and record only some of the parties, they often present those parties in random order or with the major parties first, they name only the parties and not the candidates (diluting any incumbency effect), and they offer a ‘don’t know’ option when no such option exists in the polling booth.
Most of the final polls at the 2019 election stated that they had excluded the 5-10% of their sample who were undecided, and if the majority of those voters finally decided to stick with the status quo this could explain the election result on its own. This would not be a late swing, but a failure to capture which way those voters were likely to vote.
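A minimal sketch of that arithmetic, using illustrative figures rather than actual poll data, shows how much the headline number depends on where those excluded voters land.

```python
# Illustrative arithmetic only: the figures below are assumptions, not poll data.
decided_tpp_labor = 0.52   # headline TPP among decided respondents
undecided_share = 0.08     # the excluded 5-10% who said 'don't know'

# If the undecideds split evenly the headline barely moves; if most break to
# the status quo (say 75/25 to the Coalition) the predicted winner changes.
for coalition_break in (0.50, 0.75):
    labor = decided_tpp_labor * (1 - undecided_share) + (1 - coalition_break) * undecided_share
    print(f"Undecideds {coalition_break:.0%} to the Coalition -> Labor TPP {labor:.1%}")
# Undecideds 50% to the Coalition -> Labor TPP 51.8%
# Undecideds 75% to the Coalition -> Labor TPP 49.8%
```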
Screening & Weighting
Another questioning issue, one that crosses over with sampling, is the way in which people are screened into the voting questions. It may surprise many that only around three-quarters of Australia’s adult population casts a valid vote once non-citizens, those not enrolled, those not turning out and those casting informal votes are omitted.
In effect, this means that Australia’s system is more akin to the voluntary voting systems of other countries than we think, and it carries the same uncertainties into the polling. Pollsters should filter these people out of the equation, but it appears from stated methodologies that this may not always be the case.
Again, including the theoretical voting intention of the quarter of the population that does not vote (including younger people who enrolled for the same-sex marriage vote but then did not vote in the election) could easily account for a few percentage points of error.
Even when such measures are in place, polling company methodologies often refer to their samples being weighted to the population by age, sex and area. This is standard practice in general population research as applying weighting factors (and quotas) irons out any obvious demographic skews in a sample.
But here there is no frame of reference for what the voting three-quarters look like, so what are pollsters weighting to? If it is general ABS data for all adults, they risk weighting accurate results into inaccurate ones of the kind we are seeing.
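For those unfamiliar with the mechanics, the sketch below shows simple cell weighting with invented age-group shares; the closing comment restates the trap described above.

```python
# Minimal cell-weighting sketch. The age-group shares are invented to
# illustrate the mechanics; they are not real ABS or poll figures.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}  # weighting target
sample_share     = {"18-34": 0.20, "35-54": 0.35, "55+": 0.45}  # achieved sample

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # roughly {'18-34': 1.5, '35-54': 1.0, '55+': 0.78}

# The trap: if the target describes all adults, but the quantity being
# estimated is the vote of the ~75% who actually cast a valid ballot, these
# weights can push an accurate raw figure towards an inaccurate one.
```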
Sample & Sampling
This brings us neatly to what is arguably the greatest challenge for modern pollsters: sample and sampling. As I have written previously, the main issue here is one of ‘reach’.
In brief, being able to measure the views of a population is largely contingent on being able to reach every member of that population, and then randomly or purposefully sampling them in such a way that you can be confident of achieving a representative sub-group (in terms of geography and demographics, if not personality and lifestyle).
In the early days of political polling this was entirely possible. Interviewers would travel the country knocking on doors systematically to interview those at home. The cost and time involved, plus the growth in secure high-rise buildings, then prompted pollsters to switch to landline phone calls. This was the staple of polling until landline use declined to unacceptable levels and budgets became still tighter.
In other words, pollsters went from being able to target everyone to only being able to reach some of them. In such circumstances it does not matter how pure your sampling of that segment is or how much you weight the data to compensate because you can never fully understand those you cannot reach (this is before we even begin to consider the problem of non-response from those you can reach).
Interview Method
This is where the change in methods comes in. Face-to-face interviews and landline calls are now clearly sub-optimal, and where used may over-sample older people, who tend to be right-leaning, so we have instead seen growth in on-line, social media, automated robo-call and mobile phone (call and SMS) surveys.
Whilst many of these newer methods are cost effective and acceptable for many polling applications, each has its own pros and cons.
For example, one might argue that on-line polls, such as those used by Essential, better emulate the private voting decision of a voting booth and are most convenient for respondents to complete at their leisure. But it is too often forgotten that on-line surveys rely on panels of willingly recruited respondents completing surveys for an incentive.
They are not, and never will be, a reliable means of surveying the entire population; they tend to be skewed towards less affluent and left-leaning segments; their members can be unusually engaged and vocal on current affairs; and some suggest that regular exposure to surveys (especially tracking studies, which repeat topics) can lead to morphed views. Social media surveys suffer similar disadvantages.
Mobile phone surveys are one of the best solutions for reaching the majority, but they are more expensive if a live interviewer is used. Cheaper robo and SMS polling has grown significantly in recent years through organisations like ReachTel, but interview length is limited, the quality of these samples can vary greatly, and some argue that they drive down response rates across the industry through sheer annoyance.
In an attempt to reach a greater variety of people, some polling companies have begun weaving these methods together. For example, Newspoll (run by YouGov-Galaxy) uses a combination of on-line surveys and robo calls, but the optimal mix of these efforts is a point of constant debate.
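As a simple illustration of why the mix matters, the sketch below blends two mode estimates. The figures and the blending rule are assumptions for illustration only, not Newspoll’s actual design.

```python
# Illustrative sketch of blending two interview modes. The estimates, sample
# sizes and blending rule are assumptions, not any pollster's actual method.
online = {"labor_tpp": 0.525, "n": 1000}
robo   = {"labor_tpp": 0.510, "n": 600}

def blend(a: dict, b: dict, w: float) -> float:
    """Weighted blend of two mode estimates; w is the weight on the first mode."""
    return w * a["labor_tpp"] + (1 - w) * b["labor_tpp"]

# The simplest choice is to weight each mode by its sample size...
w_simple = online["n"] / (online["n"] + robo["n"])
print(f"Blended Labor TPP: {blend(online, robo, w_simple):.1%}")  # ~51.9%

# ...but the 'optimal' weight depends on each mode's bias and variance,
# which is exactly the point of constant debate.
```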
Other Points to Consider
In addition to these methodological considerations, there are some other points that can neither be discarded out of hand nor empirically proven.
Gaming the System
First and foremost is the idea that some survey respondents are lying, or rather that a portion of voters may be ‘gaming the system’ in order to produce a result. The hypothesis is that some people, knowing a poll will be made public, choose to send a message of protest, or a simple ‘call to do better’, via the media.
There is some circumstantial evidence for this. In the last Federal election, in the latest NSW and Victorian state elections, and in the same-sex marriage vote, the polls over-estimated the ‘mood for change’. That is, they predicted a greater likelihood of change than eventuated.
However, this remains very difficult to prove as one could argue that this reflected a last minute swing back to the status quo, including from those who were undecided before they cast their vote, or some other skew in the sample. It is also almost impossible to compensate for even if it were true. How does one detect a lie, especially one that may be the truth at the time of interview?
A Culture Shock
The second proposition being put forward is that published polling companies have developed a political bias or a herding mentality through ownership, relationships or culture.
It has been widely reported that certain polling companies have merged, are owned by the union movement, have been registered at the address of politically-interested organisations, or have been publishing results in the media whilst also working for political parties or on politically-linked campaigns.
The implicit criticism is that such links will inevitably lead to bias in research results or ‘groupthink’ between merged agencies. I am not sure this would always hold true – it is like saying that a criminal lawyer is naturally a criminal, or must think as their colleagues do – and it would be fiendishly difficult to prove in any case.
We should also draw a distinction between ownership, staffing and therefore culture on the one hand, and working independently for particular clients and causes as part of a wider business on the other. To extend the analogy, the difference here is between a criminal lawyer representing clients, and a law firm being owned and run by criminals.
Researchers should generally be dispassionate about the topics they are investigating to avoid bias, and that means being politically neutral in published political polling. That is clearly put in danger by those owned and operated by political actors, but the question remains as to whether they are actively biased. If they are doing their job properly, they should not be.
In the case of those being commissioned to undertake work independently, my view is more relaxed. Research agencies generally work for a variety of clients across a range of sectors and with various ideologies and priorities, and they work to provide them with useful and objective results regardless.
To say that a researcher is immediately biased because of one client does not necessarily follow, especially if they serve a variety of interests or have demarcated teams. And in doing so perhaps they gain a deeper, better resourced exposure to the topic.
The Way Forward
This is the point in an article where I would usually outline my recommendations to solve all these issues, but the truth is that most of them have already been solved. To detail them would be to release the intellectual property of those who have had the sense, motivation and resources to fix the problem early.
So, whilst refraining from numerous detailed recommendations that should be the product of basic diligence and adequate budgets, I will say that there are two easy and obvious fixes that would benefit the industry immediately.
Easy Fixes
The first relates to sample. The ability to reach the vast majority of the population can be achieved through access to the Integrated Public Number Database (IPND), a database of all landline and mobile phone numbers in the country, tagged to location and person, and administered by Telstra. It is by far the most effective and efficient way to sample the general population.
The IPND can be accessed by certain government agencies and for some very specialised research tasks, but it is currently out of reach for mainstream research. Access to this database would resolve many of the sampling uncertainties (noting that the Do Not Call requirements do not apply to research), but gaining access to it is far from assured.
The second, related action is a focus on mobile phone calls using trained interviewers, i.e. not landlines, automated robo calling or SMS messages.
This has become the gold standard for reaching the most people in a way that allows for more questions and a higher response rate. Mixed methods still hold some interest in the absence of IPND access, and single alternative methods are still fine for many applications, but mobile phone calls must figure prominently in any mix considered for published political polls.
There is a problem here, though. Most polling companies are now wedded, or financially constrained, to a particular methodology, and in many cases they own the infrastructure. That is, polling companies own robo-polling hardware or their own on-line panel, and the cost of switching may simply be too great for them.
A Final Message
To take the theme of cost further, I agree with some in the industry who have said that the times of conducting ‘polling on the cheap’ are over. It can be done, but should not be done if research is to remain credible.
This applies just as much to the rest of the industry as it does to published political polling. You do your clients, yourself, your peers and the public a disservice if you compromise, because having incorrect information is worse than having none.
If a client cannot afford quality research do not do it. It’s that simple.