When FiveThirtyEight editor Nate Silver predicted in June that Donald Trump had a 20 per cent chance of winning the US presidential election against Hillary Clinton, eyebrows were raised. Just days before, experts had made a similar prediction about the chances of the British electorate voting to leave the European Union. Betting markets had the odds of a ‘Remain’ win as 4 to 1 in favour, and yet ‘Remain’ lost.

Heading into the final presidential debate tonight with just an 11 per cent chance of winning the election (according to FiveThirtyEight’s latest forecast), Trump will no doubt be hoping to upset expectations on 8 November. Certainly, when he won the Republican nomination, CNN carried a headline which said that Trump had “defied all odds”. That is a well-worn phrase, but it is somewhat misleading in the context of probabilistic forecasts of election outcomes. If Trump has a 1-in-10 chance of becoming president, we should expect that outcome to occur one time in 10. Intuitively, this feels wrong. We don’t run elections multiple times: it’s a one-and-done sort of thing, so people naturally expect the candidate with the best chance of winning to be the ultimate victor. But that is not how probability works.

The probabilistic forecasts on which figures like these are based come from simulations of elections run many times through a statistical model. A 10 per cent chance of winning typically means that, out of 100 simulations, a candidate won 10 of them. This win percentage reflects the strength of a candidate’s polling data and the uncertainties that are factored into the model. Let’s say we have 20 polls, each of which finds Clinton with an expected vote share of 55 per cent, plus a further 10 polls that give Trump a similar lead. At this point, it’s almost impossible to know which set of polls is right, but we can make informative use of both as long as we understand the sources of error and uncertainty in the polls.
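To make the simulation idea concrete, here is a minimal sketch in Python. It assumes a single, hypothetical national polling average for Clinton and a single number summarising total uncertainty; real forecasting models work state by state through the electoral college and carry far richer error structures, so this illustrates the logic only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a polling average for Clinton's two-party vote share,
# and a standard deviation capturing polling error and other uncertainty.
poll_mean = 0.55   # 55 per cent expected vote share (illustrative)
poll_sd = 0.04     # assumed total uncertainty around that estimate

# Simulate the election many times by drawing plausible 'true' vote shares.
n_sims = 100_000
simulated_share = rng.normal(poll_mean, poll_sd, size=n_sims)

# In each simulated election, whoever exceeds 50 per cent of the two-way vote wins.
clinton_win_prob = (simulated_share > 0.5).mean()
print(f"Clinton wins in {clinton_win_prob:.0%} of simulations")
print(f"Trump wins in {1 - clinton_win_prob:.0%} of simulations")
```

With these illustrative inputs, Trump wins in roughly one simulation in ten, which is exactly what a reported “10 per cent chance” summarises.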

Some of the uncertainty has to do with the method of polling used – whether online, in-person or on the phone. Other errors can be caused by the choice of who to interview, and whether everyone you select agrees to be interviewed. But whatever the uncertainties, if properly included in the model the simulations can account for them and run multiple ‘what if?’ scenarios to cover all likely eventualities and all possible outcomes.

Based on what we currently know, then, a Trump victory is possible – though it might seem increasingly unlikely. But what if there is more uncertainty in the polling data than is properly accounted for in the model? What if our forecasts fail to account for information that might make a Trump win much more likely?

Here, we are not only worried about uncertainty in our underlying data, but uncertainty in our predictions. To illustrate, consider two examples: (a) a person waking up from a coma and correctly guessing that it is a Monday, and (b) another person accurately predicting that it will rain in London, given that the weather forecast has assigned a 15 per cent probability of showers. In both scenarios, the person making the prediction has a (roughly) 1-in-7 chance of guessing correctly, based on what they know. However, there is more certainty in (a)’s prediction than in (b)’s.

In (a), there are exactly seven days in a week for the person to choose from, so the 1-in-7 chance of guessing is a fixed probability. In (b), however, the unpredictability of short-term weather forecasts leaves greater uncertainty.1 Weather forecasts do not offer a fixed probability as many parameters affect the outcome of the weather, and each of these parameters is uniquely variable. Like weather forecasting, opinion polling cannot produce outcome probabilities with exact, fixed precision. In the US presidential election, there may be a set number of candidates to choose from, but the forces acting on voters are many and varied, and not all are particularly well understood. That is why, in assessing any candidate’s chance of success, it is important to consider the uncertainty of model estimates, as well as the uncertainty of the underlying data.

The problem with polling
In standard election polling, pollsters can minimise and account for uncertainty in a number of ways. They may draw on years of prior experience estimating the levels of support for the various political parties, as well as their understanding of the nuances of specific election systems – for example, the electoral college system in use in the US. Pollsters also have previous election data and polling data they can use to validate their methodology.

The presence of such safeguards does not mean that pollsters get it right all the time. This was evident in the UK general election in 2015, which was projected to be a closely fought race between the Conservative and Labour parties. In actuality, the Conservatives won by 6 percentage points. More recently, the 2016 US Democratic primary in Michigan was won by Bernie Sanders, although it was widely expected that Hillary Clinton would win by 18 percentage points or more.

But if election polling is hard and prone to error, referendum polling is worse. Referendums, like the UK’s recent ‘Brexit’ vote, are predominantly one-off occurrences and pollsters often have little to no experience (and data) to draw upon. There are past referendums to look at and learn from, but every referendum is different. As such, voting behaviours can be expected to vary between referendums due to the issue-specific nature of such votes.

Consider how, in the week leading up to the Brexit referendum, 8 of 12 polls indicated a vote in favour of ‘Remain’, while a week earlier just 4 of 14 polls indicated that ‘Remain’ would succeed. These are highly variable results over a very short period, and even with this uncertainty, some pollsters made dubious assumptions about how people might vote, such as splitting the ‘undecideds’ in their polls two-to-one in favour of ‘Remain’, as has been standard practice in election polling.2

From a pollster’s perspective, it could be argued that a presidential candidate like Donald Trump is similar to a referendum issue. He’s a one-off: there is little prior experience to draw upon in quantifying the appeal he has to voters and which groups are likely to vote for him. Pollsters can try to capture the views of the population in a truly representative manner by obtaining information from across the population to ensure that the full spectrum of voting preferences and attitudes are covered. They can also use a variety of statistical adjustments and corrections to try to ensure a representative sample.

However, survey response has been declining precipitously,3 and finding people to participate in polls is increasingly difficult – with response rates between 20 and 30 per cent. Like poll participation, voter turnout is far from universal. Some groups of people are more likely to vote than others. For this reason, pollsters not only ask about vote intentions, they also ask about likelihood of voting to try to get a better read on how actual voters will vote. But people don’t always do what they say they’re going to do, which means that statistical adjustments may prove ineffective. Those polling the 2015 UK general election thought they had their adjustments right, but there were too many young people in their samples and not enough older people, and while the former were more likely to say they’d vote Labour, they were also less likely to go out to vote.2
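As a simplified illustration of the kind of weighting adjustment involved (the numbers below are entirely hypothetical, and real pollsters weight on many variables at once, often by raking), here is how re-weighting an unrepresentative sample shifts a headline estimate:

```python
import numpy as np

# Toy post-stratification: re-weight a sample so its age mix matches the electorate.
# All shares below are hypothetical.
groups            = ["18-34", "35-54", "55+"]
sample_share      = np.array([0.40, 0.35, 0.25])  # sample over-represents the young
population_share  = np.array([0.25, 0.35, 0.40])  # assumed profile of likely voters
support_in_sample = np.array([0.60, 0.50, 0.42])  # support for a candidate, by age group

# Each respondent's weight is their group's population share over its sample share.
weights = population_share / sample_share

unweighted = np.sum(sample_share * support_in_sample)
weighted   = np.sum(sample_share * weights * support_in_sample)

print(f"Unweighted estimate: {unweighted:.1%}")  # inflated by the surplus of young respondents
print(f"Weighted estimate:   {weighted:.1%}")
```

The adjustment only helps, of course, if the weighted groups then behave as assumed – which is precisely where the 2015 UK polls went wrong.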

Quantifying the uncertainty in the US polls
It is far too late to fix the problems with polling in time for the US presidential election – but these problems can be managed through well-designed meta-analyses and probabilistic models of the kind being produced by organisations such as FiveThirtyEight, the Princeton Election Consortium, the New York Times and PredictWise. Any single poll will be a snapshot in time, and will necessarily be fraught with uncertainty due to sample size, non-representativeness, and differences in methodological assumptions. For this reason, a single poll should be interpreted very cautiously; even more so in situations when there is little substantial experience to draw upon – say, in a referendum, or when Donald Trump is running for president.

But by using a variety of meta-analysis techniques to pool different polls, as well as betting markets data and other sources of information, it is possible to arrive at more accurate estimates and forecasts. Combining data spanning different times, geographies and, most importantly, different people should lead to aggregate polls that are better representations of the overall population’s sentiments.4 Moreover, a well-constructed and justified meta-analysis not only reduces uncertainty but allows for much better diagnostics of the variability in its own predictions than any single poll can offer.
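One standard device for pooling (not necessarily the one used by the organisations named above) is inverse-variance weighting, familiar from fixed-effect meta-analysis. A short sketch with hypothetical polls shows the basic point: the pooled estimate has a smaller standard error than even the largest single poll.

```python
import numpy as np

# Hypothetical poll estimates of Clinton's share (as proportions) and their sample sizes.
estimates = np.array([0.52, 0.55, 0.49, 0.53, 0.51])
n = np.array([800, 1200, 950, 2000, 1500])

# Sampling variance of each proportion, assuming simple random sampling.
variances = estimates * (1 - estimates) / n

# Fixed-effect (inverse-variance) pooled estimate and its standard error.
weights = 1 / variances
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"Pooled estimate: {pooled:.3f} ± {1.96 * pooled_se:.3f}")
print(f"Best single-poll margin of error: ± {1.96 * np.sqrt(variances.min()):.3f}")
```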

There is a vast literature on combining forecasts to reduce variability in forecast values – this is very similar to the notion of risk diversification in finance.5 To illustrate the power of this approach, we offer a simple meta-analysis of opinion poll data, the results of which are shown in the graph below. These polls asked potential voters to state their preferred candidate for president – Donald Trump or Hillary Clinton. It is important to note that the majority of these polls ask questions along the lines of, “if the presidential election were held today, and the candidates were Donald Trump and Hillary Clinton, for whom would you vote?” (as per the Washington Post-ABC News Poll). Perhaps reflecting the variable fortunes of both candidates in the media, there are swings, slumps and surges in the mean estimate.

The margin of error is a way to describe the uncertainty of an estimated quantity, and is often used to describe the variability of polling figures. Generally, it is defined as half the width of a 95 per cent confidence interval. You will frequently see it reported as the estimate plus or minus a value, for example 40 ± 3 per cent.
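For a single proportion from a simple random sample (an assumption that real polls, with their complex designs and weighting, only approximate), the margin of error can be computed as in this short sketch, which reproduces the 40 ± 3 example:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Example: 40 per cent support in a poll of 1,000 respondents.
moe = margin_of_error(0.40, 1000)
print(f"40% ± {100 * moe:.1f} percentage points")  # roughly 40 ± 3
```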

In statistics, bootstrapping refers to any test that relies on repeatedly taking random draws from the sample (with replacement) as a way of obtaining information about the population. Doing this resampling from the same sample over and over again provides information about the (unknown) structure and distribution of the population.
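A minimal bootstrap sketch, using a hypothetical sample of 1,000 yes/no responses, shows the resampling idea and the resulting 95 per cent confidence interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical sample of 0/1 responses (1 = supports a given candidate).
sample = rng.binomial(1, 0.52, size=1000)

# Bootstrap: resample the observed data with replacement, many times,
# recomputing the statistic of interest (here, the mean) each time.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])

# Percentile method: a 95% interval for the underlying level of support.
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"Estimate {sample.mean():.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```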

We obtained publicly available data on election polls conducted between January 1, 2016 and July 17, 2016. There were 113 polls, conducted by 26 companies, during this period. Sample sizes varied from poll to poll, ranging from 800 to 16,135. The methodology of these surveys is also diverse, from online surveys of panel members and non-panel members, to telephone surveys of landline and mobile users (and a mix of both), as well as face-to-face surveys (although these are increasingly rare). The polls also differ in sample recruitment, but the majority apply randomisation and targeted selection to meet pre-defined quotas, to ensure representativeness of the voting public. We assume, as is the general practice, that final poll estimates were weighted to ensure that the results match the demographic profile of likely voters. Using ‘likely voters’, not the entire voting age population, is an important distinction due to differences in voter turnout. These weighting adjustments are important to achieve the most accurate polling results, so – bearing in mind that election polling is primarily a business – the methodology for weighting is sometimes seen as commercial-in-confidence. However, the polls also provide margins of error (see sidebar), which give an indication of the accuracy of the polling results, based on their own data, assumptions and methodology.

Our methodology is fairly simple. We take the mean of these poll estimates, contrasting preferences for Trump and Clinton, and fit these means with a smoothed function weighted by each survey’s sample size. The averages for Clinton and Trump are shown in the graph as solid lines. To obtain an estimate of uncertainty, we use the reported margin of error and a bootstrap procedure (see sidebar) with 10,000 replications to generate a 95 per cent confidence interval for the smoothed averages, presented as shaded regions around the smoothed estimates. This highlights the uncertainty of the average outcome, but does not incorporate all uncertainty in survey representativeness, quality of responses, and the model itself. The aforementioned meta-analyses, produced by FiveThirtyEight et al., build remarkable detail into their models to incorporate many different sources of uncertainty.
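The exact smoother and bootstrap scheme are not the focus here, but the following sketch gives one plausible reading of the procedure just described: a Gaussian-kernel average over time, weighted by each poll’s sample size, with every bootstrap replication perturbing each poll within its reported margin of error before re-smoothing. The numbers are hypothetical placeholders rather than the 113 polls analysed above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical polls: day of fieldwork, Clinton share, sample size.
days  = np.array([  5,  20,  40,  61,  80, 100, 120, 140, 160, 180])
share = np.array([.54, .52, .53, .50, .51, .49, .52, .51, .50, .52])
n     = np.array([900, 1500, 800, 2000, 1200, 1000, 1600, 950, 1100, 1300])
moe   = 1.96 * np.sqrt(share * (1 - share) / n)  # assumed simple-random-sample MOE

grid = np.linspace(days.min(), days.max(), 50)   # dates at which to evaluate the trend

def smooth(values, bandwidth=20.0):
    """Gaussian-kernel average over time, weighted by sample size."""
    out = np.empty_like(grid)
    for i, g in enumerate(grid):
        w = n * np.exp(-0.5 * ((days - g) / bandwidth) ** 2)
        out[i] = np.sum(w * values) / np.sum(w)
    return out

trend = smooth(share)

# Bootstrap: perturb each poll within its margin of error, re-smooth, repeat.
boot = np.array([smooth(share + rng.normal(0, moe / 1.96)) for _ in range(10_000)])
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)

# `trend` is the smoothed average (solid line); (`lower`, `upper`) is the shaded 95% band.
```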

Our results show that while Clinton has enjoyed a larger vote share since mid-January, this vote share appeared to be diminishing in early summer, and the difference between Clinton and Trump was getting smaller – supported by the overlapping confidence intervals in the most recent polls in our data. The latest polling data looks very different, of course, but the graph above still demonstrates how meta-analyses can be powerful tools with which to account for uncertainty in election forecasts – though the validity of their outputs still depends on the polls being broadly right. “Garbage in, garbage out” is a phrase all statisticians are familiar with: if the underlying polls are fundamentally wrong about either Trump or Clinton’s support, then the results of the meta-analysis will be similarly skewed.

It is clear we are in unprecedented territory for polling: as Nate Silver once observed, nobody remotely like Trump has ever won a major-party nomination in the modern era. While he might seem to “defy all odds”, as CNN claimed, the simple truth is that Trump’s chances of becoming president are difficult to measure. After Brexit, and no matter what the US presidential election brings, it should be obvious to all of us – pollsters, journalists, political pundits and the public – that we need to become a lot more comfortable with uncertainty and how we account for it in our forecasts. What is also clear is that methods of polling have become antiquated, and we need to develop better, more responsive ways to deal with the biases that may affect our understanding of voting preferences, intentions and behaviours.

  • Bernard Baffour is a research fellow in social statistics and Joshua Bon is a statistical research assistant, both at the Institute for Social Science Research at the University of Queensland, Australia.

 

Asking a different question
Pollsters could try to answer the question of who voters will vote for by not asking the question directly. Instead, they could build up a picture of the voter's views on certain issues to see which candidate is a better fit. As described by James Cochran and David Curry, writing in Significance in October 2012: “One way to do this is to use a modern technique known as discrete-choice modelling. This is a method which reconstructs a voter's overall utility for a candidate from the voter's feelings about a candidate's positions on issues. (Utility is an economists’ term for the value a consumer associates with purchasing a product, but it works just as well to index the benefit – the positive feelings, the general satisfaction – that he or she associates with voting for a candidate.) The model assumes a voter will cast her or his ballot for the candidate whose positions generate the greatest utility for the voter.”
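As a toy illustration of the utility idea (the issue weights and scores below are entirely hypothetical; a real discrete-choice model would estimate them statistically from survey responses, for instance via multinomial logit):

```python
import numpy as np

# One voter's importance weights for three issues, and how appealing they find
# each candidate's position on those issues (0-10 scale). All values hypothetical.
issue_weights  = np.array([0.5, 0.3, 0.2])
clinton_scores = np.array([7.0, 4.0, 6.0])
trump_scores   = np.array([3.0, 8.0, 5.0])

# Overall utility: importance-weighted sum of issue-level appeal.
utility = {
    "Clinton": float(issue_weights @ clinton_scores),
    "Trump":   float(issue_weights @ trump_scores),
}

# The model assumes the voter backs whichever candidate yields the higher utility.
predicted_vote = max(utility, key=utility.get)
print(utility, "->", predicted_vote)
```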

 

References

  1. Spiegelhalter, D., Pearson, M., and Short, I. (2011). Visualising uncertainty about the future. Science, 333, 1393-1400.
  2. Sturgis, P., Baker, N., Callegaro, M., Fisher, S., Green, J., Jennings, W., Kuha, J., Lauderdale, B., and Smith, P. (2016). Report of the Inquiry into the 2015 British General Election Opinion Polls. London: Market Research Society and British Polling Council.
  3. Groves, R.M. (2011). Three eras of survey research. Public Opinion Quarterly, 75(5), 861-871.
  4. Rothschild, D. (2010). Debiased aggregated polls and prediction market prices. Chance, 23(3), 6-7.
  5. Cochran, J.J., Curry, D.J., Radhakrishnan, R., and Pinnell, J. (2014). Political engineering: Optimizing a U.S. presidential candidate’s platform. Annals of Operations Research, 215(1), 63-87.

 
