A post-mortem on the 2016 election forecasts

By Dylan Freedman | December 6, 2016

Professor Sam Wang, head of the Princeton Election Consortium, lost a bet and ate a cricket live on CNN. Nate Silver, editor-in-chief of 538, wrote on Twitter that an article condemning his forecasting methodology was “so f---ing idiotic.”

In an election characterized by fiery debates, sexual scandals, FBI investigations, and a divided electorate, political predictors faced their own dramas when their models failed to foresee the outcome of the Nov. 8 presidential race.

“People blamed me for the entire outcome of the election, called for me to resign or be fired, and in several cases encouraged me to commit suicide,” Natalie Jackson, senior polling editor for the Huffington Post, wrote in an email interview.

Jackson, who was in charge of an election model that gave Democrat Hillary Clinton 98% odds of winning, reflected two days after the Nov. 8 election: “I truly did my best and read the data the way I saw it. No one wants to feel like this ― to be so utterly and publicly mistaken.”

That did not stop one stranger from emailing her out of the blue to suggest she receive “the tar-and-feather punishment.”

Silver faced criticism before the election, as his 538 model assigned Republican candidate and now president-elect Donald Trump the most favorable odds of any mainstream election forecast. Compared to other forecast models, 538’s public methodology is also significantly more complex, treating independent and mainstream candidates with different mathematical distributions and adjusting statewide polls based on proximity to election day and national trends.

Ryan Grim, the Washington bureau chief for the Huffington Post, condemned Silver for “putting his thumb on the scales” the weekend before the election, when Silver’s forecast gave Trump a “heart-stopping 35 percent.” Describing Silver’s methods as “merely political punditry dressed up as sophisticated mathematical modeling,” Grim went on to argue that Silver’s adjustments to his model had no basis in statistics.

Dale Rosenthal, assistant professor of finance at the University of Illinois, also chimed in via the Huffington Post after Silver had unleashed a storm of typo- and profanity-laden tweets in response to Grim’s article.

“I’m disappointed when meta-pollsters like Nate Silver start to also devolve into insults and yelling,” Rosenthal wrote hours before the polls closed on election day. “We should all admit that we are not experts but merely servants of this data which we are trying to understand as honestly as possible.”

Rosenthal responded via email that his article received varied responses after the election:

About one-third of the response was people saying things like: “Trump won so you are WRONG!”, “statistics has nothing to do with analyzing data”, “predicting elections is different.” Maybe 60% of the response was from people who said “538 was right so you are wrong!” which misses that (1) 538 jumped around a lot, (2) that 538 still predicted a Clinton win, and (3) that 538’s prediction has zero to do with whether 538 had the correct process. The remaining 5% or so was from people who have done modeling and had germane comments, questions, and critiques: mainly from engineers, biostatisticians, or people who do financial modeling.

Wang’s Princeton Election Consortium, which provides a forecast based entirely on polls with the goal of allowing people to channel their activism, has not faced much backlash, despite giving Clinton a 99% chance of winning the election.

“My correspondence has been largely supportive,” he shared in an email. “I think people understand that the calculations are only as good as the polls. Some are critical about the degree of confidence, which was based on an overestimate of poll accuracy.”

How does political forecasting work?

While the majority of the prominent political forecasters do not provide the source code necessary to reproduce their experiments, almost all rely on polling data and computer simulations.

Polls, the most public insight into the tendencies of prospective voters, give percentage leads to candidates and are conducted on a statewide basis or nationally, traditionally via telephone calls or online surveys. Using varying statistical processes, forecasters translate published polling data into likelihoods of each candidate winning each state.

The Huffington Post limits polls to only those that meet minimal eligibility requirements. 538 grades each poll from A to F based on its reputability, past accuracy, and a slew of other factors. It then weights each poll based on its proximity to election day and to major events like conventions, and finally skews state polls slightly towards Republican candidates and national trends to “empirically” match historical trends.

Most forecasting models — like the Princeton Election Consortium’s and The New York Times’ — place great emphasis on unaltered polling data and rudimentary statistical methods, but 538’s model has many more moving parts.

Given the variety of approaches, it is easy to see how debates among forecasters flare up.

To predict the national election outcome, a computer program runs through millions of simulations of the election, treating each voter as a random variable that behaves a certain way. Each candidate’s odds are the percentage of times she or he wins these simulated elections.
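As a rough illustration of the idea, the sketch below runs a toy version of such a simulation. It is not any forecaster's actual code. It simplifies by flipping a weighted coin for each state rather than modeling individual voters, and the five states, their electoral votes, and their win probabilities are all made-up placeholders.

```python
import random

# Hypothetical inputs: state -> (electoral votes, chance candidate A wins).
# These numbers are invented for illustration, not taken from any forecast.
STATES = {
    "State 1": (20, 0.80),
    "State 2": (18, 0.45),
    "State 3": (16, 0.55),
    "State 4": (29, 0.50),
    "State 5": (10, 0.30),
}
TOTAL_VOTES = sum(votes for votes, _ in STATES.values())
VOTES_TO_WIN = TOTAL_VOTES // 2 + 1  # strict majority of the toy map


def simulate_once(rng):
    """Draw one election: each state is an independent weighted coin flip."""
    return sum(votes for votes, p in STATES.values() if rng.random() < p)


def win_probability(n_sims=100_000, seed=0):
    """Share of simulated elections in which candidate A reaches a majority."""
    rng = random.Random(seed)
    wins = sum(simulate_once(rng) >= VOTES_TO_WIN for _ in range(n_sims))
    return wins / n_sims


if __name__ == "__main__":
    print(f"Candidate A wins {win_probability():.1%} of simulations")
```

The reported odds are simply the fraction of simulated elections a candidate wins, so running more simulations narrows the noise in that fraction but cannot correct inputs that are wrong to begin with.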

Some forecasts, such as the Huffington Post’s, correlate voter behavior at simulation time based on historical data, so a preference towards Trump in Michigan in one simulation could cause a Republican lean in Ohio as well.
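One common way to produce this kind of linkage is to give every state a shared polling error on top of its own. The sketch below assumes that approach purely for illustration. It is not the Huffington Post's published method, and the polling leads and error sizes are invented.

```python
import random
from statistics import NormalDist

# Hypothetical polling leads for candidate A, in percentage points.
LEADS = {"Michigan": 3.0, "Ohio": -1.0}

NATIONAL_SD = 3.0  # assumed size of a polling miss shared by every state
STATE_SD = 3.0     # assumed size of a miss specific to one state


def simulate_once(rng):
    """Return the set of states candidate A wins in one simulated election."""
    national_error = rng.gauss(0.0, NATIONAL_SD)  # same draw for all states
    return {
        state
        for state, lead in LEADS.items()
        if lead + national_error + rng.gauss(0.0, STATE_SD) > 0
    }


def both_lost_rate(n_sims=100_000, seed=0):
    """Share of simulations in which candidate A loses both states."""
    rng = random.Random(seed)
    return sum(not simulate_once(rng) for _ in range(n_sims)) / n_sims


def independent_both_lost():
    """Same odds per state, but treating the two states as unrelated."""
    total_sd = (NATIONAL_SD**2 + STATE_SD**2) ** 0.5
    product = 1.0
    for lead in LEADS.values():
        product *= NormalDist(0.0, total_sd).cdf(-lead)  # P(A loses this state)
    return product


if __name__ == "__main__":
    print(f"Lose both, correlated:  {both_lost_rate():.1%}")
    print(f"Lose both, independent: {independent_both_lost():.1%}")
```

Under these assumptions a double loss comes up more often than it would if the states were independent, which is why correlated models tend to give underdogs better odds.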

According to Rosenthal, simulations may not be necessary. “There might be a way of just taking the state predictions and uncertainties, using the correlations between states' voting, and getting an overall prediction and uncertainty” — but no prominent political forecast did this.
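Rosenthal does not spell out a formula, so the following is only one possible reading of his suggestion. It computes the expected electoral votes and their spread directly from per-state win probabilities and an assumed correlation, then uses a normal approximation (an assumption made here, not something he proposed) to turn them into overall odds. All inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs, not taken from any forecast.
VOTES = [20, 18, 16, 29, 10]            # electoral votes per state
P_WIN = [0.80, 0.45, 0.55, 0.50, 0.30]  # chance candidate A carries each state
RHO = 0.3  # assumed correlation between any two states' outcomes


def expected_votes():
    """Expected electoral votes for candidate A."""
    return sum(v * p for v, p in zip(VOTES, P_WIN))


def votes_sd(rho=RHO):
    """Standard deviation of A's electoral votes with pairwise correlation rho."""
    # Each state's outcome is a 0/1 variable with sd sqrt(p * (1 - p)),
    # scaled by the electoral votes at stake.
    sds = [v * sqrt(p * (1 - p)) for v, p in zip(VOTES, P_WIN)]
    variance = 0.0
    for i, sd_i in enumerate(sds):
        for j, sd_j in enumerate(sds):
            variance += sd_i * sd_j * (1.0 if i == j else rho)
    return sqrt(variance)


def win_probability(rho=RHO):
    """Normal approximation to the chance A reaches a majority."""
    majority = sum(VOTES) // 2 + 1
    dist = NormalDist(mu=expected_votes(), sigma=votes_sd(rho))
    return 1 - dist.cdf(majority - 0.5)  # continuity correction


if __name__ == "__main__":
    print(f"Expected votes: {expected_votes():.1f} +/- {votes_sd():.1f}")
    print(f"Approximate win probability: {win_probability():.1%}")
```

No simulation is involved: the overall prediction and its uncertainty follow from sums over states. The trade-off is that the normal approximation can be crude when a handful of large states dominate the map.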

Moving forward in a land of uncertainty

While the realm of probabilities offers no “winners” or “losers” — winning the lottery does not make one a better gambler than another — there are certainly lessons to be learned and applied to future political predictions.

With a swarm of articles decrying the polling industry after the election — like Politico’s “How could the polling be so wrong?”, Fortune’s “Polling and the Trump Big Fail”, and Newsweek’s “Polling Failure: Donald Trump Led for Just 192 Hours” — the fault has often been directed towards the raw data most forecasters relied on.

John Broder, editor of news surveys at The New York Times, shared his thoughts via email: “It’s very dangerous to make generalizations like calling this election a ‘polling failure.’ If anything, I’d call it a ‘forecasting failure.’ I think the forecasting business should be scrutinized.”

The national New York Times/CBS News Poll, which Broder’s news surveys team conducts jointly with CBS, predicted Clinton at +3 percentage points with a 3 percent margin of error four days before the election. The popular vote of this election, as of Dec. 3, places Clinton at +2 percentage points, well within this error.

The presidential race is not decided by the popular vote, however. Instead, the winner of the election is decided on a state-by-state basis according to the electoral college.

“State polls are difficult to conduct and were all over the map in terms of accuracy this year,” Broder reflects.

Indeed, some states are polled more frequently than others, polling methodologies vary between different agencies, and the practice is largely proprietary, like many forecasting methods.

Jackson looks towards a future that relies less on polling data. “I think that reliably forecasting the whole election was asking a bit too much from polls,” she commented in an email.

Rosenthal, who has experience predicting financial markets, sees the power of alternate sources of data. “We should consider economic data and its possible effect on undecided and supposedly-decided voters. ... [In] finance we all start with the same data and you cannot make money without finding what other data helps you see more clearly than the competition.”

Rosenthal also draws from his financial experience on the issue of transparency. Though many may feel it threatens their job, he continues, “reputable forecasters should explain their methods, cite where those methods come from (so we can determine if they are nonsense or not), and explain why they do what they do.”

In all, this election provides a valuable learning experience and needed scrutiny of the methods that failed to predict it. Though it is still too early to tell exactly what went “wrong” in the polls and forecasts, it has been a wake-up call to political predictors.

It is understandable that the public would lash out at its forecasters, who all gave less than a coin flip’s odds to now president-elect Trump, but it is important to remember that statistics make no guarantees.

Wang conceded moments before eating a cricket on CNN, “After all I was wrong, a lot of people were wrong. … I’m hoping that we can get back to data and thinking thoughtfully about policy and issues.”

Interactive elections forecaster

Republican candidate Donald Trump surprised pollsters and pundits by winning a decisive victory in the presidential election on November 8th. Trump’s success was largely due to a dominant performance in the Rust Belt, where he flipped several key states that had been predicted to go to Democratic candidate Hillary Clinton.

Play with the interactive map below to explore the differences between the election outcomes and several popular political forecasts and polls.

Electoral vote distributions

Several models reveal the probabilities of each electoral vote outcome. Play with the chart below to explore these distributions and how they predicted the actual election outcome.

How does the interactive graphic work?

The Elections Forecaster primarily uses data from TheUpshot — which gathers predictions from The New York Times, FiveThirtyEight, Huffington Post, YouGov, PredictWise, Princeton Election Consortium, Daily Kos, The Cook Political Report, The Rothenberg & Gonzales Political Report, and Sabato’s Crystal Ball — and YouGov. The data was scraped from TheUpshot on November 7th and YouGov on November 6th and is binned into 7 categories:

50%-65%: Tossup (the outcome could reliably go either way)
66%-84%: Lean Dem., Lean Rep.
85%-95%: Likely Dem., Likely Rep.
96% to >99%: Solid Dem., Solid Rep.
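The binning above can be sketched as a small function. The function name is hypothetical, and because the listed ranges leave gaps between whole percentages (65% to 66%, for example), the sketch assumes the leading candidate's probability is rounded to a whole percent first.

```python
def bin_forecast(dem_win_pct):
    """Map a Democratic win probability (0 to 100) to one of the 7 categories."""
    # Assumption: round the leader's probability to a whole percent with
    # Python's built-in round() before comparing against the bin edges.
    leader_pct = round(max(dem_win_pct, 100 - dem_win_pct))
    if leader_pct <= 65:
        return "Tossup"
    party = "Dem." if dem_win_pct > 50 else "Rep."
    if leader_pct <= 84:
        return f"Lean {party}"
    if leader_pct <= 95:
        return f"Likely {party}"
    return f"Solid {party}"


if __name__ == "__main__":
    for pct in (50, 34, 90, 99.5):
        print(pct, "->", bin_forecast(pct))
```

A state where the Democrat has a 34% chance, for instance, lands in “Lean Rep.” because the Republican's 66% falls in the second bin.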

Sources which do not have percentage forecasts — namely YouGov, Cook, Roth, and Sabato — follow the same 7 categories and are matched into their corresponding bin. YouGov notably does not have any “Likely Dem.” or “Likely Rep.” categories.

The reported aggregate score “All models combined” is an average of each forecast result, treating each prediction on a 7-point scale.
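A minimal sketch of such an average follows, assuming the seven categories are scored from -3 (Solid Rep.) to +3 (Solid Dem.) and that the mean is mapped back to the nearest category. The exact scoring and rounding used by the graphic are not stated, so both are assumptions, as is the function name.

```python
# The 7-point scale, ordered from most Republican to most Democratic.
SCALE = [
    "Solid Rep.", "Likely Rep.", "Lean Rep.",
    "Tossup",
    "Lean Dem.", "Likely Dem.", "Solid Dem.",
]


def combine(ratings):
    """Average several category ratings and return (mean score, category)."""
    scores = [SCALE.index(r) - 3 for r in ratings]  # -3 .. +3
    mean = sum(scores) / len(scores)
    # Pick the closest point on the scale; an exact tie goes to the
    # category nearer to Tossup (an arbitrary choice for this sketch).
    nearest = min(range(-3, 4), key=lambda s: (abs(s - mean), abs(s)))
    return mean, SCALE[nearest + 3]


if __name__ == "__main__":
    print(combine(["Solid Dem.", "Lean Dem.", "Tossup"]))
```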

Daily Kos and YouGov do not have additional data for the sub-districts of Maine and Nebraska, the two states which give a portion of their electoral votes to Congressional districts. For these two sources, it is assumed sub-districts have the same prediction as the entire state.
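That fallback might look like the sketch below. The district labels, function name, and example ratings are hypothetical, and the graphic's real data format is not described here.

```python
# Congressional districts that award their own electoral votes.
DISTRICTS = {
    "Maine": ["ME-1", "ME-2"],
    "Nebraska": ["NE-1", "NE-2", "NE-3"],
}


def fill_districts(ratings):
    """Copy a statewide rating to any district the source did not rate."""
    filled = dict(ratings)  # leave the caller's data untouched
    for state, districts in DISTRICTS.items():
        if state in filled:
            for district in districts:
                filled.setdefault(district, filled[state])
    return filled


if __name__ == "__main__":
    # Made-up ratings for a source that only rated one district itself.
    source = {"Maine": "Lean Dem.", "Nebraska": "Solid Rep.", "NE-2": "Lean Rep."}
    print(fill_districts(source))
```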

This model assumes that Trump won 306 electoral votes, as this corresponds to the outcome of the election if no electors had been faithless. No models took faithless electors into account.

The Electoral Vote Distribution graphic uses data from The New York Times, 538, Huffington Post, and PredictWise, as these sources were the only ones to make their simulation data public on their websites or available upon request.