Trump, Failure of Prediction, and Lessons for Data Scientists

The shocking and unexpected win of Donald Trump in the race for the presidency of the United States has once again shown the limits of Data Science and prediction when dealing with human behavior.
 
 



Just before the Nov 8, 2016 election, most pollsters gave Clinton an edge of ~3% in the popular vote. Nate Silver's FiveThirtyEight put the chances of a Trump victory at ~30%, while NYTimes Upshot and Princeton Election Consortium estimated ~15%, and other pollsters had even lower numbers. Still, Trump won. So what are the lessons for Data Scientists?

Correct prediction is based on statistics, and statistics requires a history of similar events, plus assumptions such as independence of variables, to function correctly.

If we toss 100 million fair coins, we can predict the number of heads and tails quite accurately. But using polling to predict the votes of 100 million people is much more difficult. Pollsters need to get a representative sample, estimate the likelihood of a person actually voting, make many justified and unjustified assumptions, and avoid following their conscious and unconscious biases.
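To put rough numbers on the coin comparison, here is a minimal sketch using the standard binomial formula for the spread of an observed share. The poll size of 1,000 respondents is an assumption chosen for illustration, not a figure from any actual 2016 poll, and the function name is made up for this example.

```python
import math

def share_std_dev(n, p=0.5):
    """Standard deviation of the observed share of 'successes'
    in n independent trials, each with success probability p."""
    return math.sqrt(p * (1 - p) / n)

# 100 million independent fair coins: the share of heads barely moves.
coins = 100_000_000
coin_sd = share_std_dev(coins)

# A hypothetical poll of 1,000 people, assuming a perfectly random sample.
poll_n = 1_000
poll_sd = share_std_dev(poll_n)
poll_margin_of_error = 1.96 * poll_sd  # approximate 95% interval half-width

print(f"Std. dev. of heads share, {coins:,} coins: {coin_sd:.5%}")
print(f"Std. dev. of candidate share, poll of {poll_n:,}: {poll_sd:.2%}")
print(f"Approximate 95% margin of error for the poll: {poll_margin_of_error:.2%}")
```

Note that this margin of error covers random sampling noise only. An unrepresentative sample or a wrong likely-voter model adds error that this formula does not capture and that a larger sample does not remove.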

In the case of the US Presidential election, correct prediction is even more difficult because of our system, in which each state (except for Maine and Nebraska) awards all of its electoral college votes to the statewide winner, and because of the resulting need to predict results by state.
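The independence assumption matters a great deal once results must be predicted state by state. The following Monte Carlo sketch is an illustration only: the nine states, their electoral votes, the 3-point polling lead, and the 4-point error size are all made-up assumptions, not real 2016 data. It compares the trailing candidate's chance of an upset when state polling errors are independent with the chance when most of the error is shared across states.

```python
import random

# Hypothetical electoral votes for nine made-up states (not real data).
ELECTORAL_VOTES = [29, 20, 18, 16, 15, 11, 10, 10, 6]
POLL_LEAD = 3.0       # assumed polling lead of candidate A in every state, in points
TOTAL_ERROR_SD = 4.0  # assumed std. dev. of the polling error in each state, in points

def upset_probability(shared_fraction, trials=20_000, seed=0):
    """Estimate how often trailing candidate B wins a majority of electoral votes.

    shared_fraction is the share of the error variance common to all states:
    0.0 means fully independent state errors, 1.0 means a single national error.
    """
    rng = random.Random(seed)
    shared_sd = TOTAL_ERROR_SD * shared_fraction ** 0.5
    local_sd = TOTAL_ERROR_SD * (1.0 - shared_fraction) ** 0.5
    needed = sum(ELECTORAL_VOTES) / 2
    upsets = 0
    for _ in range(trials):
        # One error term hits every state in the same direction.
        national_error = rng.gauss(0.0, shared_sd)
        votes_b = 0
        for votes in ELECTORAL_VOTES:
            # Actual margin = polled lead + shared error + state-specific error.
            margin_a = POLL_LEAD + national_error + rng.gauss(0.0, local_sd)
            if margin_a < 0:
                votes_b += votes  # winner-take-all within the state
        if votes_b > needed:
            upsets += 1
    return upsets / trials

independent = upset_probability(shared_fraction=0.0)
correlated = upset_probability(shared_fraction=0.75)
print(f"Upset probability, independent state errors: {independent:.1%}")
print(f"Upset probability, mostly shared errors:     {correlated:.1%}")
```

Each state has the same total error in both runs; only the correlation changes. When errors are independent they tend to cancel across states, so a forecast built on that assumption can look far more certain than it should if the polls are in fact wrong in the same direction everywhere.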

The chart below shows that pollsters were off the mark in many states, mostly underestimating the Trump vote.
Source: @NateSilver538 tweet, Nov 9, 2016

To be fair, some statisticians, like Salil Mehta (@salilstatistics), were warning about the unreliability of polls, and David Wasserman of 538 actually described this scenario in Sep 2016 in "How Trump Could Win The White House While Losing The Popular Vote", but most pollsters were way off.

So a good lesson for Data Scientists is to question their assumptions and to be especially skeptical when predicting a rare event that has limited history and depends on human behavior.

Other important lessons are:
  • Examine data quality: in this election, polls were not reaching all likely voters.
  • Beware of your own biases: many pollsters were likely Clinton supporters and did not want to question results that favored their candidate. For example, Huffington Post had forecast a 98% chance of a Clinton victory.


Other analyses of polling failures: