Why Were the 2016 U.S. Presidential Polls Wrong?
The 2016 U.S. presidential election is over. And if you’d been following the polls, the results might have been surprising. What went wrong with those polls? It has to do with statistical and systematic uncertainties.
Unless you’ve been living in a deep-sea submarine, trekking through the rainforest, or perhaps traveling through a magical space and time portal you stumbled upon in your grandmother’s closet, you know that the 2016 presidential election in the United States came to an eventful conclusion this week.
And if you had been paying attention to the election—in particular the polls—during the weeks and months leading up to election day, you might very well have found the result to be a wee bit surprising. Because the polls consistently told a different story than the one that played out.
Or did they?
Were the polls really wrong? Or were they just really uncertain? Did the polls fail to correctly predict the future? Or was it a mistake for people to think that predicting the future is something that polls can do? The answer to all of these questions is “Yes.”
All of the confusion about the polls tied up in these questions stems from a fundamental misconception about statistics. Namely, what do statistics actually tell us? How certain should we be about them? And how exactly can they lead us astray?
What Do Polls Actually Predict?
During the last few election cycles, an industry has developed around compiling polling data into models which are used to predict the likelihood of various outcomes. But what exactly do these statistical models tell us? In particular, I’d like to talk about what it meant that Nate Silver’s fivethirtyeight.com gave Hillary Clinton a 71.4% chance of winning and Donald Trump (the eventual winner) “just” a 28.6% chance of winning.
A lot of Clinton supporters that I know were relieved to see her chance of winning climbing so high. When I asked them why they felt so good about it, they replied that a number over 70% made it feel like the odds were in her favor—that it was essentially a “sure thing.” But if you stop and think about it, you’ll realize that a 70% chance of winning isn’t actually all that great. And a 30% chance of winning isn’t actually all that bad (as we found out).
After all, if a baseball player succeeds at the plate 30% of the time (a 0.300 batting average), they are deemed to be an exceptional hitter. Perhaps more strikingly, the chance of tossing a coin and getting heads twice in a row—a fairly mundane achievement—is 25%. So according to Nate Silver’s model, Donald Trump’s chances of winning were greater than your chance of tossing heads twice in a row. When you think about it that way, you see that his odds weren’t actually so bad and the outcome wasn’t particularly unexpected—the eventual winner simply tossed two heads in a row.
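If you’d like to check that two-heads figure for yourself, the exact math is just 0.5 × 0.5 = 0.25, and you can also confirm it by simulation. Here’s a minimal Python sketch (the function name is mine, not from any polling toolkit):

```python
import random

def two_heads_rate(trials=1_000_000):
    """Estimate the chance of getting heads on two coin tosses in a row."""
    successes = sum(
        random.random() < 0.5 and random.random() < 0.5
        for _ in range(trials)
    )
    return successes / trials

print(two_heads_rate())  # prints roughly 0.25, i.e., a 25% chance
```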
Are Models and Statistics Useless?
In the aftermath of the election, a lot of people have been saying that all of these models must be wrong and perhaps even useless since they predicted the wrong winner. But that point of view is misguided since it stems from a fundamental misunderstanding of what statistics tells us. Namely, a statistical value such as a “chance of winning” isn’t a concrete prediction about the future; it’s simply a statement about the frequency with which an outcome would occur if the situation played out many times.
Here’s a good and literally “out there” way to think about this: Imagine there are 99 other parallel universes that have been identical to our own in every possible way right up until voting began in the 2016 U.S. presidential election. The chances of winning published by fivethirtyeight.com mean that Hillary Clinton won the election in about 70 of those 100 universes. We just happen to live in one of the roughly 30 in which Donald Trump won.
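Here’s that thought experiment as a quick Python sketch (just an illustration built on the published 71.4% figure, not anything fivethirtyeight.com actually runs):

```python
import random

CLINTON_WIN_PROB = 0.714  # fivethirtyeight.com's final published figure

# Play out the election in 100 "parallel universes," each with the
# same 71.4% chance of a Clinton win.
clinton_universes = sum(random.random() < CLINTON_WIN_PROB for _ in range(100))

print(f"Clinton wins in {clinton_universes} of 100 universes")
print(f"Trump wins in {100 - clinton_universes} of 100 universes")
```

Run it a few times and you’ll typically see Clinton winning in around 70 universes, give or take, with Trump winning the other 30 or so.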
So models and statistics aren’t useless, but it’s important to keep in mind that they don’t predict the future; they tell us about the likelihoods of the various possible futures in light of all the uncertainties present in the world. And sometimes the future that appears less likely to happen does happen.
Statistical Versus Systematic Uncertainty
But nonetheless, if the polls all seemed to point towards a different outcome than the one that eventually happened, the polls must have been flawed in some way … right? Indeed, by definition polls are flawed. After all, the only way to truly know what will happen in an election is to run the election and count all of the votes cast by the entire electorate. A poll is designed to approximate the election by querying only a limited sample of voters—so a poll will always have limited information and thus it will be uncertain. Note that I’m not saying that the polls were “wrong”; they were simply flawed in that they carried with them some unavoidable amount of uncertainty.
Getting a bit more specific, there are two different ways in which polls can be uncertain. First, let’s say you’re trying to figure out how a population of 1,000,000 people is going to vote. If you decide to randomly sample only 10 out of those 1,000,000 people in a poll, statistics tells us that the answer you get is going to be very uncertain (since you might get unlucky and randomly select a non-representative subset of the population). If you instead sample 100 or, better yet, 1,000 people (or more), the answer you get will be more certain in the sense that you can be more confident of the polling result. Thankfully, this so-called statistical uncertainty in polling is fairly easy to understand and account for in modeling.
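To see this effect in action, here’s a minimal Python sketch using a made-up population in which 52% of voters support one candidate. Notice how wildly the results bounce around for small samples and how they settle down as the sample grows:

```python
import random

TRUE_SUPPORT = 0.52  # made-up true share of the candidate's supporters

def run_poll(sample_size):
    """Randomly sample voters and return the candidate's observed share."""
    votes = sum(random.random() < TRUE_SUPPORT for _ in range(sample_size))
    return votes / sample_size

# Repeat each poll 1,000 times and see how widely the results vary.
for n in (10, 100, 1000):
    results = [run_poll(n) for _ in range(1000)]
    print(f"n = {n:>4}: results range from {min(results):.1%} to {max(results):.1%}")
```

With only 10 respondents, individual polls can land anywhere from roughly 10% to 90%; with 1,000 respondents, they typically cluster within a few points of the true 52%.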
The other type of uncertainty inherent in polling—so-called systematic uncertainty—is much more subtle and difficult to deal with. This uncertainty arises from the fact that the sample of people you poll is not truly random (and thus not truly representative of the population). For example, if you’re conducting a telephone poll, you have to be very careful to call a representative subset of the population. But that’s extremely difficult (and probably impossible) to do since not everybody is equally likely to answer their phone. This means that pollsters have to account for it by making extrapolations based upon models of the population. And even if you somehow managed to sample a perfectly random subset of the population, the different groups of people within that subset are not equally likely to actually go out and vote on election day. So pollsters have to model that sort of behavior as well.
Each of these potential sources of bias in the sample—and the modeling and extrapolating necessary to account for them—is a source of uncertainty in the polling. These systematic effects often shift poll results by a few percentage points toward one candidate or the other.
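Here’s a Python sketch of that phenomenon with made-up numbers: the race is actually tied at 50% apiece, but supporters of candidate A are more likely to answer the pollster’s call. The resulting offset doesn’t go away no matter how many calls you make:

```python
import random

TRUE_SUPPORT_A = 0.50  # made-up: the race is actually tied
ANSWER_RATE_A = 0.60   # made-up: A supporters answer 60% of calls
ANSWER_RATE_B = 0.50   # made-up: B supporters answer 50% of calls

def phone_poll(calls):
    """Return candidate A's share among the people who actually answered."""
    a_answered = b_answered = 0
    for _ in range(calls):
        if random.random() < TRUE_SUPPORT_A:  # the call reached an A supporter
            a_answered += random.random() < ANSWER_RATE_A
        else:                                 # the call reached a B supporter
            b_answered += random.random() < ANSWER_RATE_B
    return a_answered / (a_answered + b_answered)

for calls in (1_000, 100_000):
    print(f"{calls:>7} calls: candidate A polls at {phone_poll(calls):.1%} (truth: 50.0%)")
```

Unlike statistical uncertainty, this roughly 4- to 5-point offset doesn’t shrink as the sample grows. The only fix is to model the differing response rates and correct for them, and that extrapolation introduces its own uncertainty.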
How Close Were the Polls to Being Right?
Dealing with uncertainties like these is exactly what a model like fivethirtyeight.com’s is designed to do. In fact, the point of such a model is to give us insight into how uncertainties in polling data—both statistical and systematic—influence the probabilities of the outcomes. And, despite the cries that all of the models got the election wrong, the 30% “chance of losing” that goes along with a 70% “chance of winning” means that at least some of the models actually got the election right—not in the sense of predicting the winner, but in the sense of making it clear that the outcome was anything but certain.
The election wasn’t quite a tossup, but it wasn’t far from it. And the polls and (at least some of the) models indicated as much. As Nate Silver has pointed out in recent days, if just 1 out of every 100 voters had opted for Clinton instead of Trump (especially in a few swing states), the resulting 2% swing in the popular-vote margin would have completely swung the election the other way and we’d now be talking about how spot-on the models were (the quick sketch below shows the arithmetic). The simple truth is that the world is complicated and difficult to predict, and polls and mathematical models are the best tools we have to navigate the inherently probabilistic landscape of life.
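Here’s that 1-in-100 arithmetic in a few lines of Python (the vote shares are made up): moving 1 voter in 100 subtracts a point from one candidate and adds a point to the other, shifting the margin between them by 2 points.

```python
trump, clinton = 50.0, 48.0  # made-up popular-vote shares (percent)

print(f"Margin before: Trump +{trump - clinton:.0f}")

# 1 out of every 100 voters switches from Trump to Clinton:
trump -= 1.0
clinton += 1.0

print(f"Margin after:  Trump +{trump - clinton:.0f}")  # the margin moved by 2 points
```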
Wrap Up
Okay, that’s all the math we have time for today.
For more fun with math, please check out my book, The Math Dude’s Quick and Dirty Guide to Algebra. Also, remember to become a fan of The Math Dude on Facebook and to follow me on Twitter.
Until next time, this is Jason Marshall with The Math Dude’s Quick and Dirty Tips to Make Math Easier. Thanks for reading, math fans!
“I Voted” image from Shutterstock.