The last few days have been trying, mostly because folks keep asking me the same questions: have you voted? Who do you think will win the election? Do you think Nate Silver (http://fivethirtyeight.blogs.nytimes.com/) is right? How confident are you in the forecasts at _________ (fill in your favorite election forecast site, e.g., http://votamatic.org/ from Drew Linzer at Emory, or http://www.huffingtonpost.com/simon-jackman/ from Simon Jackman at Stanford)?
For those of you who don’t know me, I teach political science at a school where most of my students are very engaged. When my students asked me some of these questions in class, they were surprised to hear me say that I had full faith in the 538 algorithm, as well as in a number of others, all of which were predicting that Obama would be re-elected at roughly 10:1 odds. When a student told me that Obama’s stock was trading at $6.45 at around 2pm, I immediately told the class to “buy buy buy!” The day after the election seemed to be the only day quantitative political science was relevant; I guess I should enjoy it while it lasts. What worried me more was the skepticism about all of these election forecasts.
Most of my students and colleagues didn’t believe me, and I’m still not sure why. Yes, the election is a big deal. Yes, people get invested in which party or candidate wins. Yes, whoever takes office influences the lives of millions. Yes, even I voted, though I know the chance that my vote matters is infinitesimally small (http://goo.gl/ob221). I understand all of that, but what I don’t understand is why folks are simultaneously enamored of and skeptical of 538, mostly because the method itself is not all that complicated. Moreover, in the interest of full disclosure, I count myself as one of the many (political scientists) who have “Silver-envy”, wishing we could be hailed as one of Time’s Most Influential People, meet Jon Stewart or Colbert, etc., for doing relatively straightforward statistical analyses.
However, Mr. Silver doesn’t have it all so easy. As we’ve seen recently, other pundits have attempted to sully his reputation, either by saying his methods were biased or that he’s probably some sort of witch (hilarious links below):
As much as folks fought cognitive dissonance (e.g., Republicans who knew Romney was going to win and were convinced the polls were lying; Democrats who were frightened out of complacency because they knew the election was very close), Mr. Silver and the others were right. If you had an idea of how, exactly, Mr. Silver et al. produced their estimates, you might have been able to relax and accept the most likely outcome more easily. Did anyone notice that President Obama stayed in Chicago all day while Romney ran all around the country? He even played a game of hoops with Scottie Pippen (http://goo.gl/vDqUj). That should have been a dead giveaway: anyone who was paying attention to the numbers thought the election was in the bag and that there was nothing left he could do for the campaign, so why bother? It’s almost as if we need a reminder that, as one of the greatest minds of our generation put it, “men lie, women lie, numbers don’t.”
The two most common tools we use in introductory probability and statistics classes are the “fair” die (an equal probability of rolling a 1, 2, …, 6) and the “fair” coin (a coin with a 50-50 chance of turning up heads or tails). We use these as examples because almost everyone is familiar with them. If we assume the coin is fair, what is the probability that a single flip results in heads?
1/2 (50%, as both sides are equally likely to come up)
If we assume that the die is fair, what’s the probability of rolling a 6?
1/6 (about 16.67%, as all six possible outcomes are equally likely)
Now, what if I ask you to calculate the probability of two events occurring together, say, the probability that the coin comes up heads AND the die comes up a 4? If you assume the events are independent of one another (that the result of the coin flip has NO EFFECT on the die roll, and vice versa), the answer is straightforward; you simply multiply the two probabilities:
1/2 x 1/6 = 1/12 (approximately 8.33%)
Since this is a blog about statistical software, we can even write a little function that does this a large number of times to verify the result:
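A minimal Python sketch of such a function might look like the following (the post's own code was in R; the name `simulate_heads_and_four` and the `seed` parameter are illustrative assumptions). It flips a fair coin and rolls a fair die `n_trials` times and counts how often the pair comes up (Heads, 4). Counts vary from run to run, so they will not exactly match the ones reported below unless you happen to hit the same random draws.

```python
import random


def simulate_heads_and_four(n_trials, seed=None):
    """Flip a fair coin and roll a fair die n_trials times.

    Returns how many trials came up (Heads, 4).
    """
    rng = random.Random(seed)  # seeded generator so results can be reproduced
    hits = 0
    for _ in range(n_trials):
        coin = rng.choice(["Heads", "Tails"])  # each side has probability 1/2
        die = rng.randint(1, 6)                # each face has probability 1/6
        if coin == "Heads" and die == 4:
            hits += 1
    return hits


if __name__ == "__main__":
    n = 1000
    hits = simulate_heads_and_four(n)
    print(f"{hits} of {n} trials were (Heads, 4): {hits / n:.1%} "
          f"(independence predicts {1 / 12:.2%})")
```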
As we can see, after 1000 simulations we got 84 with the result we were looking for (Heads, 4). 84/1000 is 8.4%, not far off from the 8.33% we would expect if we assume these events are independent. Again, this appears to be a fair assumption. The next question is whether people are that different from coins or dice.
Independence (Now with examples you might be interested in)
Imagine you observe the following two professional athletes: the PGA’s Tiger Woods and LeBron James of the NBA. Imagine Tiger is teeing up a drive at the same time LeBron is taking some free-throws:
Question 1: What is the probability of LeBron making the free-throw?
Question 2: What is the probability that Tiger’s drive lands in the fairway?
Well, according to the NBA’s statistics (http://goo.gl/5q7AE), Mr. James made 387 (77%) of his 502 free throws last season. According to the PGA’s numbers, Mr. Woods hit 178 (49%) of the fairways he had a chance to hit over 468 holes in the 2011 season (par 3s, which have no fairway to hit off the tee, don’t count).
If we take these numbers as absolutely certain, then our answer to question 1 is simply 0.77: we would guess that James makes his free throw about 77% of the time. Another way of thinking about that is that if we were to observe a random sample of 100 of his free throws, we would expect to see him make about 77 and miss 23. In the simplest terms, you would probably bet that James makes the shot.
The answer to question 2 is also easy: we would be inclined to think Tiger would not hit the fairway. After all, we would guess that in 100 drives he’d land in the fairway only 49 times on average. So, not knowing anything else, if someone asked you to bet on any single one of Tiger’s drives, you’d probably bet against him hitting the fairway; sorry, Tiger.
Now, what about the following intersection of events: what is the probability that Tiger hits the fairway (FW) AND LeBron makes his free-throw (FT)? Again, if we assume these are independent events (that nothing is causing both of them to be systematically related), the answer is straightforward:
P(LeBron makes FT and Tiger hits FW) = P(LeBron makes FT) x P(Tiger hits FW)
P(LeBron makes FT and Tiger hits FW) = 77/100 x 49/100 ≈ 0.377
Thus, we would expect both things to happen about 38% of the time; this is a joint probability (one cell of the joint distribution of the two events). Do we get a different answer if we simulate this using a computer?
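Here is a hedged Python sketch of that simulation (not the author's original code; the function name and `seed` parameter are illustrative). It assumes, as the text does, that the two events are independent, so each trial draws LeBron's free throw and Tiger's drive separately using the 0.77 and 0.49 rates from above.

```python
import random

P_LEBRON_FT = 0.77  # LeBron's free-throw percentage, from the text
P_TIGER_FW = 0.49   # Tiger's fairway percentage, from the text


def simulate_joint(n_trials, p_a=P_LEBRON_FT, p_b=P_TIGER_FW, seed=None):
    """Count trials in which two independent events both happen."""
    rng = random.Random(seed)
    both = 0
    for _ in range(n_trials):
        makes_ft = rng.random() < p_a  # True with probability p_a
        hits_fw = rng.random() < p_b   # separate draw, so independent of makes_ft
        if makes_ft and hits_fw:
            both += 1
    return both


if __name__ == "__main__":
    n = 1000
    both = simulate_joint(n)
    expected = P_LEBRON_FT * P_TIGER_FW * n
    print(f"Both happened {both} times in {n} trials; independence predicts about {expected:.0f}")
```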
In a single simulation of 1000 of these paired events, we had 390 instances (39%) where both things happened, only 13 more than the roughly 377 we would expect. What these examples show is that we can use computers to simulate events, do this a whole bunch of times, and have some level of confidence in the estimates. What 538 does with political campaigns is not all that different, but it adds something intuitively attractive: Mr. Silver weights each poll based on its past accuracy or reputation. I will give you one more example from the world of basketball.
Why More Data is Always Better
Imagine you are a basketball coach who has been brought a new recruit who has never played organized basketball before. No one has been keeping track of his shooting percentages, because he’s never played. The agent who brought him in says, “My client is a better three-point shooter than J.J. Redick” (one of the NBA’s better three-point shooters, making about 42% last season). As you skeptically stare at this prospective player, his agent tells him to show you how good he is. On command, the free agent takes three three-point shots from a couple of different places around the key, making two of three. Obviously, 66% is much better than even the NBA’s best three-point shooters. If you were only looking for someone who could shoot that well, and disregarded every other aspect of his game (stamina, speed, agility, court vision, etc.), how many three-pointers would that person have to make before you were convinced he was really a 66% three-point shooter? If you have money riding on this, the answer should be a lot; but how many?
Well, each shot he takes improves your assessment of his overall performance. If we assume that there’s a one in three chance that he will miss any single attempt, we can simulate how your confidence in the estimates changes as you gain more information. The results for the first 50 shots are shown below:
The lines through your point estimate represent some measure of uncertainty (let’s just leave it at that to keep it simple). The big concept is that the more data you have, the more certain you become about the value of the variable you’re interested in (this guy’s three-point shooting percentage). In our simulation, even after 25 shots he seems to be performing better than the league-leading Steve Novak (the solid line at 0.47) at a rate that is unlikely to be due only to chance.
Over 500 shots, this is what happens to your confidence. Again, these estimates are from simulation and the code is available above:
As you can see, after 100 shots you are pretty much “sure” that he’s better than the best in the NBA (Novak), and your estimates don’t change much as long as you have a large number of data points and they are relatively stable. You can keep updating your estimates (as all statisticians do), but at some point you can infer what is likely to happen. That last point is key: you can infer what is likely, but not what is certain.
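To watch the uncertainty shrink as shots accumulate, here is a minimal Python sketch. It assumes a true make probability of 2/3 (the one-in-three chance of missing used above) and uses a 95% Wilson score interval as the uncertainty band; that interval is an assumption, since the post deliberately leaves its uncertainty measure unspecified. The 0.47 line for Novak comes from the text; the function name, checkpoints, and `seed` parameter are illustrative.

```python
import math
import random

NOVAK = 0.47  # league-leading three-point percentage cited in the text


def running_estimates(n_shots, p_make=2 / 3, z=1.96, seed=None):
    """Simulate n_shots attempts; after each one, record the estimate and a 95% band.

    Returns a list of (shots_so_far, estimate, low, high) tuples. The band is a
    Wilson score interval (an assumption; the post doesn't name its interval).
    """
    rng = random.Random(seed)
    makes = 0
    rows = []
    for n in range(1, n_shots + 1):
        if rng.random() < p_make:  # the recruit makes this shot
            makes += 1
        p_hat = makes / n
        denom = 1 + z ** 2 / n
        center = (p_hat + z ** 2 / (2 * n)) / denom
        half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
        rows.append((n, p_hat, center - half, center + half))
    return rows


if __name__ == "__main__":
    for n, p_hat, low, high in running_estimates(500):
        if n in (3, 25, 50, 100, 500):
            verdict = "clears Novak" if low > NOVAK else "can't rule out Novak-level"
            print(f"after {n:3d} shots: {p_hat:.3f} [{low:.3f}, {high:.3f}] {verdict}")
```

The Wilson interval is used instead of the simpler normal approximation because the latter collapses to zero width after a streak of all makes (e.g., after one made shot), which would overstate certainty exactly when data are scarce.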
Elections Aren’t That Different from Free-Throws…
…or anything we have so much data on.
Mr. Silver’s approach is similar to the coach who is updating his perceptions of the person shooting three-pointers: over time he can become more confident in the estimate he’s producing. Here is the analogue for our basketball coach: imagine that the recruit comes with two letters of endorsement. The first is from Duke’s Coach Mike Krzyzewski (we are Duke Ph.D.s, after all). Coach K praises your recruit as possibly the greatest perimeter shooter ever. He tells you that he believes this because he ran into him on a London basketball court during the Olympics, watched him shoot around for an hour, and invited him to play in a pickup game against some of his own players from Team USA. Coach K says he’s the “real deal, ready for the NBA right now; I’d give him a 10/10.” The second is a letter from the recruit’s math teacher. He tells you that in high school the recruit was good at math and only so-so at basketball; the math teacher saw him practicing his three-pointers for 10 minutes one time. In the math teacher’s opinion, “he’s just an average player, not so great at geometry and his calculus homework was just derivative; I’d give him a 2/10 in basketball.”
Whose opinion do you weight more heavily? It turns out most people are persuaded by people with authority or knowledge about the topic at hand. You might discount the assessment from the math teacher in favor of Coach K, whose knowledge about basketball is more likely to be useful in making a decision about a basketball player. If you think about it, you have three options:
Option 1: Weight both equally (the player is maybe above average):
1/2 x (2/10) + 1/2 x (10/10) = 6/10 = 0.60
Option 2: Weight the teacher more heavily (say, twice as heavily):
2/3 x (2/10) + 1/3 x (10/10) = 14/30 ≈ 0.467
Option 3: Weight the basketball coach more heavily (say, five times as heavily):
1/6 x (2/10) + 5/6 x (10/10) = 52/60 ≈ 0.867
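All three options are just weighted averages with different weights. This short Python sketch (the function name is illustrative) reproduces the numbers above by rescaling whatever weights you pick so that they sum to one:

```python
def weighted_rating(ratings, weights):
    """Weighted average of ratings; weights are rescaled to sum to 1."""
    total = sum(weights)
    return sum(r * w for r, w in zip(ratings, weights)) / total


teacher, coach_k = 2 / 10, 10 / 10
print(round(weighted_rating([teacher, coach_k], [1, 1]), 3))  # Option 1 -> 0.6
print(round(weighted_rating([teacher, coach_k], [2, 1]), 3))  # Option 2 -> 0.467
print(round(weighted_rating([teacher, coach_k], [1, 5]), 3))  # Option 3 -> 0.867
```

Passing raw weights like `[2, 1]` instead of fractions like `[2/3, 1/3]` keeps the "twice as heavily" intuition visible in the call.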
If you have previous knowledge of how good the source of information is, you can come up with some scheme to weight the data they give you. This is kind of what 538 does; again, the point of this post is to simply explain why you might want to trust the results from 538 and other websites that use model averages to predict things.
From what I can tell, Mr. Silver does this by doing something similar to the following:
- Finding which polls have been most reliable in the past (like Coach K compared to the math teacher);
- Deciding based on their past performance how to weight the information each poll gives him (like we did above);
- Once he has a belief about the most likely outcome in each state, like our estimates of Tiger Woods hitting the fairway and LeBron James making a free throw, he programs the computer to simulate outcomes (as we did above).
- Just like in our Woods/James example, we get answers that are close to what we would expect. However, instead of doing this for only two independent events, 538 uses the polling data from 51 separate events (all 50 states plus DC). Treating each as independent, you can come up with estimates of how likely any set of outcomes is.
- If each of these outcomes corresponds to a number of, say, electoral votes, you can make some pretty good predictions.
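For the first two steps, here is one simple, hypothetical weighting scheme in Python: weight each poll by 1/(past error)², a textbook inverse-variance style rule. This is an assumption for illustration, not 538's actual formula, and the pollster names, vote shares, and error figures are made up.

```python
# Hypothetical polls for a single state. All numbers are made up for illustration:
# (pollster, Obama's share of the two-party vote in %, pollster's average past error in points)
POLLS = [
    ("Pollster A", 51.0, 2.0),  # strong track record (our "Coach K")
    ("Pollster B", 48.0, 6.0),  # weak track record (our "math teacher")
    ("Pollster C", 52.0, 3.0),
]


def weighted_poll_average(polls):
    """Average the polls, weighting each by 1 / (past error)**2.

    This inverse-variance style rule is an illustrative assumption,
    not 538's actual weighting formula.
    """
    weights = [1 / err ** 2 for _, _, err in polls]
    total = sum(weights)
    return sum(w * share for w, (_, share, _) in zip(weights, polls)) / total


simple = sum(share for _, share, _ in POLLS) / len(POLLS)
print(f"simple average:   {simple:.2f}")
print(f"weighted average: {weighted_poll_average(POLLS):.2f}")
```

Polls with a good track record pull the average toward themselves, while the noisy one barely moves it, the same logic as trusting Coach K over the math teacher.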
Penultimate Example: A Three State Election:
Imagine the following election with only three states. Let’s assume that we are certain that the probability of each state going to Obama is the following:
Ohio (18 Electoral Votes) – 50%
California (55 Electoral Votes) – 90%
Alabama (9 Electoral Votes) – 10%
If we use our knowledge of independence of events, we know the probability that Obama wins all three states is:
0.50 x 0.90 x 0.10 = 0.045
That means we’d expect him to pick up all three states about 45 times in a thousand. What do we get when we simulate these results?
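A minimal Python sketch of that simulation follows, using the three states and probabilities above and assuming, as the example does, that states are independent. The function name and `seed` parameter are illustrative, not the author's code.

```python
import random

# State -> (probability Obama wins it, electoral votes), from the three-state example
STATES = {
    "Ohio": (0.50, 18),
    "California": (0.90, 55),
    "Alabama": (0.10, 9),
}


def count_sweeps(n_sims, states=STATES, seed=None):
    """Count simulated elections in which Obama carries every state."""
    rng = random.Random(seed)
    sweeps = 0
    for _ in range(n_sims):
        # One independent draw per state, like Tiger's drive and LeBron's free throw
        wins = [rng.random() < p for p, _ in states.values()]
        if all(wins):
            sweeps += 1
    return sweeps


if __name__ == "__main__":
    n = 1000
    expected = 0.50 * 0.90 * 0.10 * n
    print(f"{count_sweeps(n)} sweeps in {n} simulations; independence predicts about {expected:.0f}")
```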
We got 44 out of 1000 with this, pretty darn close to what we’d expect. If this works for 3 states, why wouldn’t it work for 50 (+ DC)?
Now You Can Simulate Elections Too!
Using the state-by-state estimates 538 had posted as of yesterday morning, I have written a quick algorithm that roughly replicates Mr. Silver’s findings.
It uses the same principles we’ve talked about all along: calibration based on the reputation of each poll and on previous polls, and independence (the notion that a random voter in Ohio no more influences a random voter in California than Tiger Woods influences LeBron James). While it’s really cool, it’s only basic probability and Monte Carlo simulation.
Here’s my R script, which produces a graph based on N simulations. Once you load the script, just run:
Where N is an integer for how many simulations you want to run.
The graph, shown below, gives you the modal outcome of the electoral distribution as well as the overall probability that Obama would win the election. While it has more moving parts, it is really no different in principle than Tiger and LeBron. Again, this is not an exact replication of the algorithm Mr. Silver uses, but it’s just one way of simulating the distribution of an outcome of interest (electoral votes) based on some information you have (state by state polls).
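A rough Python analogue of the same idea (not a translation of the author's R script) might look like this. `simulate_election` and `summarize` are hypothetical names; the example feeds in the three-state toy election from earlier rather than 538's actual state probabilities, which you would need to supply yourself as a dict of state -> (probability, electoral votes). A crude text histogram stands in for the graph.

```python
import random
from collections import Counter


def simulate_election(states, n_sims, seed=None):
    """Simulate n_sims elections with independent state outcomes.

    states maps a state name to (probability Obama wins it, electoral votes).
    Returns a Counter: Obama's electoral-vote total -> number of simulations.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_sims):
        ev = sum(votes for p, votes in states.values() if rng.random() < p)
        totals[ev] += 1
    return totals


def summarize(totals, states):
    """Return (modal electoral-vote total, share of simulations with a majority)."""
    n_sims = sum(totals.values())
    needed = sum(votes for _, votes in states.values()) // 2 + 1  # 270 of 538 nationally
    modal_ev, _ = totals.most_common(1)[0]
    p_win = sum(count for ev, count in totals.items() if ev >= needed) / n_sims
    return modal_ev, p_win


if __name__ == "__main__":
    # Toy inputs (the three-state example); swap in all 51 contests for a real run.
    toy = {"Ohio": (0.50, 18), "California": (0.90, 55), "Alabama": (0.10, 9)}
    n = 10000
    totals = simulate_election(toy, n)
    modal_ev, p_win = summarize(totals, toy)
    print(f"modal outcome: {modal_ev} electoral votes; P(majority) = {p_win:.1%}")
    for ev in sorted(totals):  # crude text histogram in place of the R graph
        print(f"{ev:3d} | {'#' * (totals[ev] * 50 // n)}")
```

With the toy inputs, 55 and 73 electoral votes are equally likely in theory (each has probability 0.405), so the reported modal outcome may flip between them from run to run.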
Play around with the number of simulations to see how things change; enjoy!
Conclusion and Disclaimer
As I stated at the outset of today’s long post, I think the fervor over any of the forecasting models (Silver/Linzer/Jackman) is rooted in a fundamental misunderstanding of what probability can tell us. It’s not that any single person is a statistical genius; it’s that the numbers we collect can help us predict things. Drew Linzer’s explanation is great, and for the record, his two-paragraph introduction to his final prediction could serve as an advertisement for why students might want to take basic statistics. Some other advanced forecasting models can be found here:
The takeaway is this: close, qualitative observation of the race can give you lots of insight, and knowing the opinions of your social network can be informative (though see below for a bad example), but polls are designed to gather as much unbiased information as possible, and no one person’s observations are as clean a measure. So when we put a bunch of nearly unbiased polls together, further washing out their individual biases, we should not be surprised to get accurate results. If, for instance, you just use something like Facebook users or tweets, you might get sub-optimal forecasts:
Again, this post is basically just a way of illustrating what we can do with a relatively small number of data points. I am sure that I have made at least one mistake somewhere in my discussion of statistics, inference, code, etc. The real methodologists out there might get upset; please don’t. I am only trying to illustrate how something people think is “magic” is actually not all that different from things with which we are already familiar (e.g., dice, coins, free throws). As someone who tries (and mostly fails) to be a methodologist, I find that the simple link between basic probability and predicting the presidential vote is what makes all of these forecasts so awesome. Finally, if you’ve learned anything from this post, please pass it on.