
A Bayesian election prediction, implemented with R and Stan

[This article was first published on Revolutions, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If the media coverage is anything to go by, people are desperate to know who will win the US election on November 8. Polls give us some indication of what's likely to happen, but any single poll isn't a great guide (despite the hype that accompanies some of them). One poll is subject to any number of possible errors, statistical and otherwise: the sample, the methodology, the analysis, or even deliberate bias.
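To get a feel for the purely statistical part of that error, here is a minimal sketch of the textbook margin of error for a single poll. It assumes simple random sampling and a normal approximation; the function name and the poll numbers (800 respondents, 48% support) are hypothetical. It captures sampling error only, not the methodological problems or bias mentioned above.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion.

    Covers sampling error only (simple random sampling assumed); it says
    nothing about non-response, weighting choices or pollster bias.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical poll: 800 respondents, 48% support for one candidate.
moe = margin_of_error(0.48, 800)
print(f"48% +/- {100 * moe:.1f} points")
```

Even this best case gives a band of about 3.5 points on either side. That is wider than the gap between the candidates in many close states, and that is before any of the non-statistical errors come in.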

But put a whole bunch of polls together, and you can assemble a more realistic picture of the likely outcome, and of the uncertainty associated with it. That's what poll aggregators like FiveThirtyEight do: take state and national polls, estimate pollster biases and correlations between states, incorporate other influential variables (like economic data), and build a statistical model to forecast the number of electoral votes won by each candidate (which, in US elections, is pretty much the only thing that matters). FiveThirtyEight's methodology is a sound one, and has been largely successful at predicting elections, but the actual details of the process they use are secret, and so the process is itself subject to accusations (unfounded, IMO) of bias.
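The general recipe described above can be sketched in a few lines. This is a toy illustration, not FiveThirtyEight's method, which isn't public. All state names, pollster names, house effects, vote shares and the two error sizes (`national_sd`, `state_sd`) are made-up assumptions. Polls are first averaged after subtracting each pollster's assumed lean. Then many elections are simulated, with one shared national error per simulation so that state results move together. That shared error is one simple way to encode correlations between states.

```python
import random


def aggregate(polls, house_effects):
    """Sample-size-weighted poll average after removing each pollster's
    assumed house effect (its typical lean toward candidate A)."""
    total_n = sum(n for _, _, n in polls)
    adjusted = sum((share - house_effects.get(pollster, 0.0)) * n
                   for pollster, share, n in polls)
    return adjusted / total_n


def simulate(states, n_sims=10_000, national_sd=0.02, state_sd=0.03, seed=1):
    """Fraction of simulated elections in which candidate A wins a majority
    of electoral votes.

    states maps name -> (electoral votes, expected two-party share for A).
    One national error is shared by every state within a simulation, which
    makes the state outcomes correlated; each state also gets its own noise.
    """
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _ in states.values())
    needed = total_ev // 2 + 1
    wins = 0
    for _ in range(n_sims):
        national_error = rng.gauss(0.0, national_sd)
        ev_a = sum(ev for ev, share in states.values()
                   if share + national_error + rng.gauss(0.0, state_sd) > 0.5)
        if ev_a >= needed:
            wins += 1
    return wins / n_sims


# Hypothetical polls for one swing state: (pollster, share for A, sample size).
swing_polls = [("Pollster X", 0.50, 1000), ("Pollster Y", 0.46, 500)]
house = {"Pollster X": 0.01, "Pollster Y": -0.02}  # assumed house effects

# Hypothetical map with 538 electoral votes, so 270 are needed to win.
states = {
    "Safe A": (200, 0.60),
    "Safe B": (180, 0.40),
    "Swing 1": (29, aggregate(swing_polls, house)),
    "Swing 2": (38, 0.52),
    "Swing 3": (20, 0.49),
    "Swing 4": (71, 0.53),
}
print(f"P(A wins) = {simulate(states):.2f}")
```

Dropping the shared `national_error` would treat every state as independent. Under that assumption, polling misses in different states would tend to cancel out, and the forecast would look more certain than it should.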

A new election forecast by Pierre-Antoine Kremp uses a similar (but Bayesian) process to predict the election, and all of the methodology is transparent and open. The forecast is implemented in the R programming language and Stan, the Bayesian computation engine. (The new model was introduced by Stan author Andrew Gelman on Slate, which hosts its forecasts.) All of the data, code and the generated report are available to inspect on GitHub, and the statistical methodology is included with every forecast (scroll down to the Model section). The model itself is based on the Votamatic model by Drew Linzer, which was very successful at predicting the 2012 election. As of this writing on November 2, the model predicts a win by Hillary Clinton with a probability of 88%. (FiveThirtyEight's polls-only forecast gives her a 69.9% chance.)
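What makes a forecast "Bayesian" is that it starts from a prior belief and updates it with the polls to get a posterior distribution. A win probability like the 88% above is then a statement about that posterior. Below is a minimal sketch of that idea using the simplest possible case, a conjugate normal update for one candidate's two-party vote share. It is not Kremp's model: the real model is fit in Stan and pools state and national polls over time. The prior and the polls here are hypothetical, and each poll is assumed to carry only sampling error.

```python
import math


def posterior(prior_mean, prior_sd, polls):
    """Conjugate normal update of a candidate's two-party vote share.

    polls: list of (share, sample_size). Each poll is treated as a noisy
    measurement with variance share * (1 - share) / n (sampling error only).
    Returns the posterior mean and standard deviation.
    """
    precision = 1.0 / prior_sd ** 2
    weighted_sum = prior_mean * precision
    for share, n in polls:
        poll_precision = n / (share * (1.0 - share))
        precision += poll_precision
        weighted_sum += share * poll_precision
    return weighted_sum / precision, math.sqrt(1.0 / precision)


def prob_above_half(mean, sd):
    """P(share > 0.5) under a normal posterior, via the error function."""
    return 0.5 * (1.0 - math.erf((0.5 - mean) / (sd * math.sqrt(2.0))))


# Hypothetical prior (e.g. from "fundamentals") and three hypothetical polls.
mean, sd = posterior(0.50, 0.05, [(0.52, 600), (0.51, 900), (0.53, 400)])
print(f"posterior share: {mean:.3f} +/- {sd:.3f}")
print(f"P(win): {prob_above_half(mean, sd):.2f}")
```

The posterior mean is a precision-weighted average, so larger polls pull harder than smaller ones, and a vague prior (large `prior_sd`) lets the polls dominate. In Stan, the same posterior would be approximated by MCMC draws rather than by a closed-form formula.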

One interesting chart included in the report shows the state-by-state probabilities of winning. Nothing shows how divided this country is better than how few states are actually competitive at all.
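With a simulation-based model, a state's win probability is typically just the share of posterior draws in which a candidate's vote share exceeds 50%. Here is a sketch of that calculation, assuming hypothetical per-state posterior summaries (mean, sd) in place of real Stan output. It also flags "competitive" states using an arbitrary 10%–90% band, which is an assumption and not a cutoff taken from the report.

```python
import random


def state_win_probs(draws):
    """draws: state -> list of simulated two-party shares for candidate A.
    Returns state -> fraction of draws in which A gets more than 50%."""
    return {state: sum(d > 0.5 for d in ds) / len(ds)
            for state, ds in draws.items()}


def competitive_states(probs, low=0.1, high=0.9):
    """States whose win probability lies strictly inside (low, high)."""
    return sorted(s for s, p in probs.items() if low < p < high)


# Hypothetical posterior summaries (mean, sd); not the model's estimates.
SUMMARIES = {
    "State A": (0.62, 0.02),
    "State B": (0.38, 0.02),
    "State C": (0.51, 0.02),
    "State D": (0.495, 0.02),
}
rng = random.Random(42)
draws = {s: [rng.gauss(m, sd) for _ in range(4000)]
         for s, (m, sd) in SUMMARIES.items()}

probs = state_win_probs(draws)
for state, p in sorted(probs.items(), key=lambda kv: kv[1]):
    print(f"{state}: {p:.2f}")
print("Competitive:", competitive_states(probs))
```

States whose posterior mean sits several standard deviations from 50% come out at essentially 0 or 1. That is why a chart like this one is dominated by safe states, with only a handful in the middle.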

To check out the latest forecast from Kremp's model, follow the link below.

Slate: State and National Poll Aggregation

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.
