Votamatic predicted the election with R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
While Nate Silver got much of the attention for correctly forecasting the US presidential election, other forecasters were just as successful. Drew Linzer used the R language to build the statistical model behind votamatic.org, and was able to predict the outcome of the election months before most pundits.
Drew's model initially relied mostly on fundamentals: the president’s net approval-disapproval rating in June, the percent change in GDP from Q1 to Q2 of 2012, and whether the incumbent party had held the presidency for two or more terms. On that basis, Drew forecast on June 23 that the outcome in electoral college votes would be Obama 332, Romney 206. As the campaign went on, the model used Bayesian statistics to gradually incorporate polling data, and used smoothing methods to cope with the fact that many states were polled only sporadically. Even so, the forecast barely moved, and it remained around 332:206 right up to election day.
The final election result? Obama 332, Romney 206.
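The core idea of starting from a fundamentals-based forecast and letting polls gradually pull on it can be illustrated with the simplest Bayesian building block: a normal prior on a candidate's two-party vote share, updated one poll at a time. The Python sketch below is only an illustration under stated assumptions, not the model Drew fitted in WinBUGS, which also combined information across states and smoothed over time. The `Estimate` class, the `update_with_poll` function, the prior of 52% ± 3 points and the poll figures are all hypothetical, and each poll's error is approximated by its binomial sampling variance alone.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float  # expected two-party vote share for the candidate, in percent
    sd: float    # uncertainty of that belief, in percentage points


def update_with_poll(prior: Estimate, poll_share: float, sample_size: int) -> Estimate:
    """Normal-normal conjugate update of a vote-share belief with one poll.

    The poll's variance is approximated by the binomial sampling variance
    p(1 - p)/n, rescaled to percentage points squared (an assumption that
    ignores house effects and other nonsampling error).
    """
    p = poll_share / 100.0
    poll_var = p * (1.0 - p) / sample_size * 100.0 ** 2
    prior_prec = 1.0 / prior.sd ** 2   # precision = 1 / variance
    poll_prec = 1.0 / poll_var
    post_prec = prior_prec + poll_prec
    # Posterior mean is the precision-weighted average of prior and poll.
    post_mean = (prior_prec * prior.mean + poll_prec * poll_share) / post_prec
    return Estimate(mean=post_mean, sd=math.sqrt(1.0 / post_prec))


# Hypothetical fundamentals-based starting point: 52% +/- 3 points.
belief = Estimate(mean=52.0, sd=3.0)
# Hypothetical polls: (candidate's two-party share in percent, sample size).
for share, n in [(51.0, 800), (53.5, 600), (50.5, 1000)]:
    belief = update_with_poll(belief, share, n)
    print(f"after poll at {share}%: {belief.mean:.2f} +/- {belief.sd:.2f}")
```

Each poll adds precision, so the posterior standard deviation shrinks and later polls move the estimate less unless they disagree sharply with it. That is one way a forecast anchored in fundamentals can stay nearly fixed when the polls broadly agree with it.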
Drew described his methodology in an interview with the LA Times:
On Nov. 6, I predicted that Obama would win 332 electoral votes, with 206 for Romney. But I also predicted the exact same outcome on June 23, and the prediction barely budged through election day.
How is this possible? Statistics. I did it by systematically combining information from long-term historical factors (economic growth, presidential popularity and incumbency status) with the results of state-level public opinion polls. The political and economic “fundamentals” of the race indicated at the outset that Obama was on track to win reelection. The polls never contradicted this, even after the drop in support for Obama following the first presidential debate. In fact, state-level voter preferences were remarkably stable this year, varying by no more than 2 or 3 percentage points over the entire campaign (as compared to the 5% to 10% swings in 2008).
The actual mechanics of my forecasts were performed using a statistical model that I developed and posted on my website, votamatic.org. While quantitative election forecasting is still an emerging area, many analysts were able to predict the result on the day of the election by aggregating the polls. The challenge remains to improve estimates of the outcome early in the race, and use this information to better understand what campaigns can accomplish and how voters make up their minds.
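The headline 332:206 figure is an electoral-college tally built from state-level estimates: each state's electoral votes go to the candidate projected to win its popular vote, and simulating from each state's uncertainty turns the same estimates into a win probability. Here is a minimal Python sketch of that last step, assuming three hypothetical states with made-up electoral votes, vote-share forecasts and standard deviations; `projected_tally` and `simulate_dem_ev` are illustrative names, not part of Drew's code.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical input format: state -> (electoral votes,
#   forecast Democratic share of the two-party vote in percent,
#   standard deviation of that forecast in percentage points)
StateForecast = Tuple[int, float, float]


def projected_tally(states: Dict[str, StateForecast]) -> Tuple[int, int]:
    """Give each state's electoral votes to its projected popular-vote winner.

    Returns (Democratic EVs, Republican EVs); a forecast of exactly 50%
    is counted as Republican for simplicity.
    """
    dem = sum(ev for ev, share, _ in states.values() if share > 50.0)
    total = sum(ev for ev, _, _ in states.values())
    return dem, total - dem


def simulate_dem_ev(states: Dict[str, StateForecast],
                    n_sims: int = 10_000, seed: int = 1) -> List[int]:
    """Monte Carlo draws of the Democratic electoral-vote total.

    Each state's vote share is drawn independently from a normal
    distribution (a simplifying assumption; real state errors are correlated).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        totals.append(sum(ev for ev, share, sd in states.values()
                          if rng.gauss(share, sd) > 50.0))
    return totals


states = {"A": (20, 52.5, 2.0), "B": (15, 49.0, 2.0), "C": (10, 50.8, 2.5)}
print(projected_tally(states))  # (30, 15)

sims = simulate_dem_ev(states)
majority = sum(ev for ev, _, _ in states.values()) / 2
print(f"P(Dem majority) ~ {sum(s > majority for s in sims) / len(sims):.2f}")
```

Drawing each state independently understates the uncertainty: polling errors tend to move states together, so a serious forecast would model that correlation. The sketch also ignores the district-level allocation used by Maine and Nebraska.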
He also told me that he used R (and his poLCA package for latent class analysis) to create the entire forecast:
Everything's done in R — data processing and graphics — and the model is fitted using WinBUGS. The website is just a WordPress blog that automatically pulls the R image files from a public dropbox.
(That's a neat trick for sharing graphics from R, by the way: have a script write its R images to a local Dropbox folder, and let Dropbox take care of the web publishing; the website simply links to the online Dropbox file.)
You can find more details on the votamatic.org methodology at the link below. Also, if you're near San Francisco, Drew will be giving a talk to the Bay Area R User Group on February 12.
votamatic.org: How it Works
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.