Revisiting the GOP Race with the Huff Post API and pollstR

November 7, 2012
(This article was first published on PremierSoccerStats » R, and kindly contributed to R-bloggers)

Well, one election is over, but it is never too soon to start another, or in this case to revisit the past four years.

One day after the 2008 US Presidential election, Rasmussen polled 1,000 likely voters on their choice for the 2012 Republican Presidential candidate.
The overwhelming favourite was Sarah Palin, who garnered 64% of preferences, with Huckabee (12%) and Romney (11%) the only others to reach double digits. Thus began arguably the most topsy-turvy race in election history, one that ended with its eventual nominee, Romney, defeated in the general election.

The folks at the Huffington Post have kindly produced an API for their Pollster collection of opinion polls, and Drew Linzer has written an R function, pollstR, available on GitHub, to interact with it.

The first step is to determine which HuffPost Pollster chart contains the data:

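```r
library(rjson)    # fromJSON() comes from a JSON package; the XML package does not provide it
library(ggplot2)
library(plyr)

url <- "http://elections.huffingtonpost.com/pollster/api/charts"
# readLines() returns one element per line, so collapse them into a single JSON string
raw.data <- paste(readLines(url, warn = FALSE), collapse = "")
rd <- fromJSON(raw.data)
# each chart in the returned list carries a "slug" that identifies it
pollName <- sapply(rd, function(chart) chart$slug)
print(pollName)
```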

This returns a list of 345 polls, and a quick perusal shows that the required one is named “2012-national-gop-primary”. Once the pollstR function has been sourced, that name can be plugged into it and the resulting data analysed:
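Rather than perusing all 345 slugs by eye, the vector can be filtered with base R's grepl(). A minimal sketch, using a short hypothetical stand-in for pollName: only “2012-national-gop-primary” comes from the post, the other slugs are made up for illustration.

```r
# Hypothetical stand-in for the pollName vector built above
# (with the live API it would hold all 345 chart slugs)
pollName <- c("2012-general-election-romney-vs-obama",
              "2012-national-gop-primary",
              "2012-iowa-gop-primary",
              "obama-job-approval")

# keep only slugs that mention both "national" and "gop"
gop.national <- pollName[grepl("national", pollName) & grepl("gop", pollName)]
print(gop.national)
```

Loosening or tightening the patterns is then a quicker way to narrow down candidate charts than scrolling through the full list.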

```r
library(reshape2)  # melt()

# extract data to a data.frame (pollstR() must have been sourced first)
polls <- pollstR(chart = "2012-national-gop-primary", pages = "all")
# look at the structure
colnames(polls) # 43 columns, most of them names of candidates
#[1] "id"         "pollster"   "start.date" "end.date"   "method"     "subpop"     "N"          "Romney"     "Gingrich" ...
# the data needs to be reshaped - for my purpose I just need end.date and the candidate columns
polls <- polls[, c(4, 8:43)]
polls.melt <- melt(polls, id = "end.date")
# set meaningful column names
colnames(polls.melt) <- c("pollDate", "candidate", "pc")
# make sure the dates are Date objects (scale_x_date() needs them later)
polls.melt$pollDate <- as.Date(polls.melt$pollDate)

# get a list of candidates that have polled 10% or more at least once
contenders <- ddply(polls.melt, .(candidate), summarize, max = max(pc, na.rm = TRUE))
contenders <- subset(contenders, max > 9)$candidate

# eliminate results for undecideds etc. (positions found by inspecting contenders)
contenders <- contenders[c(-4, -5, -7, -11, -18)]

# I want to plot each poll's leader and have their name show on the max value for when they led
polls.melt <- arrange(polls.melt, desc(pc))
# rank candidates within each poll date: order == 1 is that poll's leader
polls.melt <- ddply(polls.melt, .(pollDate), transform, order = seq_along(pc))
leaders <- subset(polls.melt, candidate %in% contenders & order == 1)
# Romney has two polls at 57%, so nudge one down for a clear graph
leaders[96, 3] <- 56
# flag each candidate's highest poll (when leading)
leaders$best <- ifelse(leaders$pc == ave(leaders$pc, as.character(leaders$candidate),
                                         FUN = function(x) max(x, na.rm = TRUE)), "Y", "N")
# now produce the graph
q <- ggplot(leaders, aes(as.POSIXct(pollDate), pc)) + geom_point(aes(colour = candidate))
q <- q + geom_text(aes(label = candidate, colour = candidate), vjust = -1, size = 3,
                   data = leaders[leaders$best == "Y", ])
q <- q + ggtitle("Leader of GOP polls and Maximum value by Candidate") + ylab("%") + xlab("") + theme_bw()
q
```
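The heart of the chart above is picking the top candidate in each poll and then flagging each candidate's best showing while in front. The same logic can be sketched in base R (manual reshaping in place of melt(), ave() in place of ddply()), assuming a toy wide table with hypothetical candidate columns CandA, CandB and CandC and made-up percentages, not real polling data:

```r
# Toy wide-format table: one row per poll, one column per (hypothetical) candidate
polls <- data.frame(
  end.date = as.Date(c("2011-06-01", "2011-09-01", "2011-12-01", "2012-03-01")),
  CandA    = c(24, 18, 22, 38),
  CandB    = c(NA, 31,  8, NA),
  CandC    = c( 8,  5, 35, 20)
)

# reshape wide -> long: unlist() stacks the candidate columns one after another
candidates <- setdiff(names(polls), "end.date")
polls.long <- data.frame(
  pollDate  = rep(polls$end.date, times = length(candidates)),
  candidate = rep(candidates, each = nrow(polls)),
  pc        = unlist(polls[candidates], use.names = FALSE),
  stringsAsFactors = FALSE
)
polls.long <- polls.long[!is.na(polls.long$pc), ]

# sort by date, highest share first, then rank within each poll date
polls.long <- polls.long[order(polls.long$pollDate, -polls.long$pc), ]
polls.long$order <- ave(polls.long$pc, polls.long$pollDate, FUN = seq_along)
leaders <- polls.long[polls.long$order == 1, ]

# flag each candidate's highest share among the polls they led
leaders$best <- ifelse(leaders$pc == ave(leaders$pc, leaders$candidate, FUN = max), "Y", "N")
print(leaders[, c("pollDate", "candidate", "pc", "best")])

# number of different candidates who topped at least one poll
n.leaders <- length(unique(leaders$candidate))
```

On this toy table the leaders run CandA, CandB, CandC, CandA, and only CandA's later, higher lead is flagged as its best. Running the same length(unique(...)) count on the real leaders table gives the number of different poll-toppers.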


For the first couple of years, Palin, Huckabee and Romney continued to dominate, but once the race began in earnest an amazing eleven participants (even Donald Trump) topped a poll on at least one occasion.

It is worthwhile looking at individual candidates’ performance over the final 18 months:

```r
library(scales)  # date_breaks(), date_format()

p <- ggplot(subset(polls.melt, candidate %in% contenders & pollDate > as.Date("2010-12-31")),
            aes(pollDate, pc))
p <- p + geom_smooth(se = FALSE) + facet_wrap(~candidate) +
  scale_x_date(breaks = date_breaks("years"), labels = date_format("%Y"))
p <- p + ggtitle("Smoothed results of National Polls - GOP Race") + ylab("%") + xlab("") + theme_bw()
p <- p + theme(strip.text.x = element_text(colour = "white", face = "bold"),
               strip.background = element_rect(fill = "#CB3128"))
p
```
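With no method given, geom_smooth() defaults to a loess fit for smaller data sets (fewer than 1,000 observations), using a span of 0.75. A minimal base-R sketch of what happens within each facet, assuming a simulated weekly series for a single hypothetical candidate rather than real polling data:

```r
# Simulated weekly vote shares for one hypothetical candidate (not real data)
set.seed(1)
pollDate <- seq(as.Date("2011-01-01"), as.Date("2012-06-30"), by = "week")
pc <- 20 + 10 * sin(seq_along(pollDate) / 12) + rnorm(length(pollDate), sd = 3)

# loess() needs a numeric predictor, so express dates as days since 1970-01-01
day <- as.numeric(pollDate)
# span = 0.75 is the default in both loess() and geom_smooth()
fit <- loess(pc ~ day, span = 0.75)
smoothed <- predict(fit)

# the smoothed curve should change far less from week to week than the raw series
c(raw = sd(diff(pc)), smoothed = sd(diff(smoothed)))
```

To see it, plot(pollDate, pc) followed by lines(pollDate, smoothed) overlays the curve on the raw points; a smaller span follows short-lived surges more closely, a larger one flattens them.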

Once Palin and Huckabee had proved uninspiring, the field narrowed to the cultish Ron Paul, the ‘meh’ candidate Romney, and a host of short-lived shooting stars.

To leave a comment for the author, please follow the link and comment on his blog: PremierSoccerStats » R.
