Revisiting the GOP Race with the Huff Post API and pollstR

[This article was first published on PremierSoccerStats » R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Well, one election is over, but it is never too soon to start on the next one, or in this case to revisit the past four years.

One day after the 2008 US Presidential election, Rasmussen polled 1,000 likely voters on their choice for the 2012 Republican Presidential candidate.
The overwhelming favourite was Sarah Palin, with 64% of preferences; Huckabee (12%) and Romney (11%) were the only others to reach double digits. Thus began arguably the most topsy-turvy race in election history, ending in ultimate defeat.

The folks at the Huffington Post have kindly produced an API for their large collection of opinion polls, and Drew Linzer has published an R function, pollstR, on GitHub to interact with it.

The first step is to determine which Huffington Post chart holds the data we want.

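library(rjson)   # fromJSON() comes from rjson (or RJSONIO), not from XML
library(ggplot2)
library(plyr)

# list every chart available from the Pollster API
url <- "http://elections.huffingtonpost.com/pollster/api/charts"
raw.data <- readLines(url, warn = FALSE)
rd <- fromJSON(paste(raw.data, collapse = ""))
# each element describes one chart; its slug is the name passed to pollstR
pollName <- sapply(rd, function(chart) chart$slug)
print(pollName)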

This returns a list of 345 chart slugs, and a quick perusal shows that the required one is named “2012-national-gop-primary”. That slug can be plugged into the aforementioned pollstR function, once it has been sourced, and the resulting data analysed.
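Rather than perusing all 345 slugs by eye, a pattern match can narrow the list down. This is a minimal sketch assuming `pollName` is a character vector of slugs; apart from “2012-national-gop-primary”, the slugs below are hypothetical stand-ins for the real list.

```r
# Hypothetical sample of chart slugs; only "2012-national-gop-primary" comes from the post
pollName <- c("2012-general-election-romney-vs-obama",
              "2012-national-gop-primary",
              "2012-iowa-gop-primary",
              "obama-job-approval")

# Keep slugs that mention "national" followed later by "gop"
matches <- grep("national.*gop", pollName, value = TRUE)
print(matches)
```

With the full list the same call may return several candidates, which is still far quicker to scan than 345 names.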

# extract data to a data.frame
polls <- pollstR(chart="2012-national-gop-primary",pages="all")
# look at the structure
colnames(polls) # 43 columns most of them names of candidates
#[1] "id"         "pollster"   "start.date" "end.date"   "method"     "subpop"     "N"          "Romney"     "Gingrich" ...
# the data needs to be reshaped - for my purpose I just need the end.date and candidates data
polls <- polls[,c(4,8:43)]
library(reshape2) # melt() comes from reshape2
polls.melt <- melt(polls, id.vars = "end.date")
# set meaningful columns
colnames(polls.melt) <- c("pollDate","candidate","pc")
 
# get a list of candidates that have polled 10% or more at least once
contenders <- ddply(polls.melt, .(candidate), summarize, max = max(pc, na.rm = TRUE))
contenders <- subset(contenders, max >= 10)$candidate
 
# eliminate results for undecideds etc.
contenders <- contenders[c(-4,-5,-7,-11,-18)]
 
# I want to plot each poll's leader and show each candidate's name at their highest value while leading
polls.melt <- arrange(polls.melt, desc(pc))
# rank rows within each poll date; the data is already sorted by descending share
polls.melt <- ddply(polls.melt, .(pollDate), transform, order = seq_along(pc))
leaders <- subset(polls.melt, candidate %in% contenders & order == 1)
# Romney's best lead (57%) occurs twice, so nudge one copy down to get a single clear label
leaders[96,3] <- 56
# flag each candidate's highest poll share while leading
# (leaders has no max column of its own, so compute it per candidate)
leaders$max <- ave(leaders$pc, as.character(leaders$candidate), FUN = max)
leaders$best <- ifelse(leaders$pc == leaders$max, "Y", "N")
# now produce graph
q <- ggplot(leaders, aes(as.POSIXct(pollDate), pc)) + geom_point(aes(colour = candidate))
q <- q + geom_text(aes(label = candidate, colour = candidate), vjust = -1, size = 3,
                   data = leaders[leaders$best == "Y", ])
q <- q + ggtitle("Leader of GOP polls and Maximum value by Candidate") + ylab("%") + xlab("") + theme_bw()
q
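The reshaping and leader-finding steps are easier to follow on a toy table. This is a base-R sketch, not the plyr/reshape2 route used above: `stack()` stands in for `melt()`, and sorting plus `duplicated()` stands in for the per-date ranking. All poll numbers are invented for illustration.

```r
# Hypothetical wide poll table (invented numbers): one row per poll, one column per candidate
polls <- data.frame(
  end.date = as.Date(c("2011-06-01", "2011-10-01", "2012-01-15", "2012-04-01")),
  Romney   = c(24, 22, 30, 42),
  Perry    = c(14, 29, 15, 5),
  Gingrich = c(8, 10, 33, 20),
  Huntsman = c(2, 3, 1, 4)
)

# Wide -> long: stack() emits one candidate column after another,
# so the poll dates are repeated once per candidate column
long <- data.frame(
  pollDate = rep(polls$end.date, times = ncol(polls) - 1),
  stack(polls[, -1])
)
names(long)[2:3] <- c("pc", "candidate")

# Contenders: candidates who reached 10% or more in at least one poll
peak <- tapply(long$pc, long$candidate, max, na.rm = TRUE)
contenders <- names(peak)[peak >= 10]

# Sort by date, then descending share; the first row per date is that poll's leader
long <- long[order(long$pollDate, -long$pc), ]
leaders <- long[!duplicated(long$pollDate) & long$candidate %in% contenders, ]

# Flag each leader's best showing among the polls they led
leaders$best <- leaders$pc == ave(leaders$pc, as.character(leaders$candidate), FUN = max)
print(leaders)
```

Here Romney leads twice (24 and 42), so only his second lead is flagged as his best; ties in share would go to whichever candidate happens to sort first.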


For the first couple of years, Palin, Huckabee and Romney continued to dominate, but once the race began in earnest an amazing eleven participants, even Donald Trump, topped a poll on at least one occasion.
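A count like this can be checked by tabulating the `candidate` column of `leaders`. Because melted data keeps a factor level for every candidate, `droplevels()` is needed first so that candidates who never led are not counted. A minimal sketch on a hypothetical vector of poll leaders (not the real series):

```r
# Hypothetical poll leaders, one entry per poll (illustrative only).
# As in melted data, the factor carries levels for candidates who never led.
leader <- factor(c("Romney", "Perry", "Romney", "Cain", "Gingrich",
                   "Romney", "Santorum", "Romney"),
                 levels = c("Romney", "Perry", "Cain", "Gingrich",
                            "Santorum", "Huntsman", "Undecided"))

# droplevels() removes never-leading levels before tabulating
counts <- sort(table(droplevels(leader)), decreasing = TRUE)
print(counts)
cat("Distinct poll leaders:", length(counts), "\n")
```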

It is worth looking at each candidate’s performance over the final 18 months.

p <- ggplot(subset(polls.melt, candidate %in% contenders & pollDate > as.Date("2010-12-31")), aes(pollDate, pc))
# date_breaks/date_labels avoid needing scales::date_breaks() and scales::date_format()
p <- p + geom_smooth(se = FALSE) + facet_wrap(~candidate) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y")
p <- p + ggtitle("Smoothed results of National Polls - GOP Race") + ylab("%") + xlab("") + theme_bw()
p <- p + theme(strip.text.x = element_text(colour = "white", face = "bold"),
               strip.background = element_rect(fill = "#CB3128"))
p
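Per the ggplot2 documentation, `geom_smooth()` with its default `method` fits a loess curve when the largest group has fewer than 1,000 observations, which is likely the case for each candidate here. A base-R sketch of that kind of smoothing, on a simulated series rather than real poll data:

```r
# Simulated daily poll shares for one candidate (not real data):
# a smooth underlying trend plus sampling noise
set.seed(42)
days   <- 0:99
signal <- 20 + 10 * sin(days / 40)
pc     <- signal + rnorm(length(days), sd = 2)

# Local quadratic regression; span = 0.75 and degree = 2 are loess()'s defaults
fit      <- loess(pc ~ days, span = 0.75)
smoothed <- predict(fit, newdata = data.frame(days = days))

# The fitted curve should track the trend more closely than the raw points do
cat("raw MSE:", mean((pc - signal)^2),
    " smoothed MSE:", mean((smoothed - signal)^2), "\n")
```

Larger `span` values give smoother curves that can flatten out a short-lived surge, which matters for the "shooting star" candidates.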

Once Palin and Huckabee had proved uninspiring, the field narrowed to the cultish Ron Paul, Romney, the ‘meh’ candidate, and a host of short-lived shooting stars.
