Gathering RealClearPolitics Polling Trends with XML

November 19, 2012
(This article was first published on is.R(), and kindly contributed to R-bloggers)

Now that the election is over, you may want to use polling data in a model of the campaign. Simon Jackman has thoughtfully made his daily state-by-state predictions available for download, but another commonly used dataset is the RealClearPolitics polling average.

As you can see when you go to RCP, they have a nice HTML5 graph (screenshot above), over which you can hover with your mouse to reveal daily point estimates. Unfortunately, the numbers behind those point estimates are a little tricky to tease out; at least, they were tricky for me. In the end, I managed to wrangle out the Romney vs. Obama daily averages, which you can download here [CSV].

Fortunately, RCP stores its time series data in XML, meaning that the method I used to get those Romney vs. Obama numbers can be used to collect any RCP data, such as from this comparison of Obama & Bush Job Approval. Just view the page source, [CTRL-F] for “xml,” and try to identify the XML file from which the graph is drawing data:

In this case, the file appears to be o_vs_b6.xml, which we can find listed in this directory of all RCP XML files and graph-drawing code.

From there, you can just use the R package XML and the following code as a guide for neatly folding the XML data into a data.frame. It will take a little effort on your part (i.e., it’s not just “CTRL-A, CTRL-R”), but the XML should be consistently formatted, and thus not too difficult to parse.
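
As a rough illustration of that kind of parsing, here is a minimal sketch using the XML package. It is not the code from the original post: the XML layout below (a `series` of dates plus one `graph` element per candidate, linked by position) is a hypothetical stand-in, and the numbers are made-up placeholders rather than real polling figures. Element and attribute names in an actual RCP file may differ, so inspect the file first and adjust the XPath expressions accordingly.

```r
library(XML)

# Hypothetical XML layout, invented for illustration only. The values are
# placeholders, not real RCP estimates. To read a real file instead, pass
# its path or contents to xmlParse().
xml_text <- '
<chart>
  <series>
    <value xid="0">11/01/2012</value>
    <value xid="1">11/02/2012</value>
  </series>
  <graphs>
    <graph title="Obama">
      <value xid="0">50.0</value>
      <value xid="1">50.5</value>
    </graph>
    <graph title="Romney">
      <value xid="0">49.0</value>
      <value xid="1">48.5</value>
    </graph>
  </graphs>
</chart>'

# Parse the string into an internal XML document
doc <- xmlParse(xml_text, asText = TRUE)

# Pull the dates out of the <series> block
dates <- xpathSApply(doc, "//series/value", xmlValue)

# One <graph> per candidate; its title attribute becomes the column name
titles <- xpathSApply(doc, "//graphs/graph", xmlGetAttr, "title")

# For each title, collect that graph's values as numbers
estimates <- lapply(titles, function(title) {
  path <- sprintf("//graphs/graph[@title='%s']/value", title)
  as.numeric(xpathSApply(doc, path, xmlValue))
})
names(estimates) <- titles

# Fold everything into a data.frame with one row per date
polls <- data.frame(
  date = as.Date(dates, format = "%m/%d/%Y"),
  estimates,
  stringsAsFactors = FALSE
)

print(polls)
```

This sketch assumes the dates and the estimates line up by position. If the real file links them through an index attribute instead, it would be safer to extract that attribute with `xmlGetAttr` and merge on it.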
