Building a custom database of country time-series data using Quandl

May 8, 2013
By

(This article was first published on socialdatablog » R, and kindly contributed to R-bloggers)

Encouraged by this post I had another look at quandl for collecting datasets from different agencies. Right now I need to get data for four countries on a couple of dozen indicators.

Rplot

This graphic is just a quick example with only two indicators of what I am aiming to be able to do.

The process on Quandl at the moment is a bit fiddly:

  • there is no search function in the API
  • the country codes used are different from agency to agency

So my workflow is this. It isn’t as complicated as it sounds. I have used spreadsheets to store country codes and queries to make it all as re-useable as possible. You can download the spreadsheets here and here.

  • edit the csv spreadsheet of the 2-and 3-digit ISO country codes, plus the actual names. Also, WHO for some reasons uses some other codes which I had to paste in by hand. If you find your sources are also using yet other codes, you can add them to the spreadsheet. Put an x in the “enabled” column to mark the countries you want to use.
  • search manually at quandl for interesting queries and add them to the other csv spreadsheet, replacing the country code with %s, again putting an x in the “enabled” column for the queries you want, adding a human-readable title in the “title” column if you want and putting “alpha2″ or “alpha3″ etc in the country_sign column to mark which kind of country code is being used.
  • run the script below.

 

authcode=”yourAuthCodeFromQuandl”

 

library(Quandl)

 

cou=list()

queries=read.csv(“queries.csv”)

queries=queries[queries$enabled!="",]

codes=read.csv(“countryCodes.csv”)

codesE=codes[codes$enabled!="",]

for(qq in 1:nrow(queries)){

q=queries$query[qq]

for(cc in 1:nrow(codesE)){

 

co=codesE[cc,queries[qq,"country_sign"]]

tex=paste(q,co,sep=”.”)

cou[[tex]]=try(Quandl(sprintf(q,co),authcode=authcode),T)

if(attributes(cou[[tex]])$class!=”try-error”)cou[[tex]]$Indicator=ifelse(!is.na(queries$title[qq]),queries$title[qq],q)

if(attributes(cou[[tex]])$class!=”try-error”)cou[[tex]]$Country=codesE[cc,"name"]

}}

 

rr=rbind.fill(cou[sapply(cou,function(x)length(x)>1)])

rr$Date=as.character(rr$Date)rr$Year=as.character(rr$Year)

rr$Year=as.Date(ifelse(!is.na(rr$Year),rr$Year,rr$Date))

rr$Value=ifelse(!is.na(rr$Value),rr$Value,rr$Percent) #you might have to do something like this if your queries are returning data in columns with some other label than Value

#then try a graphic for demonstration purposes

ggplot(data=rr,aes(x=Year,y=Value,group=Country,colour=Country))+geom_point(size=3)+geom_line()+facet_grid(Indicator~.,scales=”free”)+ theme(strip.text.y = theme_text(size = 13, hjust=0,angle = 0))+theme(axis.text.x=element_text(angle=90))

And voila.

I wanted to put the spreadsheets as a google spreadsheet but it seems RGoogleDocs is not working for R 3.0.

To leave a comment for the author, please follow the link and comment on his blog: socialdatablog » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.