Building a custom database of country time-series data using Quandl

[This article was first published on socialdatablog » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Encouraged by this post I had another look at quandl for collecting datasets from different agencies. Right now I need to get data for four countries on a couple of dozen indicators.

Rplot

This graphic is just a quick example with only two indicators of what I am aiming to be able to do.

The process on Quandl at the moment is a bit fiddly:

  • there is no search function in the API
  • the country codes used are different from agency to agency

So my workflow is this. It isn’t as complicated as it sounds. I have used spreadsheets to store country codes and queries to make it all as re-useable as possible. You can download the spreadsheets here and here.

  • edit the csv spreadsheet of the 2-and 3-digit ISO country codes, plus the actual names. Also, WHO for some reasons uses some other codes which I had to paste in by hand. If you find your sources are also using yet other codes, you can add them to the spreadsheet. Put an x in the “enabled” column to mark the countries you want to use.
  • search manually at quandl for interesting queries and add them to the other csv spreadsheet, replacing the country code with %s, again putting an x in the “enabled” column for the queries you want, adding a human-readable title in the “title” column if you want and putting “alpha2″ or “alpha3″ etc in the country_sign column to mark which kind of country code is being used.
  • run the script below.

 

authcode=”yourAuthCodeFromQuandl”

 

library(Quandl)

 

cou=list()

queries=read.csv(“queries.csv”)

queries=queries[queries$enabled!=””,]

codes=read.csv(“countryCodes.csv”)

codesE=codes[codes$enabled!=””,]

for(qq in 1:nrow(queries)){

q=queries$query[qq]

for(cc in 1:nrow(codesE)){

 

co=codesE[cc,queries[qq,”country_sign”]]

tex=paste(q,co,sep=”.”)

cou[[tex]]=try(Quandl(sprintf(q,co),authcode=authcode),T)

if(attributes(cou[[tex]])$class!=”try-error”)cou[[tex]]$Indicator=ifelse(!is.na(queries$title[qq]),queries$title[qq],q)

if(attributes(cou[[tex]])$class!=”try-error”)cou[[tex]]$Country=codesE[cc,”name”]

}}

 

rr=rbind.fill(cou[sapply(cou,function(x)length(x)>1)])

rr$Date=as.character(rr$Date)rr$Year=as.character(rr$Year)

rr$Year=as.Date(ifelse(!is.na(rr$Year),rr$Year,rr$Date))

rr$Value=ifelse(!is.na(rr$Value),rr$Value,rr$Percent) #you might have to do something like this if your queries are returning data in columns with some other label than Value

#then try a graphic for demonstration purposes

ggplot(data=rr,aes(x=Year,y=Value,group=Country,colour=Country))+geom_point(size=3)+geom_line()+facet_grid(Indicator~.,scales=”free”)+ theme(strip.text.y = theme_text(size = 13, hjust=0,angle = 0))+theme(axis.text.x=element_text(angle=90))

And voila.

I wanted to put the spreadsheets as a google spreadsheet but it seems RGoogleDocs is not working for R 3.0.

To leave a comment for the author, please follow the link and comment on their blog: socialdatablog » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)