World Government Data Store API (R and Ruby)

July 10, 2010
By

(This article was first published on R-Chart, and kindly contributed to R-bloggers)


The UK Guardian Data Blog has great visualizations on the topics of the day, along with specific references to the data sets and online resources they use.  You can find out more about the origins and plans of this and related data sites in their first plot post.  It is just one of several Guardian data-related sites.


The World Government Data Store contains well over 2,000 datasets indexed from government data stores, and it is growing.  Its associated API can return search results in JSON and Atom formats and is described on the Data Blog.  You can figure out how the URL-based API works in a few minutes with the following steps.

1)  Start by entering a search in a browser at http://www.guardian.co.uk/world-government-data.  For example, enter the word "schools".  The URL of the results page will contain /search? followed by the query parameters.


2)  Replace /search? with either /search.json? or /search.atom?

    The URL will now return data in the appropriate format.  

3)  To make the results easier to read, add &human=text to the end of the URL; the output will then be indented and spaced in a human-readable fashion.

4)  If multiple pages of results are available for a query, request a specific page by adding &page=2 to the URL, where 2 is the number of the page you want to retrieve.
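The four steps above amount to simple manipulation of the search URL.  As a minimal sketch in Ruby (matching the irb session later in this post), a hypothetical helper, data_store_url, can assemble such a URL.  It assumes the query goes in a q parameter, as in the search.json?q= URL used by the R function, and it uses URI.encode_www_form from Ruby's standard library to escape the query.

```ruby
require 'uri'

# Base address of the World Government Data Store (from step 1).
DATA_STORE_BASE = 'http://www.guardian.co.uk/world-government-data'

# Hypothetical helper: builds a search URL following steps 2-4.
#   format - :json or :atom, swapped in for /search? (step 2)
#   human  - when true, appends human=text for indented output (step 3)
#   page   - page number to retrieve (step 4)
def data_store_url(query, format: :json, page: 1, human: false)
  unless %i[json atom].include?(format)
    raise ArgumentError, 'format must be :json or :atom'
  end

  params = { 'q' => query, 'page' => page.to_s }
  params['human'] = 'text' if human
  # encode_www_form escapes special characters (spaces become '+')
  "#{DATA_STORE_BASE}/search.#{format}?#{URI.encode_www_form(params)}"
end

puts data_store_url('schools', page: 2)
# prints http://www.guardian.co.uk/world-government-data/search.json?q=schools&page=2
```

The helper only builds strings and never contacts the server, so it can be checked without network access.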

A complete example of an R function that retrieves titles and links is as follows.

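library(rjson)

ukGuardianData = function(q, page = 1) {
  u = 'http://www.guardian.co.uk/world-government-data/search.json?q='
  # URLencode() escapes spaces and other special characters in the query
  con = url(paste0(u, URLencode(q, reserved = TRUE), '&page=', page))
  on.exit(close(con))
  # readLines() returns one element per line; collapse them into one JSON string
  x = fromJSON(paste(readLines(con, warn = FALSE), collapse = ''))
  # seq_along() avoids iterating over 1:0 when a page has no results
  for (i in seq_along(x$results)) {
    link = ''
    # Use the first download link, if the dataset has one
    if (length(x$results[[i]]$download_links) > 0) {
      link = x$results[[i]]$download_links[[1]]$link
    }
    print(paste(i, x$results[[i]]$title, link))
  }
}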

The rjson library is used to parse the response.  For each item in x$results, the first associated download link is retrieved (if available).  The index, title and link are then printed.  To call the function, simply enter:

ukGuardianData('schools',2)
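The extraction step inside ukGuardianData (index, title, and the first download link or an empty string) does not depend on the HTTP request, so it can be separated out and checked on its own.  A minimal sketch of that logic in Ruby uses a hypothetical helper, titles_and_links.  It assumes, as the R code implies, that each entry in results is a hash with 'title' and 'download_links' keys and that each download link has a 'link' key.  The sample data is made up for illustration and is not real API output.

```ruby
# Hypothetical helper mirroring the loop in ukGuardianData: for each result,
# build "index title link", using the first download link or '' if none.
def titles_and_links(parsed)
  parsed.fetch('results', []).each_with_index.map do |result, i|
    links = result['download_links'] || []
    link = links.empty? ? '' : links.first['link']
    "#{i + 1} #{result['title']} #{link}"
  end
end

# Hypothetical sample in the assumed shape (illustration only).
sample = {
  'results' => [
    { 'title' => 'Dataset A', 'download_links' => [{ 'link' => 'http://example.com/a.csv' }] },
    { 'title' => 'Dataset B', 'download_links' => [] }
  ]
}
puts titles_and_links(sample)
```

Because parsing is kept separate from fetching, the same function can be applied to the hash returned by JSON.parse in the irb session that follows.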

Switching over to Ruby, you can retrieve the results in a similar manner.  In irb:

require 'json'
require 'open-uri'
u = 'http://www.guardian.co.uk/world-government-data/search.json?q=schools'  # search URL from steps 1-2
h = JSON.parse(URI.open(u).read)

The parsed object is a Hash, so we can start by checking which keys are available.

h.keys

=> ["results", "total_pages", "title", "order", "query", "total_results", "current_page", "facets"]

The keys show that the results are only one component of the response: the remaining keys are metadata (such as total_results, total_pages and current_page) that let you work with the entire result set, which is returned one page at a time.  So to see the titles on the page returned by our call:

h['results'].each.with_index(1) do |result, i|
  puts "#{i} #{result['title']}"
end
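Because the response carries total_pages alongside results, it is possible to walk the whole result set rather than a single page.  The sketch below is a hypothetical helper, all_results, that requests page 1, reads total_pages, and then fetches pages 2 through total_pages.  It assumes total_pages is a page count and that pages are numbered from 1 (consistent with &page=2 in step 4).  The fetcher is passed in as a lambda so the paging logic can be tried with in-memory data instead of the live service; the two pages below are made up for illustration.

```ruby
# Hypothetical helper: collects the results from every page of a query.
# fetch_page is any callable taking a page number and returning the parsed
# JSON hash for that page.
def all_results(fetch_page)
  first = fetch_page.call(1)
  results = first.fetch('results', []).dup
  total_pages = first['total_pages'].to_i # assumed to be a page count
  (2..total_pages).each do |page|
    results.concat(fetch_page.call(page).fetch('results', []))
  end
  results
end

# Hypothetical in-memory pages (illustration only): two pages, three results.
pages = {
  1 => { 'total_pages' => 2, 'current_page' => 1, 'results' => [{ 'title' => 'A' }, { 'title' => 'B' }] },
  2 => { 'total_pages' => 2, 'current_page' => 2, 'results' => [{ 'title' => 'C' }] }
}
all = all_results(->(n) { pages.fetch(n) })
all.each.with_index(1) { |result, i| puts "#{i} #{result['title']}" }
```

Against the real API, the lambda could build a URL ending in &page=n, open it with URI.open and pass the body to JSON.parse, as in the irb session above.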

If you start using the Guardian APIs to create a mashup or app, make sure to let the Guardian know about it.  If you use the data and want to let others know, the Guardian describes some conventions that will give your work visibility on various social networks.  For example, on Twitter, use the hashtags #openplatform and #datastore in your tweets.
