Accessing and plotting World Bank data with R

September 25, 2011

(This article was first published on mages' blog, and kindly contributed to R-bloggers)

Over the past couple of days I played around with the World Bank's data sets, and I have to admit that I am blown away by them. It is amazing to see what is available on their web site, and their Data Visualisation Tools page is well worth a visit. Better still, they provide an API to their data, which they have used to build an iPhone app. You can have the world's data in your pocket.

In this post I will show how we can access data from the World Bank in R. As an example we create a motion chart in the Hans Rosling style, as you find it on the Google Public Data Explorer site, which also uses data from the World Bank. Reproducing that chart should give us confidence that we understand the World Bank's interface. You can find this example as the demo WorldBank in the googleVis package from version 0.2.10 onwards.

So let's try to replicate the initial plot of the Google Public Data Explorer, which shows fertility rate against life expectancy for each country from 1960 to today. The countries are represented as bubbles, with the size reflecting the population and the colour the region.

Duncan Temple Lang provides us with examples for accessing the World Bank's data using his RJSONIO and RCurl packages. The World Bank data is available via the API either as XML or JSON. We will use JSON, as it is straightforward to read the JSON data set into R with the fromJSON function of the RJSONIO package and then to transform it into a data frame. In order to query the database we have to know which indicator variable we want and what its key is. Thankfully the World Bank provides us with a page which lists all indicator variables. Clicking on any of those reveals the indicator key in the URL. For our example we get the following mappings:

Indicator                       Key
fertility rate                  SP.DYN.TFRT.IN
life expectancy                 SP.DYN.LE00.IN
population                      SP.POP.TOTL
GDP per capita (current US$)    NY.GDP.PCAP.CD
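
As a rough illustration of how an indicator key ends up in a query, the sketch below pastes each key from the table into the URL pattern used in the R code of this post. The helper name buildWorldBankUrl is made up for this example and is not part of any package; nothing is downloaded.

```r
## Hypothetical helper (not part of googleVis or RJSONIO): build the
## query URL for one indicator key, using the URL pattern from this post.
buildWorldBankUrl <- function(id, date = "1960:2010", per.page = 12000) {
  paste("http://api.worldbank.org/countries/all/indicators/", id,
        "?date=", date, "&format=json&per_page=", per.page,
        sep = "")
}

## Indicator keys from the table above
keys <- c(fertility.rate  = "SP.DYN.TFRT.IN",
          life.expectancy = "SP.DYN.LE00.IN",
          population      = "SP.POP.TOTL",
          GDP.per.capita  = "NY.GDP.PCAP.CD")

## One URL per indicator; the names of 'keys' are carried over
urls <- sapply(keys, buildWorldBankUrl)
print(urls[["population"]])
```

Keeping the keys in a named vector means each URL can be looked up by a readable name rather than by the key itself.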

That's about it. From Duncan we have learned how to create the URL string to query the database and how to transform the query result from JSON into a data frame. The rest is re-arranging the data and combining the various data sets to get the final table. We display it via a motion chart using the gvisMotionChart function of the googleVis package. You can find the detailed R code below.
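
The step from parsed JSON to a data frame can be tried without an internet connection. The sketch below uses a hand-built list as a stand-in for what fromJSON returns for the records; the countries, figures and the helper name recordsToDataFrame are all invented for illustration, and the assumed record layout (a country entry with id and value, plus value and date) is taken from how the code in this post reads it.

```r
## Hand-built stand-in for fromJSON(url)[[2]]: one record per country
## and year. Countries and figures are made up for illustration.
records <- list(
  list(country = c(id = "AA", value = "Country A"),
       value = "100", date = "2009"),
  list(country = c(id = "AA", value = "Country A"),
       value = NULL, date = "2010"),   # missing observation
  list(country = c(id = "BB", value = "Country B"),
       value = "250", date = "2010")
)

## Hypothetical helper: flatten the records into a data frame and
## name the value column after the indicator.
recordsToDataFrame <- function(records, value = "value") {
  df <- data.frame(
    year = as.numeric(sapply(records, "[[", "date")),
    ## A NULL value would be dropped by sapply, so map it to NA first
    value = as.numeric(sapply(records, function(x)
      if (is.null(x[["value"]])) NA else x[["value"]])),
    country.name = sapply(records, function(x) x[["country"]][["value"]]),
    country.id = sapply(records, function(x) x[["country"]][["id"]]),
    stringsAsFactors = FALSE
  )
  names(df)[2] <- value
  df
}

population <- recordsToDataFrame(records, value = "population")
print(population)
```

The explicit check for NULL matters because a missing observation would otherwise shorten the value column and misalign it with the years.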



## This demo shows how country level data can be accessed from
## the World Bank via their API and displayed with a Motion Chart.
## Inspired by Google's Public Data Explorer, see
## http://www.google.com/publicdata/home
##
## For the World Bank Data terms of use see:
## http://data.worldbank.org/summary-terms-of-use
##
## To run this demo an internet connection and Flash are required.
## This demo is part of the googleVis R package.
 
 
getWorldBankData <- function(id='SP.POP.TOTL', date='1960:2010',
                             value="value", per.page=12000){
  require(RJSONIO)
  ## Build the query URL for the chosen indicator and date range
  url <- paste("http://api.worldbank.org/countries/all/indicators/", id,
               "?date=", date, "&format=json&per_page=", per.page,
               sep="")

  ## The first list element holds paging metadata, the second the records
  wbData <- fromJSON(url)[[2]]

  ## Missing values arrive as NULL and are mapped to NA
  wbData <- data.frame(
    year = as.numeric(sapply(wbData, "[[", "date")),
    value = as.numeric(sapply(wbData, function(x)
      if (is.null(x[["value"]])) NA else x[["value"]])),
    country.name = sapply(wbData, function(x) x[["country"]]['value']),
    country.id = sapply(wbData, function(x) x[["country"]]['id'])
  )

  ## Rename the generic value column to the indicator name
  names(wbData)[2] <- value

  return(wbData)
}
 
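getWorldBankCountries <- function(){
  require(RJSONIO)
  wbCountries <-
    fromJSON("http://api.worldbank.org/countries?per_page=12000&format=json")
  ## Flatten each country record into one row
  wbCountries <- data.frame(t(sapply(wbCountries[[2]], unlist)))
  ## Convert via character so that factor columns yield the coordinates
  ## rather than the factor level codes
  wbCountries$longitude <- as.numeric(as.character(wbCountries$longitude))
  wbCountries$latitude <- as.numeric(as.character(wbCountries$latitude))
  ## Drop the "(all income levels)" suffix from the region names
  wbCountries$region.value <- gsub("\\s*\\(all income levels\\)", "",
                                   as.character(wbCountries$region.value))
  return(wbCountries)
}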
 
## Create a string 1960:this year, e.g. 1960:2011
years <- paste("1960:", format(Sys.Date(), "%Y"), sep="")
 
## Fertility rate
fertility.rate <- getWorldBankData(id='SP.DYN.TFRT.IN',
                                   date=years, value="fertility.rate")

## Life Expectancy
life.exp <- getWorldBankData(id='SP.DYN.LE00.IN', date=years,
                             value="life.expectancy")

## Population
population <- getWorldBankData(id='SP.POP.TOTL', date=years,
                               value="population")

## GDP per capita (current US$)
GDP.per.capita <- getWorldBankData(id='NY.GDP.PCAP.CD',
                                   date=years,
                                   value="GDP.per.capita.Current.USD")
 
## Merge data sets
wbData <- merge(life.exp, fertility.rate)
wbData <- merge(wbData, population)
wbData <- merge(wbData, GDP.per.capita)
 
## Get country mappings
wbCountries <- getWorldBankCountries()
 
## Add regional information
wbData <- merge(wbData, wbCountries[c("iso2Code", "region.value",
                                      "incomeLevel.value")],
                by.x="country.id", by.y="iso2Code")

## Filter out the aggregates and country id column
subData <- subset(wbData, !region.value %in% "Aggregates",
                  select=-country.id)
 
## Create a motion chart
require(googleVis)

M <- gvisMotionChart(subData, idvar="country.name", timevar="year",
                     options=list(width=700, height=600))

## Display the chart in your browser
plot(M)
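
The merge calls in the code above do not name the join columns. By default merge joins on all columns the two data frames have in common, here year, country.name and country.id, and keeps only rows present in both. A small sketch with invented toy data (the countries and figures are made up) shows the effect:

```r
## Toy data frames shaped like the output of getWorldBankData;
## countries and figures are invented for illustration.
life.exp <- data.frame(year = c(2009, 2010, 2009),
                       life.expectancy = c(70, 71, 80),
                       country.name = c("Country A", "Country A", "Country B"),
                       country.id = c("AA", "AA", "BB"))

fertility.rate <- data.frame(year = c(2009, 2010),
                             fertility.rate = c(2.5, 2.4),
                             country.name = c("Country A", "Country A"),
                             country.id = c("AA", "AA"))

## Default: join on the shared columns, keep matching rows only,
## so Country B (no fertility data) drops out
merged <- merge(life.exp, fertility.rate)
print(merged)

## all = TRUE keeps unmatched rows and fills the gaps with NA
merged.all <- merge(life.exp, fertility.rate, all = TRUE)
print(merged.all)
```

With the default settings, a country or year missing from any one indicator disappears from the final table, which is worth keeping in mind when a bubble is absent from the chart.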