Where is the R Activity?

June 10, 2013

(This article was first published on Spatial.ly » R, and kindly contributed to R-bloggers)

[Figure: r_activity, the map of R activity by country]

R has become one of the world’s most widely used statistics and visualisation software packages, with an ever-growing user community. Thanks to the release of log files containing all hits to the http://cran.rstudio.com/ server, it is possible to make a map showing the parts of the world with the most active R users (specifically those mostly using the RStudio interface). The USA comes top with 3,045,960 requests to the server between October 2012 and June 2013. Japan is in 2nd place with a mere 756,177 requests, and Germany is 3rd. In all, 203 countries appear in the server logs.

I have scaled the map according to the number of server requests made, and you can clearly see the dominance of Japan, Europe and North America compared with other parts of the world, especially Africa. The map of course isn’t a perfect representation of the number of R users, as you could have one or two people making hundreds of server requests a day versus a large number of people making only a couple. This is why I have entitled the map “Activity” rather than “Users”. Either way, R hasn’t quite achieved global domination, but it is getting there…

To create the map, I obtained the files by following the instructions on the logs download page. I then combined them with the following code (taken from here):
setwd("XXX") # this needs to be the directory with the downloaded files in it
file_list <- list.files()

for (file in file_list) {
  if (!exists("dataset")) {
    # if the merged dataset doesn't exist, create it from the first file
    dataset <- read.csv(file, header = TRUE)
  } else {
    # otherwise append the new file to it; using else stops the first file
    # from being read and appended a second time
    temp_dataset <- read.csv(file, header = TRUE)
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
  print(file) # progress indicator
}
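
As a minimal sketch of the same merge without growing the data frame inside a loop, the files can be read into a list and bound together in one step. The temporary directory, the two tiny stand-in files and their contents are made up for illustration, and the file pattern assumes the logs end in .csv or .csv.gz.

```r
# Create two tiny stand-in log files in a temporary directory (made-up data).
log_dir <- file.path(tempdir(), "cran_logs_demo")
dir.create(log_dir, showWarnings = FALSE)
write.csv(data.frame(package = c("ggplot2", "sp"), country = c("US", "JP")),
          file.path(log_dir, "2012-10-01.csv"), row.names = FALSE)
write.csv(data.frame(package = "maptools", country = "DE"),
          file.path(log_dir, "2012-10-02.csv"), row.names = FALSE)

# List only the log files, read each one exactly once, then bind them together.
file_list <- list.files(log_dir, pattern = "\\.csv(\\.gz)?$", full.names = TRUE)
dataset <- do.call(rbind, lapply(file_list, read.csv, header = TRUE))

nrow(dataset)  # 3 rows: one per request across both files
```

Filtering by pattern also keeps stray files in the download directory out of the merge, which the plain list.files() call would pick up.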

It is then possible to aggregate the data to get the number of requests per country.

dataset$flag <- 1 # each row is one request, so flag it with a 1
counts <- aggregate(dataset$flag, by = list(dataset$country), sum) # sum the flags per country
names(counts) <- c("country", "count")
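
To see what the aggregation step produces, here is a small sketch on a made-up six-row log (the toy data frame and its values are purely illustrative). It also shows table() as an equivalent shortcut for counting rows per country.

```r
# Made-up request log: one row per request, as in the merged dataset.
toy <- data.frame(country = c("US", "JP", "US", "DE", "US", "JP"))

# Same approach as above: flag each row with 1 and sum the flags per country.
toy$flag <- 1
toy_counts <- aggregate(toy$flag, by = list(toy$country), sum)
names(toy_counts) <- c("country", "count")

# Equivalent shortcut: table() counts the rows per country directly.
toy_table <- as.data.frame(table(country = toy$country), responseName = "count")

toy_counts  # DE 1, JP 2, US 3
```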

The next step was to download a world shapefile (containing the country borders) from Natural Earth. This contains the country codes used in the log file (the dataset object above). We can open this file with the maptools package:

library(maptools)
world<-readShapePoly("yourworldshapefile")

It is then possible to join our counts object to the world object to assign the log counts to each country based on the "iso_a2" and "country" fields respectively. The new shapefile is also saved.

world@data = data.frame(world@data, counts[match(world@data[,"iso_a2"], counts[,"country"]),])
writePolyShape(world, "world_r_use.shp")
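
The match() join above can be tried on plain data frames first. In this sketch, world_data stands in for world@data and all names and values are made up. It shows that a country absent from the logs ends up with NA, which you may or may not want to turn into 0 before building the cartogram.

```r
# Stand-ins for world@data and counts (made-up values).
world_data <- data.frame(name = c("Germany", "Japan", "Chad"),
                         iso_a2 = c("DE", "JP", "TD"))
counts <- data.frame(country = c("JP", "US", "DE"), count = c(20, 30, 10))

# match() gives, for each world row, the row of counts with the same code,
# or NA where the country never appears in the logs.
idx <- match(world_data[, "iso_a2"], counts[, "country"])
joined <- data.frame(world_data, counts[idx, ])

# Optional: treat countries missing from the logs as zero requests.
joined$count[is.na(joined$count)] <- 0
joined
```

Because the lookup is driven by the world rows, the result keeps the same row order as the shapefile's attribute table, which is what allows it to be assigned back to world@data.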

This next bit is a bit of a cheat, as I used the ScapeToad software to create the cartogram. A package exists to do this in R, but I find ScapeToad to be more powerful. You can download the shapefile I produced from here. I then reloaded the new shapefile into R and used the basic plot functions to produce the map.

cartogram<-readShapePoly("world_r_carto.shp")

plot(cartogram)
title(main="R Activity Around the World", sub="Based on cran.rstudio.com Activity Logs October 2012-June 2013")

This is my first stab at looking at the data - there is a lot more that can be done with it!
