R activity around the world

April 22, 2014

(This article was first published on rapporter, and kindly contributed to R-bloggers)

This project was inspired by "Where is the R Activity?" and our follow-up post on the number of useR! 2013 attendees. Instead of static maps, this time we gathered a bunch of R-related data from a variety of sources and created interactive cartograms that highlight the focus of R activity from various points of view, such as the number of R Foundation members per country:

Figure 1. The number of ordinary members in the R Foundation (click for interactive map)

Please click on the above image or URL to see the interactive, D3.js-driven map, where hovering the mouse over any country reveals detailed statistics on a number of R-related metrics. The menu with a few settings can be activated by hovering the mouse over the small blue triangle at the top.

We also fetched the number of other members (supporters, donors, etc.) from the main R-project.org site and computed the number of all R Foundation members per 1,000 persons, which yields a slightly different plot because of the population-weighted scale:

Figure 2. The number of R Foundation members per 1,000 persons (click for interactive map)
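
The population weighting behind this map amounts to dividing each country's count by its population and multiplying by 1,000. Here is a minimal sketch of that arithmetic in Python (the original analysis was done in R). The function name `per_thousand`, the country labels and all numbers are made-up placeholders, not the real data:

```python
def per_thousand(count, population):
    """Return the number of units per 1,000 persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 1000


# Hypothetical figures for illustration only: the same raw count in a
# small and in a large country.
members = {"A": 50, "B": 50}
population = {"A": 8_000_000, "B": 80_000_000}

rates = {country: per_thousand(members[country], population[country])
         for country in members}
```

With identical raw counts, the smaller country ends up with a rate ten times higher, which is why the weighted cartogram can look quite different from the raw one.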

And there are quite a few other metrics we collected from different data sources and merged in R:
  • the lists of attendees and participants of the annual useR! conferences were usually fetched from the publicly available lists on the conference homepages (2004, 2006, 2008, 2010, 2011, 2013); in other cases (2009, 2012) the organizing committee kindly contributed the lists. 2007 is still missing.
  • the number of R User Groups and the number of their members were fetched from meetup.com, although we are aware that only a subset of RUGs is hosted at that provider. This results in some degree of bias, and we would be extremely happy to get some help to fine-tune this database.
  • the number of CRAN package downloads in 2013 was fetched from the RStudio Cloud CRAN mirror, just like in the original blog post. We checked the overall number of downloads and also the downloads of 5 individual packages. This extra work resulted in more options to render the cartogram, e.g. Rcmdr downloads might be higher than devtools downloads in countries where R is used in education but not much R development takes place.
  • online search queries were downloaded from Google Trends.
  • top R GitHub users were identified and fetched from GitHub's wonderful API. Unfortunately the search API limits the results to 1,000 elements, so this data should be considered a sample of the most active R users on GitHub. The plots reflect the proportion of such users in each country.
  • and the number of visits at R-bloggers.com (in Figures 3 and 4) was kindly contributed by Tal Galili. Thank you, Tal!
Figure 3. The number of visits at R-bloggers.com (click for interactive map)
Figure 4. The number of visits at R-bloggers.com per 1,000 persons (click for interactive map)

The most time-consuming part of the data collection was standardizing the country names and, even more so, manually identifying the country of each record when no location data was provided, which took endless hours of desk research. Our intern did a great job, though, and he probably knows the names of at least 2,000 different R users from all around the world by now :)
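
To give an idea of what standardizing country names involves, here is a minimal sketch in Python (the original work was done in R, partly by hand). It normalizes case and whitespace, looks the label up in an alias table, and returns `None` for anything it cannot resolve so that the record can be flagged for manual research. The `ALIASES` and `CANONICAL` tables and the function name `standardize_country` are hypothetical and far shorter than a real lookup would be:

```python
# Hypothetical alias table for illustration; a real one would be much longer.
ALIASES = {
    "usa": "United States",
    "u.s.": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "great britain": "United Kingdom",
    "deutschland": "Germany",
}

# Hypothetical list of canonical names, keyed by their lowercase form.
CANONICAL = {name.lower(): name for name in
             ["United States", "United Kingdom", "Germany", "Switzerland"]}


def standardize_country(raw):
    """Map a free-text country label to a canonical name, or None if unknown."""
    if raw is None:
        return None
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    if key in ALIASES:
        return ALIASES[key]
    # Unknown labels return None and are left for manual identification.
    return CANONICAL.get(key)
```

For example, `standardize_country("  USA ")` and `standardize_country("United States of America")` both resolve to the same canonical name, while an unrecognized label comes back as `None`.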

Feedback is very welcome. I would love to hear from the more than 3,000 useR! conference participants:

Figure 5. The number of useR! attendees per 1,000 persons in each country (click for interactive map)

Or from the more than 40,000 identified RUG members:

Figure 6. The number of R Meetup members (click for interactive map)

Or from anyone else who contributed to the 31 and a half million R package downloads in 2013:

Figure 7. The number of R package downloads (click for interactive map)

And congrats to Switzerland, the number one countRy by our artificial and arbitrary Rank, which was computed by averaging the population-weighted R-related variables mentioned above:

Figure 8. The global R index (click for interactive map)
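
An index of this kind can be sketched in a few lines of Python (the original computation was done in R). One important assumption: the post does not say how the different scales were aligned before averaging, so this sketch divides every metric by its maximum to bring it to a 0..1 range and treats a country missing from a metric as zero. The function name `global_index`, the metric names and all values are made up for illustration:

```python
def global_index(metrics):
    """Average rescaled metrics per country.

    metrics: {metric_name: {country: population-weighted value}}
    returns: {country: score between 0 and 1}

    Assumption (not from the post): each metric is divided by its maximum
    so that all metrics share the same scale before averaging.
    """
    countries = set()
    for values in metrics.values():
        countries.update(values)

    scores = {}
    for country in countries:
        scaled = []
        for values in metrics.values():
            top = max(values.values())
            # A country missing from a metric counts as zero for that metric.
            scaled.append(values.get(country, 0.0) / top if top > 0 else 0.0)
        scores[country] = sum(scaled) / len(scaled)
    return scores


# Made-up per-1,000 values for illustration only.
metrics = {
    "members": {"A": 0.004, "B": 0.001},
    "downloads": {"A": 10.0, "B": 20.0, "C": 5.0},
}
scores = global_index(metrics)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Other choices, such as averaging ranks instead of rescaled values, would be just as arbitrary and could reorder the countries.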
