Analyzing R-bloggers

[This article was first published on The PolStat R Feed, and kindly contributed to R-bloggers].

In the last two posts we saw how to download posts from R-bloggers, extract the title, author and date of each post, and write that information to a CSV file. Now that we have a nice data set from R-bloggers, we can start to examine the development of the site over its lifetime. In this post I will look at the following patterns in the data:

  1. The rate of monthly posts submitted to R-bloggers
  2. The distribution of posts and contributors
  3. The top contributors in total and tabulated by year

The graph below shows the monthly count of posts submitted to R-bloggers:

As you can see, R-bloggers has experienced tremendous growth in posts. The first years, from 2005 to the end of 2008, were fairly consistent, with an average posting rate of 6 posts per month. In 2009 we see the beginning of a dramatic rise in submitted posts, which peaks in March 2011 with 266 posts that month. To see whether this is a function of a few very active bloggers, or whether we also see a similar increase in contributors, the graph below plots the number of unique contributors for every month:

Here we see that the monthly number of contributors closely follows the monthly number of posts; the rise in posts is therefore not exclusively the result of a few extremely active bloggers. However, as the figure below shows, most authors contribute a fairly small number of posts:

The distribution is extremely skewed with a median of 6 posts, and a few authors contributing 200 or more posts.
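
The monthly counts and the per-author distribution described above can be sketched as follows. This is a minimal illustration in Python, not the R script the post refers to; the sample rows, the author names and the helper `month_key` are all made up for the example, and the real input would be the CSV file built in the earlier posts.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import median

# Hypothetical sample rows (title, author, date). These stand in for the
# real data set and are NOT taken from R-bloggers.
posts = [
    ("Post A", "alice", date(2010, 1, 5)),
    ("Post B", "alice", date(2010, 1, 20)),
    ("Post C", "bob", date(2010, 1, 21)),
    ("Post D", "alice", date(2010, 2, 3)),
    ("Post E", "carol", date(2010, 2, 9)),
    ("Post F", "alice", date(2010, 2, 15)),
]


def month_key(d):
    """Reduce a date to a (year, month) pair used for grouping."""
    return (d.year, d.month)


# 1. Number of posts submitted in each month.
posts_per_month = Counter(month_key(d) for _, _, d in posts)

# 2. Number of unique contributors in each month.
authors_by_month = defaultdict(set)
for _, author, d in posts:
    authors_by_month[month_key(d)].add(author)
contributors_per_month = {m: len(a) for m, a in authors_by_month.items()}

# 3. Posts per author, and the median of that distribution.
posts_per_author = Counter(author for _, author, _ in posts)
median_posts = median(posts_per_author.values())

for month in sorted(posts_per_month):
    print(month, posts_per_month[month], contributors_per_month[month])
print("median posts per author:", median_posts)
```

Comparing `posts_per_month` with `contributors_per_month` month by month is the check made in the two graphs: if posts rose while contributors stayed flat, a few authors would be driving the growth.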

The overall top ten contributors to R-bloggers are:

author count
David Smith 647
xi’an 293
Thinking inside the box 217
Tal Galili 124
klr 104
Stephen Turner 102
dirk.eddelbuettel 94
Ralph 82
romain francois 79
C 77

Breaking this down by year we can see that from 2009 there is a rise of some very active R bloggers:

2005:
author count
Hadley Wickham 3
fernandohrosa 2

2006:
author count
seth 6
Hadley Wickham 5
dataninja 5
Di Cook 3
Vincent Zoonekynd's Blog 3
fernandohrosa 2
Andrew Gelman 1

2007:
author count
Mario Pineda-Krch 20
Forester 14
Egon Willighagen 5
Andrew Gelman 4
Rob J Hyndman 4
dataninja 4
Hadley Wickham 3
John Johnson 2
dan 2
seth 2

2008:
author count
Yu-Sung Su 28
Michal 9
Rob J Hyndman 8
Gregor Gorjanc 6
Forester 5
Di Cook 4
John Johnson 4
Mario Pineda-Krch 4
Radford Neal 4
abiao 4

2009:
author count
Thinking inside the box 63
dirk.eddelbuettel 36
Shige 30
John Myles White 28
Paolo 26
David Smith 25
Todos Logos 25
Jeromy Anglim 24
Stephen Turner 23
romain francois 23

2010:
author count
David Smith 352
xi’an 152
Thinking inside the box 85
C 75
Tal Galili 74
dirk.eddelbuettel 58
Ralph 53
romain francois 41
Stephen Turner 34
Kelly 33

2011:
author count
David Smith 268
xi’an 137
klr 104
Thinking inside the box 66
BMS Add-ons » BMS Blog 58
Pat 52
Scott Chamberlain 48
Stephen Turner 44
Kay Cichini 43
Tal Galili 37
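
The per-year breakdown is the same count, grouped by year first. Here is a minimal Python sketch of that grouping, with invented (author, year) pairs and a hypothetical helper `top_by_year`; it is not the original R script.

```python
from collections import Counter, defaultdict

# Hypothetical (author, year) pairs standing in for the real data set.
posts = [
    ("alice", 2009), ("bob", 2009), ("alice", 2009),
    ("carol", 2010), ("bob", 2010), ("carol", 2010), ("carol", 2010),
]


def top_by_year(posts, n=10):
    """Map each year to its n most active authors as (author, count) pairs."""
    counts = defaultdict(Counter)
    for author, year in posts:
        counts[year][author] += 1
    return {year: c.most_common(n) for year, c in sorted(counts.items())}


for year, ranking in top_by_year(posts, n=2).items():
    print(year, ranking)
```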

From 2009 a number of authors appear every year among the top contributors, and of course in 2010 David Smith and xi’an appear, both with a massive output.

I see R-bloggers as one of the great services in the R community, and the presence of very knowledgeable and prolific contributors is a public good that we can all enjoy. So let's hope the current trend will continue into the new year!

As always, the full R script to reproduce the above analysis is here:
