Analyzing R-bloggers

January 6, 2012
(This article was first published on The PolStat R Feed, and kindly contributed to R-bloggers)

In the last two posts we saw how to download posts from R-bloggers, extract the title, author and date of each post, and write that information to a csv file. Since we now have a nice data set from R-bloggers, we can start to examine how the site has developed over its lifetime. In this post I will look at the following patterns in the data:

  1. The rate of monthly posts submitted to R-bloggers
  2. The distribution of posts and contributors
  3. The top contributors in total and tabulated by year

The graph below shows the monthly count of posts submitted to R-bloggers.com:
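A minimal sketch of how such a monthly count could be computed in base R. It assumes a data frame with `author` and `date` columns, like the csv built in the earlier posts; the column names and the toy values are assumptions for illustration, not the original script.

```r
# Toy stand-in for the csv of posts; column names and values are assumed.
posts <- data.frame(
  author = c("x", "y", "x", "z", "x"),
  date   = as.Date(c("2010-01-05", "2010-01-20", "2010-02-03",
                     "2010-02-11", "2010-02-28")),
  stringsAsFactors = FALSE
)

# Reduce each date to its year-month and count the posts in each month.
posts$month <- format(posts$date, "%Y-%m")
monthly_posts <- table(posts$month)
print(monthly_posts)
```

Passing `monthly_posts` to `plot()` or `barplot()` would then draw the series. Note that a month without any posts is simply absent from the table rather than counted as zero.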

As you can see, R-bloggers.com has experienced tremendous growth in posts. The first years, from 2005 to the end of 2008, were fairly consistent, with an average posting rate of 6 posts per month. In 2009 we see the beginning of a dramatic rise in submitted posts, which peaks in March 2011 with 266 posts that month. To see whether this is driven by a few very active bloggers, or whether the number of contributors has risen similarly, the graph below plots the number of unique contributors for every month:
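The number of unique contributors per month could be obtained along these lines. Again this is only a sketch on toy data, with the `author` and `date` column names assumed rather than taken from the original script.

```r
# Toy stand-in for the csv of posts; column names and values are assumed.
posts <- data.frame(
  author = c("x", "y", "x", "x", "x"),
  date   = as.Date(c("2010-01-05", "2010-01-20", "2010-01-25",
                     "2010-02-11", "2010-02-28")),
  stringsAsFactors = FALSE
)
posts$month <- format(posts$date, "%Y-%m")

# For each month, count the distinct authors who posted.
monthly_authors <- tapply(posts$author, posts$month,
                          function(a) length(unique(a)))

# Posts per month, placed next to the contributor counts for comparison.
monthly_posts <- table(posts$month)
comparison <- data.frame(
  month   = names(monthly_authors),
  posts   = as.integer(monthly_posts),
  authors = as.integer(monthly_authors)
)
print(comparison)
```

Both `table()` and `tapply()` order their results by the sorted month labels, which is why the two vectors can be placed side by side here.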

Here we see that the monthly number of contributors closely follows the monthly number of posts; the rise in posts is therefore not exclusively the result of a few extremely active bloggers. However, as the figure below shows, most authors contribute a fairly small number of posts:

The distribution is extremely skewed: the median is 6 posts, while a few authors have contributed 200 or more posts.
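A skew like this can be checked by counting posts per author and comparing the median with the maximum. The snippet below is a sketch on a hypothetical `authors` vector (one entry per post), not the real data.

```r
# Hypothetical author column: one entry per post.
authors <- c("x", "x", "x", "x", "y", "y", "z")

# Number of posts per author, as a plain integer vector.
posts_per_author <- as.integer(table(authors))

# The median describes the typical author; the maximum exposes the long tail.
print(median(posts_per_author))
print(max(posts_per_author))
```

Calling `hist(posts_per_author)` would give a quick view of the shape of the distribution.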

The overall top ten contributors to R-bloggers.com are:

author count
David Smith 647
xi'an 293
Thinking inside the box 217
Tal Galili 124
klr 104
Stephen Turner 102
dirk.eddelbuettel 94
Ralph 82
romain francois 79
C 77
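A ranking of this kind can be produced by tabulating the author column and sorting the counts. This is a sketch with a hypothetical `authors` vector; the names `author_counts` and `top_df` are made up for the example.

```r
# Hypothetical author column: one entry per post.
authors <- c("x", "x", "x", "y", "y", "z")

# Count posts per author and sort from most to least active.
author_counts <- sort(table(authors), decreasing = TRUE)

# Keep at most the first ten entries and present them as a data frame.
top <- head(author_counts, 10)
top_df <- data.frame(author = names(top), count = as.integer(top),
                     stringsAsFactors = FALSE)
print(top_df)
```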

Breaking this down by year we can see that from 2009 there is a rise of some very active R bloggers:

2005
author count
Hadley Wickham 3
fernandohrosa 2

2006
author count
seth 6
Hadley Wickham 5
dataninja 5
Di Cook 3
Vincent Zoonekynd's Blog 3
fernandohrosa 2
Andrew Gelman 1

2007
author count
Mario Pineda-Krch 20
Forester 14
Egon Willighagen 5
Andrew Gelman 4
Rob J Hyndman 4
dataninja 4
Hadley Wickham 3
John Johnson 2
dan 2
seth 2

2008
author count
Yu-Sung Su 28
Michal 9
Rob J Hyndman 8
Gregor Gorjanc 6
Forester 5
Di Cook 4
John Johnson 4
Mario Pineda-Krch 4
Radford Neal 4
abiao 4

2009
author count
Thinking inside the box 63
dirk.eddelbuettel 36
Shige 30
John Myles White 28
Paolo 26
David Smith 25
Todos Logos 25
Jeromy Anglim 24
Stephen Turner 23
romain francois 23

2010
author count
David Smith 352
xi'an 152
Thinking inside the box 85
C 75
Tal Galili 74
dirk.eddelbuettel 58
Ralph 53
romain francois 41
Stephen Turner 34
Kelly 33

2011
author count
David Smith 268
xi'an 137
klr 104
Thinking inside the box 66
BMS Add-ons » BMS Blog 58
Pat 52
Scott Chamberlain 48
Stephen Turner 44
Kay Cichini 43
Tal Galili 37
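The per-year tables follow the same idea as the overall ranking, applied within each year. A minimal sketch, assuming `author` and `date` columns and using toy values:

```r
# Toy stand-in for the csv of posts; column names and values are assumed.
posts <- data.frame(
  author = c("x", "x", "y", "y", "y", "z"),
  date   = as.Date(c("2010-03-01", "2010-07-15", "2010-09-30",
                     "2011-01-10", "2011-04-02", "2011-05-20")),
  stringsAsFactors = FALSE
)
posts$year <- format(posts$date, "%Y")

# Split the authors by year, then rank the authors within each year
# and keep at most the ten most active ones.
top_by_year <- lapply(split(posts$author, posts$year), function(a) {
  head(sort(table(a), decreasing = TRUE), 10)
})
print(top_by_year)
```

The result is a list with one sorted table per year, so `top_by_year[["2011"]]` would return the ranking for 2011 alone.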

From 2009 onward, a number of authors appear among the top contributors in every year, and in 2010 David Smith and Xi’an emerge at the top, both with a massive output.

I see R-bloggers as one of the great services in the R community, and the presence of very knowledgeable and prolific contributors is a public good that we can all enjoy. So let's hope the current trend will continue into the new year!

As always, the full R script to reproduce the above analysis is here:
