R’s 2012 Growth in Capability Exceeds SAS’ All Time Total

[This article was first published on r4stats.com » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Robert A. Muenchen

I’m slowly gathering all the data needed to update my ongoing article, The Popularity of Data Analysis Software. The section below is the latest installment.

Growth in Capability

The capability of all the software in this article has grown significantly over the years. It would be helpful to be able to plot the growth of each software package’s capabilities, but such data is hard to obtain. John Fox (2009) acquired it for R’s main distribution site http://cran.r-project.org/. I collected the data for later versions following his method.

Figure 10 shows that the growth in R packages is following a rapid parabolic arc (quadratic fit with R-squared=.995). Early version numbers of R increase by 0.10 while more recent ones increased by 0.01. To make the x-axis consistent, the graph displays simply the numerical order in which the versions were released. The right-most point is for version 2.15.2, the last version released in 2012.

Figure 10. Number of R packages plotted for each major release of R. The last value on the x-axis represents version 2.15.2, the final release in 2012.

As rapid as this growth has been, the data in Figure 10 represents only the main CRAN repository. R does have eight other software repositories, such as the one at http://www.bioconductor.org/ that are not included in this graph. A program run on 3/19/2013 counted 6,275 R packages at all major repositories, 4,315 of which were at CRAN. So the growth curve for the software at all repositories would be roughly 30% higher on the y-axis than the one shown in Figure 10. As with any analysis software, individuals also maintain their own separate collections typically available on their web sites.

To put this astonishing growth in perspective, let us compare it to the most dominant commercial package, SAS. In its most recent version, 9.3, SAS offers 100 programming statements, 258 procedures (Base, STAT, ETS, Graph, HP Forecasting, Macro, OR, QC) and 520 SAS functions and call routines, and 314 IML statements, functions and subroutines for a total of 1,192 items that are roughly equivalent to R functions. R packages contain a median of 5 functions (Rasmus Bååth, 12/2012 personal communication). Therefore R has approximately 31,375 functions compared to SAS’ 1,192. In fact, during 2012 alone, R added more functions/procs than SAS Institute has provided in its entire history! That’s 701 packages, counting only CRAN, or around 3,505  new functions in 2012.

Of course these R functions and SAS procedures / functions are not perfectly equivalent. Some SAS procedures have many more options to control their output than R functions do, giving them potentially more output per command. However, R functions can nest inside one another, creating nearly infinite combinations of output. While the comparison is not perfect, it is certainly an eye opener.

Stay tuned for future updates which will include what employers are now advertising for and recent trends in academic use of analytic software.

To leave a comment for the author, please follow the link and comment on their blog: r4stats.com » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)