Facts About R Packages (1)

August 29, 2012

(This article was first published on Category: R | Huidong Tian's Blog, and kindly contributed to R-bloggers)

R Package Growth Curve

Why is R so popular? There are many reasons: it is easy to learn and convenient to use, it has an active community, it is open source, and so on. Another important reason is the large number of contributed packages. As of yesterday, there were 4033 R packages on CRAN. What does the growth curve of R packages look like over the past decade? How many packages were contributed to CRAN each month?

The following figure shows the growth curve of R packages:

[Figure not available: interactive googleVis line chart of the cumulative number of R packages on CRAN (RpkgCurve1.html)]

R is becoming more and more popular, as the number of packages contributed each month shows:

[Figure not available: interactive googleVis line chart of the number of R packages contributed to CRAN each month (RpkgCurve2.html)]
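The two curves are closely related: the monthly curve counts the packages whose first release falls in each month, and the growth curve is the running total of those counts. A minimal sketch of that relationship in base R, using a handful of made-up first-release dates (the dates and the names `first.release` and `monthly` are hypothetical, not CRAN data):

```r
# Hypothetical first-release dates for six packages (not real CRAN data).
first.release <- as.Date(c("2011-01-05", "2011-01-20", "2011-02-11",
                           "2011-02-14", "2011-02-28", "2011-04-02"))

# Truncate each date to its month, then count the packages per month.
month <- format(first.release, "%Y-%m")
monthly <- as.data.frame(table(Month = month), responseName = "New",
                         stringsAsFactors = FALSE)

# The growth curve is the running total of the monthly counts.
monthly$Total <- cumsum(monthly$New)
print(monthly)
```

Months with no new package (March in this toy data) simply do not appear as rows, so the running total stays flat across them.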

The first contributed R package was leaps (regression subset selection), uploaded by Thomas Lumley.

Here is the R code behind the results above. It also collects more information than is shown here, which will be used in later posts.

Download package information from CRAN
# Load the packages needed
library(XML)        # readHTMLTable() for scraping the CRAN pages
library(googleVis)  # gvisLineChart() for the interactive charts

# Set the CRAN repository and the page that lists all packages
CRAN.mirr <- "http://cran.r-project.org/"
CRAN.home <- "web/packages/available_packages_by_name.html"

# Read in the package names and descriptions
pkg <- readHTMLTable(paste(CRAN.mirr, CRAN.home, sep = ""), skip.rows = 1)[[1]]
names(pkg) <- c("Name", "Description")
pkg <- pkg[!is.na(pkg$Name), ]
pkg[, 1] <- as.character(pkg[, 1])
pkg[, 2] <- as.character(pkg[, 2])

# Define a function to convert the date format "11-Jun-2011" to "2011-06-11"
as.posix <- function(x) {
  day <- substr(x, 1, 2)
  mth <- substr(x, 4, 6)
  yr  <- substr(x, 8, 11)
  Mth <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
  # Map each month abbreviation to a zero-padded month number
  mth <- unlist(sapply(mth, FUN = function(x) {
    m <- which(Mth == x)
    if (nchar(m) == 1) m <- paste(0, m, sep = "")
    return(m)}))
  paste(yr, mth, day, sep = "-")
}

# Create a list to hold the detailed information for each package
# This process will take about 15 minutes
PKG <- list()
pb <- txtProgressBar(min = 0, max = nrow(pkg), style = 3)
for (i in 1:nrow(pkg)) {
  pkg.nam <- pkg$Name[i]
  # Read the tables on the package's index page
  pkg.url <- paste(CRAN.mirr, "web/packages/", pkg.nam, "/index.html", sep = "")
  pkg.des <- readHTMLTable(pkg.url)
  names(pkg.des) <- c("Description", "Downloads", "Dependency")[1:length(pkg.des)]
  # If older versions exist, read their names and dates from the archive
  if ("Old sources:" %in% pkg.des$Downloads$V1) {
    hist.url <- paste(CRAN.mirr, "src/contrib/Archive/", pkg.nam, sep = "")
    hist.dat <- readHTMLTable(hist.url, skip.rows = 2)[[1]][, 2:3]
    names(hist.dat) <- c("Name", "Date")
    hist.dat <- hist.dat[!is.na(hist.dat$Name), ]
    hist.dat$Date <- as.posix(hist.dat$Date)
    pkg.des[["History"]] <- hist.dat
  }
  # Store the first two columns of every table as character
  for (l in 1:length(pkg.des)) {
    pkg.des[[l]][, 1] <- as.character(pkg.des[[l]][, 1])
    pkg.des[[l]][, 2] <- as.character(pkg.des[[l]][, 2])
  }
  PKG[[pkg.nam]] <- pkg.des
  setTxtProgressBar(pb, i)
}
close(pb)

# Extract the date of the first version of each package
pkg.trend <- data.frame(pkg.name = names(PKG), stringsAsFactors = FALSE)
pkg.trend$Date.1 <- NA_character_
for (i in 1:nrow(pkg.trend)) {
  pkg.nam <- pkg.trend$pkg.name[i]
  pkg.des <- PKG[[pkg.nam]]
  if ("History" %in% names(pkg.des)) {
    # Archived versions exist: take the earliest archive date
    pkg.trend$Date.1[i] <- as.character(min(pkg.des$History$Date))
  } else {
    # No archive: the current version is the first one
    pkg.trend$Date.1[i] <-
      pkg.des$Description$V2[which(pkg.des$Description$V1 == "Published:")]
  }
}

# Aggregate the number of packages by month
pkg.trend$Date.2 <- paste(substr(pkg.trend$Date.1, 1, 7), "01", sep = "-")
pkg.trend$Date.2 <- as.POSIXct(pkg.trend$Date.2, format = "%Y-%m-%d")
pkg.dat <- with(pkg.trend, aggregate(list(Num = Date.2), list(Date = Date.2), length))
pkg.dat$Num1 <- cumsum(pkg.dat$Num)

# Display the growth curves using googleVis
Line1 <- gvisLineChart(pkg.dat, xvar = "Date", yvar = "Num1")  # cumulative total
Line2 <- gvisLineChart(pkg.dat, xvar = "Date", yvar = "Num")   # new packages per month
plot(Line1)
plot(Line2)
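The date helper in the listing converts one month abbreviation at a time with sapply(). As a possible alternative, the same conversion can be written in vectorised form with the built-in month.abb constant and sprintf(). This is only a sketch: the function name `to_iso_date` and the input strings are made up for illustration, and it assumes every input starts with a date in the "11-Jun-2011" layout.

```r
# Convert "11-Jun-2011" style strings to "2011-06-11".
# Assumes the day, month abbreviation and year sit at fixed positions.
to_iso_date <- function(x) {
  day <- substr(x, 1, 2)
  mon <- match(substr(x, 4, 6), month.abb)  # month.abb is c("Jan", ..., "Dec")
  yr  <- substr(x, 8, 11)
  sprintf("%s-%02d-%s", yr, mon, day)       # zero-pad the month number
}

dates <- c("11-Jun-2011", "03-Jan-2009", "29-Aug-2012")  # made-up inputs
iso <- to_iso_date(dates)
print(iso)
```

Because month.abb always holds the English abbreviations, this version does not depend on the locale of the R session, unlike parsing with as.Date() and the "%b" format.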
