
Facts About R Packages (1)

[This article was first published on Category: R | Huidong Tian's Blog, and kindly contributed to R-bloggers.]

R Package Growth Curve

Why is R so popular? There are many reasons: it is easy to learn and convenient to use, it has an active community, it is open source, and so on. Another important reason is the large number of contributed packages. As of yesterday, there were 4033 R packages on CRAN. What has the growth curve of R packages looked like over the past decade? How many packages were contributed to CRAN each month?

The following figure shows the cumulative growth curve of R packages on CRAN:

[Figure: cumulative number of R packages on CRAN over time]

R is getting more and more popular, as the number of packages contributed each month shows:

[Figure: number of new R packages contributed to CRAN each month]

The first contributed R package, that is, the one with the earliest first-release date in this data, is leaps (regression subset selection), uploaded by Thomas Lumley.

Here is the R code for the results above. The code also collects more information than is shown here, which will be used in later posts.

Download package information from CRAN
```r
# Load the packages needed
library(XML)        # readHTMLTable() for scraping HTML tables
library(googleVis)  # gvisLineChart() for interactive line charts
```
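The script assumes XML and googleVis are already installed. A minimal sketch, not part of the original post, that installs whatever is missing before attaching the packages; `ensure_packages()` is a hypothetical helper, and the install step needs network access to a CRAN mirror (the cloud mirror URL used as the default is an assumption you can change).

```r
# Hypothetical helper: install any missing packages, then attach them all
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) install.packages(missing, repos = repos)
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(missing)  # names of the packages that had to be installed
}
```

With it, the two `library()` calls could become `ensure_packages(c("XML", "googleVis"))`.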

```r
# Set the CRAN repository
CRAN.mirr <- "http://cran.r-project.org/"
CRAN.home <- "web/packages/available_packages_by_name.html"
```
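Every page the script reads is built by pasting a package name onto the mirror root. A small sketch of the two per-package URL patterns the code relies on (the package index page and the archive of old sources), vectorised over package names with `paste0()`; `cran_urls()` is a hypothetical helper, and the patterns are the ones this script uses, which may not match CRAN's current site layout.

```r
# Hypothetical helper: build the per-package CRAN URLs used by the script
cran_urls <- function(pkg_names, mirror = "http://cran.r-project.org/") {
  data.frame(
    name    = pkg_names,
    index   = paste0(mirror, "web/packages/", pkg_names, "/index.html"),
    archive = paste0(mirror, "src/contrib/Archive/", pkg_names),
    stringsAsFactors = FALSE
  )
}

urls <- cran_urls(c("leaps", "XML"))
urls$index
```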

```r
# Read in package names and descriptions
pkg <- readHTMLTable(paste0(CRAN.mirr, CRAN.home), skip.rows = 1)[[1]]
names(pkg) <- c("Name", "Description")
pkg <- pkg[!is.na(pkg$Name), ]                  # drop rows without a package name
pkg$Name        <- as.character(pkg$Name)        # factor -> character
pkg$Description <- as.character(pkg$Description)
```

```r
# Define a function to convert the date format "11-Jun-2011" to "2011-06-11"
as.posix <- function(x) {
  day <- substr(x, 1, 2)
  mth <- substr(x, 4, 6)
  yr  <- substr(x, 8, 11)
  # match() against the built-in month.abb ("Jan", ..., "Dec") is vectorised;
  # sprintf() zero-pads the month number (an unknown month becomes "NA")
  mth <- sprintf("%02d", match(mth, month.abb))
  paste(yr, mth, day, sep = "-")
}
```
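The `substr()` approach assumes a two-digit day and a fixed character layout. A hedged alternative sketch using base R's `as.Date()` with an explicit format: `%b` (abbreviated month name) depends on the LC_TIME locale, so the helper switches to the C locale temporarily and restores the previous setting on exit. `parse_cran_date()` is a hypothetical name; it returns `Date` objects rather than strings, so `min()` and date arithmetic work directly.

```r
# Hypothetical helper: parse "11-Jun-2011" (optionally followed by a time)
# into a Date, independent of the user's locale
parse_cran_date <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")   # English month abbreviations for %b
  # strptime(), used by as.Date(), ignores trailing text such as " 14:02"
  as.Date(x, format = "%d-%b-%Y")
}

d <- parse_cran_date(c("11-Jun-2011", "3-Jan-1999 14:02"))
format(d)  # "2011-06-11" "1999-01-03"
```

Unparseable input comes back as `NA` instead of a malformed string.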

```r
# Create a list to hold the detailed information for each package
# (this process takes about 15 minutes)
PKG <- list()
pb <- txtProgressBar(min = 0, max = nrow(pkg), style = 3)
for (i in seq_len(nrow(pkg))) {
  pkg.nam <- pkg$Name[i]
  pkg.url <- paste0(CRAN.mirr, "web/packages/", pkg.nam, "/index.html")
  pkg.des <- readHTMLTable(pkg.url)
  names(pkg.des) <- c("Description", "Downloads", "Dependency")[seq_along(pkg.des)]
  # Packages with older versions link to an "Old sources" archive page
  if ("Old sources:" %in% pkg.des$Downloads$V1) {
    hist.url <- paste0(CRAN.mirr, "src/contrib/Archive/", pkg.nam)
    hist.dat <- readHTMLTable(hist.url, skip.rows = 2)[[1]][, 2:3]
    names(hist.dat) <- c("Name", "Date")
    hist.dat <- hist.dat[!is.na(hist.dat$Name), ]
    hist.dat$Date <- as.posix(hist.dat$Date)
    pkg.des[["History"]] <- hist.dat
  }
  # Convert every column of every table to character
  for (l in seq_along(pkg.des)) {
    pkg.des[[l]][] <- lapply(pkg.des[[l]], as.character)
  }
  PKG[[pkg.nam]] <- pkg.des
  setTxtProgressBar(pb, i)
}
close(pb)
```
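Looping over about 4000 pages means a single network error or malformed page stops the whole run. A minimal sketch, not the author's code, of one way to make such a loop tolerant: each call goes through `tryCatch()`, failures are stored as `NULL` and listed in an attribute, and an optional pause keeps the request rate low. `fetch_all()` and its `fetch` argument are hypothetical; in the script, `fetch` would wrap the `readHTMLTable()` call on a package page.

```r
# Hypothetical helper: apply `fetch` to each name, carrying on after errors
fetch_all <- function(pkg_names, fetch, pause = 0) {
  out <- vector("list", length(pkg_names))
  names(out) <- pkg_names
  failed <- character(0)
  for (nm in pkg_names) {
    res <- tryCatch(fetch(nm), error = function(e) NULL)
    if (is.null(res)) failed <- c(failed, nm)
    out[nm] <- list(res)   # list() wrapper keeps NULL entries in place
    Sys.sleep(pause)       # optional delay between requests
  }
  attr(out, "failed") <- failed
  out
}

# Usage with a stand-in fetcher that fails for one name
res <- fetch_all(c("leaps", "broken", "XML"),
                 function(nm) if (nm == "broken") stop("HTTP error") else toupper(nm))
attr(res, "failed")  # "broken"
```

The failed names can then be retried later instead of restarting the whole 15-minute run.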

```r
# Extract the date of the first version of each package: the earliest
# archived version if an archive exists, otherwise the "Published:" date
pkg.trend <- data.frame(pkg.name = names(PKG), stringsAsFactors = FALSE)
pkg.trend$Date.1 <- NA_character_
for (i in seq_len(nrow(pkg.trend))) {
  pkg.des <- PKG[[pkg.trend$pkg.name[i]]]
  if ("History" %in% names(pkg.des)) {
    pkg.trend$Date.1[i] <- min(pkg.des$History$Date)
  } else {
    pkg.trend$Date.1[i] <-
      pkg.des$Description$V2[match("Published:", pkg.des$Description$V1)]
  }
}
```
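The rule applied in this step (earliest archived version if there is one, otherwise the current "Published:" date) can also be written as a function and applied with `vapply()`, and `which.min()` then picks out the earliest package. The sketch uses a tiny hand-made list shaped like `PKG`; `first_release()`, `PKG_demo` and all package names and dates in it are made up for illustration.

```r
# Hypothetical helper: first-release date of one package record
first_release <- function(des) {
  if (!is.null(des[["History"]])) {
    min(des[["History"]]$Date)   # "YYYY-MM-DD" strings sort chronologically
  } else {
    des$Description$V2[match("Published:", des$Description$V1)]
  }
}

# Made-up records with the same shape as PKG
PKG_demo <- list(
  pkgA = list(
    Description = data.frame(V1 = c("Version:", "Published:"),
                             V2 = c("2.0", "2012-05-01"), stringsAsFactors = FALSE),
    History = data.frame(Name = c("pkgA_1.0.tar.gz", "pkgA_1.1.tar.gz"),
                         Date = c("2009-03-10", "2010-07-22"), stringsAsFactors = FALSE)
  ),
  pkgB = list(
    Description = data.frame(V1 = c("Version:", "Published:"),
                             V2 = c("0.1", "2012-11-30"), stringsAsFactors = FALSE)
  )
)

first_dates <- vapply(PKG_demo, first_release, character(1))
first_dates                                          # pkgA "2009-03-10", pkgB "2012-11-30"
names(first_dates)[which.min(as.Date(first_dates))]  # earliest package: "pkgA"
```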

```r
# Aggregate the number of packages for each month
pkg.trend$Date.2 <- paste(substr(pkg.trend$Date.1, 1, 7), "01", sep = "-")
pkg.trend$Date.2 <- as.POSIXct(pkg.trend$Date.2, format = "%Y-%m-%d")
pkg.dat <- with(pkg.trend, aggregate(list(Num = Date.2), list(Date = Date.2), length))
pkg.dat$Num1 <- cumsum(pkg.dat$Num)
```
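`aggregate()` only returns months in which at least one package appeared, so a month with no new packages is missing from `pkg.dat` rather than shown as zero. A sketch, assuming first-release dates stored as `"YYYY-MM-DD"` strings (the dates below are made up), that counts packages over a complete monthly sequence and then takes the running total, which is exactly the relation between the monthly chart and the cumulative curve.

```r
# Made-up first-release dates; February 2001 has no packages
first_dates <- c("2001-01-15", "2001-01-28", "2001-03-02", "2001-03-19", "2001-03-30")

# Round every date down to the first day of its month
month_start <- as.Date(paste0(substr(first_dates, 1, 7), "-01"))

# Every month from the first to the last, so empty months are kept
all_months <- seq(min(month_start), max(month_start), by = "month")

monthly <- data.frame(
  Date = all_months,
  Num  = tabulate(match(month_start, all_months), nbins = length(all_months))
)
monthly$Num1 <- cumsum(monthly$Num)   # cumulative number of packages
monthly
```

Here `Num` is 2, 0, 3 and `Num1` is 2, 2, 5; without the explicit month sequence the empty February row would simply not exist.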

```r
# Display the growth curves using googleVis
Line1 <- gvisLineChart(pkg.dat, xvar = "Date", yvar = "Num1")  # cumulative total
Line2 <- gvisLineChart(pkg.dat, xvar = "Date", yvar = "Num")   # new packages per month
plot(Line1)
plot(Line2)
```
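googleVis charts are drawn by the Google Charts library in a web browser, so they do not survive in a static copy of a page. A hedged alternative sketch that draws the same two curves with base graphics into a PDF file; it assumes a data frame shaped like `pkg.dat` (columns `Date`, `Num`, `Num1`), and the demo data, the `plot_growth()` helper and the output file name are all made up.

```r
# Made-up data with the same columns as pkg.dat
demo <- data.frame(
  Date = seq(as.Date("2001-01-01"), by = "month", length.out = 6),
  Num  = c(2L, 0L, 3L, 1L, 4L, 2L)
)
demo$Num1 <- cumsum(demo$Num)

# Hypothetical helper: cumulative curve on top, monthly counts below
plot_growth <- function(dat, file = "r-package-growth.pdf") {
  pdf(file, width = 8, height = 6)
  on.exit(dev.off())          # close the file device even if plotting fails
  par(mfrow = c(2, 1))        # two stacked panels on this device
  plot(dat$Date, dat$Num1, type = "s", xlab = "Month", ylab = "Packages",
       main = "Cumulative number of packages")
  plot(dat$Date, dat$Num, type = "h", xlab = "Month", ylab = "Packages",
       main = "New packages per month")
  invisible(file)
}

plot_growth(demo)
```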

To leave a comment for the author, please follow the link and comment on their blog: Category: R | Huidong Tian's Blog.
