Finally! Tracking CRAN packages downloads

June 11, 2013

(This article was first published on Nicebread » R, and kindly contributed to R-bloggers)

[Update June 12: Data.tables functions have been improved (thanks to a comment by Matthew Dowle); for a similar approach see also Tal Galili’s post]

The guys from RStudio now provide CRAN download logs (see also this blog post). Great work!

I always asked myself, how many people actually download my packages. Now I finally can get an answer (… with some anxiety to get frustrated ;-)
Here are the complete, self-contained R scripts to analyze these log data:

Step 1: Download all log files in a subfolder (this steps takes a couple of minutes)

?View Code RSPLUS
## ======================================================================
## Step 1: Download all log files
## ======================================================================
# Here's an easy way to get all the URLs in R
start <- as.Date('2012-10-01')
today <- as.Date('2013-06-10')
all_days <- seq(start, today, by = 'day')
year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('', year, '/', all_days, '.csv.gz')
# only download the files you don't have:
missing_days <- setdiff(as.character(all_days), tools::file_path_sans_ext(dir("CRANlogs"), TRUE))
for (i in 1:length(missing_days)) {
  print(paste0(i, "/", length(missing_days)))
  download.file(urls[i], paste0('CRANlogs/', missing_days[i], '.csv.gz'))


Step 2: Combine all daily files into one big data table (this steps also takes a couple of minutes…)

?View Code RSPLUS
## ======================================================================
## Step 2: Load single data files into one big data.table
## ======================================================================
file_list <- list.files("CRANlogs", full.names=TRUE)
logs <- list()
for (file in file_list) {
	print(paste("Reading", file, "..."))
	logs[[file]] <- read.table(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "",
# rbind together all files
dat <- rbindlist(logs)
# add some keys and define variable types
dat[, date:=as.Date(date)]
dat[, package:=factor(package)]
dat[, country:=factor(country)]
dat[, weekday:=weekdays(date)]
dat[, week:=strftime(as.POSIXlt(date),format="%Y-%W")]
setkey(dat, package, date, week, country)
save(dat, file="CRANlogs/CRANlogs.RData")
# for later analyses: load the saved data.table
# load("CRANlogs/CRANlogs.RData")


Step 3: Analyze it!

?View Code RSPLUS
## ======================================================================
## Step 3: Analyze it!
## ======================================================================
# Overall downloads of packages
d1 <- dat[, length(week), by=package]
d1 <- d1[order(V1), ]
d1[package=="TripleR", ]
d1[package=="psych", ]
# plot 1: Compare downloads of selected packages on a weekly basis
agg1 <- dat[J(c("TripleR", "RSA")), length(unique(ip_id)), by=c("week", "package")]
ggplot(agg1, aes(x=week, y=V1, color=package, group=package)) + geom_line() + ylab("Downloads") + theme_bw() + theme(axis.text.x  = element_text(angle=90, size=8, vjust=0.5))
agg1 <- dat[J(c("psych", "TripleR", "RSA")), length(unique(ip_id)), by=c("week", "package")]
ggplot(agg1, aes(x=week, y=V1, color=package, group=package)) + geom_line() + ylab("Downloads") + theme_bw() + theme(axis.text.x  = element_text(angle=90, size=8, vjust=0.5))

Here are my two packages, TripleR and RSA. Actually, ~30 downloads per week (from this single mirror) is much more than I’ve expected!Bildschirmfoto 2013-06-11 um 14.11.30


To put things in perspective: package psych included in the plot:

Bildschirmfoto 2013-06-11 um 14.11.43

Some psychological sidenotes on social comparisons:

  • Downward comparisons enhance well-being, extreme upward comparisons are detrimental. Hence, do never include ggplot2 into your graphic!
  • Upward comparisons instigate your achievement motive, and give you drive to get better. Hence, select some packages, which are slightly above your own.
  • Of course, things are a bit more complicated than that …

All source code on this post is licensed under the FreeBSD license.

To leave a comment for the author, please follow the link and comment on their blog: Nicebread » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

plotly webpage

dominolab webpage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training




CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)