Site icon R-bloggers

CRAN download statistics of any packages #rstats

[This article was first published on Strenge Jacke! » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Hadley Wickham announced at Twitter that RStudio now provides CRAN package download logs. I was wondering about the download numbers of my package and wrote some code to extract that information from the logs…

The first code snippet is taken from the log website itself:

# Here's an easy way to get all the URLs in R
start <- as.Date('2013-11-28')
today <- as.Date('2015-03-04')

all_days <- seq(start, today, by = 'day')

year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('http://cran-logs.rstudio.com/', year, '/', all_days, '.csv.gz')

Then I downloaded all files into a folder:

for (i in 1:length(urls)) {
  download.file(urls[i], sprintf("~/Desktop/rstats/temp%i.csv.gz", i))
}

Unzipping did not work with unzip, so I just “opened” all files with the OS X unarchiver, which was quite convenient.

Than I read all csv-files and extracted the information for my package, sjPlot, from each csv-file and merged everything into one data frame:

sjPlot.df <- data.frame()
library(dplyr)
pb <- txtProgressBar(min=0, max=length(urls), style=3)

for (i in 1:length(urls)) {
  df.csv <- read.csv(sprintf("~/Desktop/rstats/temp%i.csv", i))
  pack <- tolower(as.character(df.csv$package))
  my.package <- which(pack == "sjplot")
  if (length(my.package) > 0 ) {
    dummy.df <- df.csv %>% dplyr::slice(my.package) %>% dplyr::select(date, package, version, country)
    sjPlot.df <- dplyr::bind_rows(sjPlot.df, dummy.df)
  }
  setTxtProgressBar(pb, i)
}
close(pb)
sjPlot.df$date.short <- strftime(sjPlot.df$date, format="%Y-%m")

Finally, the download-stats as plot:

library(sjPlot)
library(ggplot2)

mydf <- sjPlot.df %>% dplyr::count(date.short)

sjp.setTheme(theme = "539", axis.angle.x = 90)
ggplot(mydf, aes(x = date.short, y = n)) +
  geom_bar(stat = "identity", width = .5, alpha = .5, fill = "#3399cc") +
  scale_y_continuous(expand = c(0, 0), breaks = seq(250, 1500, 250)) +
  labs(x = sprintf("Monthly CRAN-downloads of sjPlot package since first release until 4th March (total download: %i)", sum(mydf$n)), y = NULL)

By the way, there’s already a shiny app for this…


Tagged: data visualization, ggplot, R, rstats, sjPlot

To leave a comment for the author, please follow the link and comment on their blog: Strenge Jacke! » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.