R’s Testing Predicament

October 10, 2016
By Joe Russell

(This article was first published on Mango Solutions » R Blog, and kindly contributed to R-bloggers)

If you were to ask any R user for the reason behind R’s success, you’re almost guaranteed to hear the words “open source”. As the second most popular open source language (behind Python), R has exploded in popularity in recent years. This growth, however, brings with it challenges that must be addressed.

Here at Mango, we have written our own unit tests for many packages and bundled them together to create what we call ValidR. Essentially, it creates a single, fully tested instance of R, enabling R to be used in companies with strict regulatory requirements. It is, therefore, especially popular within the pharmaceutical industry.
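For readers unfamiliar with the term, a unit test checks one small piece of behaviour against a known answer. The sketch below is purely illustrative and is not taken from ValidR: it assumes the testthat package is installed, and `standardError` is a hypothetical function invented for this example.

```r
library(testthat)

# Hypothetical function under test: standard error of the mean
standardError <- function(x) {
  sd(x) / sqrt(length(x))
}

test_that("standardError matches a hand-computed value", {
  # var(1:5) is 2.5, so the standard error is sqrt(2.5 / 5) = sqrt(0.5)
  expect_equal(standardError(1:5), sqrt(0.5))
  # A single observation has no sample standard deviation, so the result is NA
  expect_true(is.na(standardError(5)))
})
```

A package that ships tests like this one would typically list testthat under Suggests in its DESCRIPTION file, which is the kind of signal the script at the end of this post searches for.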

As part of the ValidR process, we like to look at the tests written by package authors themselves. Through some simple coding in R (which I have included below), we find that the number of packages currently available for download from CRAN sits at a bewildering 9,231. Delving a little deeper, we find that these have been created by an equally bewildering 7,882 unique authors. It would be impressive if CRAN were able to ensure that all of these authors write their packages accurately and maintain them consistently, let alone include sufficient testing.

Unfortunately, this is not the case. If we search, using some beloved dplyr, for packages on CRAN that incorporate some form of unit testing, we come to a shockingly low figure of just 17%. For a language that is the predominant choice of users with statistical backgrounds, perhaps this is not surprising, and perhaps many of us do not find it concerning. However, a lack of testing can lead people to believe that R cannot be trusted, harming its success and, inevitably, our own as R users. This is an issue that must be resolved, so what is being done?

Gabor Csardi, one of the Senior Consultants here at Mango, is currently writing a package named ‘goodPractice’ (a work in progress, but available on GitHub if you want to take a look). This package is designed to advise package authors on how best to write their code, whether that means flagging functions that shouldn’t be used or syntax that is best avoided.

Ideally, projects such as goodPractice and ValidR would not be necessary, and hopefully in the future they will become obsolete. For now, however, they serve a key role in ensuring R’s long-term success and pave the way for improving R’s testing standards.


# Script counts the number of packages on CRAN that use a formal unit testing framework
library(dplyr)

# What's on CRAN?
download.file("http://cran.R-project.org/web/packages/packages.rds",
              "packages.rds", mode = "wb")
cranPacks <- readRDS("packages.rds")

# Number of unique package authors (distinct entries in the Author field;
# indexed by name rather than by column position)
authors <- cranPacks[, "Author"]
numAuthors <- length(unique(authors))

# Reduce the size down a bit to keep just the interesting columns
cranPacks <- as.data.frame(cranPacks)[, 1:7]

# Just the packages that use a formal testing framework,
# plus the framework packages themselves
unitTestFramework <- cranPacks %>%
  filter(grepl("testthat", Depends) | grepl("testthat", Imports) | grepl("testthat", Suggests) |
         grepl("RUnit", Depends) | grepl("RUnit", Imports) | grepl("RUnit", Suggests) |
         grepl("svUnit", Depends) | grepl("svUnit", Imports) | grepl("svUnit", Suggests) |
         grepl("testit", Depends) | grepl("testit", Imports) | grepl("testit", Suggests) |
         Package %in% c("testthat", "RUnit", "svUnit", "testit"))

# Proportion of packages that use a testing framework
nrow(unitTestFramework) / nrow(cranPacks)

## [1] 0.1712033
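
The same idea can be pushed a little further to see which frameworks account for that figure. The sketch below is an illustration rather than part of the original script: it runs on a small made-up table (`toyPacks`, whose package names are hypothetical) instead of the CRAN download, uses base R only, and matches framework names on word boundaries, which is an assumption that differs slightly from the plain substring match used in the script.

```r
# Hypothetical toy data standing in for the CRAN package table
toyPacks <- data.frame(
  Package  = c("pkgA", "pkgB", "pkgC", "pkgD", "testthat"),
  Depends  = c("R (>= 3.0)", NA, "RUnit", NA, NA),
  Imports  = c("dplyr", "testit", NA, "stats", "methods"),
  Suggests = c("testthat (>= 1.0)", NA, NA, "knitr", NA),
  stringsAsFactors = FALSE
)

frameworks <- c("testthat", "RUnit", "svUnit", "testit")

# TRUE for each row that mentions `framework` in Depends, Imports or Suggests,
# or that is the framework package itself
usesFramework <- function(packs, framework) {
  pattern <- paste0("\\b", framework, "\\b")
  deps <- paste(packs$Depends, packs$Imports, packs$Suggests)
  grepl(pattern, deps, perl = TRUE) | packs$Package == framework
}

# Number of packages per framework
perFramework <- vapply(frameworks,
                       function(f) sum(usesFramework(toyPacks, f)),
                       integer(1))
perFramework
# testthat = 2, RUnit = 1, svUnit = 0, testit = 1

# Proportion of packages that use at least one framework
anyFramework <- Reduce(`|`, lapply(frameworks,
                                   function(f) usesFramework(toyPacks, f)))
mean(anyFramework)
# 0.8
```

Swapping `toyPacks` for the real `cranPacks` data frame should give the per-framework counts behind the overall proportion, although the exact numbers will depend on when the CRAN snapshot is taken.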
