Test coverage of the 10 most downloaded R packages

May 2, 2014

(This article was first published on R2D2, and kindly contributed to R-bloggers)

  • 2014-04-30


How do you know that your code is well tested?

The test coverage is the proportion of source code lines that are executed
(covered) when running the tests. It is useful for finding the parts of your code
that are not exercised, no matter how many tests you add. It can also help
spot dead code.
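As a minimal sketch of this definition, assuming the executed and the coverable line numbers are already known (collecting them is the hard part a coverage tool must do), the rate and the missed lines are simple set operations. coverage_report() is a hypothetical helper, not part of the patch described below.

```r
# Hypothetical helper: line coverage from two sets of line numbers.
# `executed`  - lines run at least once while running the tests
# `coverable` - lines that could in principle be executed
coverage_report <- function(executed, coverable) {
  coverable <- unique(coverable)
  # only count executed lines that are also coverable
  hit <- intersect(unique(executed), coverable)
  list(
    rate   = 100 * length(hit) / length(coverable),  # percentage of coverable lines hit
    missed = sort(setdiff(coverable, hit))            # never executed: new tests needed, or dead code
  )
}

# Example: 10 coverable lines, 6 of them executed by the tests
report <- coverage_report(executed = c(1:5, 8), coverable = 1:10)
report$rate    # 60
report$missed  # 6 7 9 10
```

The `missed` vector is the more actionable output: each entry is either a use case without a test or code that can never run.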

I implemented a proof of concept of code coverage, as a patch for R-3.0.2.
To test it, I evaluated the test coverage of the 10 most downloaded R packages
in 2014 on the RStudio CRAN mirror (I got the download data using the installr package).


These results are to be taken with caution:

  • my code coverage implementation is not perfect, as I will explain later.
  • I could not always run the package tests in a setting equivalent to
    R CMD check: I needed to load the package from source using devtools::load_all
    before running the tests. For packages using testthat for their tests
    this is trivial, but some use RUnit, some use simple scripts in tests/, etc.
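The three situations listed above can be told apart by looking at a source package's tests/ directory. The sketch below is a heuristic of my own (detect_test_framework() and run_source_tests() are hypothetical names, and this is not how R CMD check decides); only the easy testthat case gets a generic runner, using devtools::load_all() followed by testthat::test_dir().

```r
# Hypothetical helper: guess how a *source* package runs its tests by
# looking at its tests/ directory. Heuristic only.
detect_test_framework <- function(pkg_dir) {
  tests_dir <- file.path(pkg_dir, "tests")
  if (!dir.exists(tests_dir)) return("none")
  if (dir.exists(file.path(tests_dir, "testthat"))) return("testthat")
  files <- list.files(tests_dir, recursive = TRUE)
  # RUnit setups usually mention "runit" in a file name (e.g. doRUnit.R)
  if (any(grepl("runit", files, ignore.case = TRUE))) return("RUnit")
  if (any(grepl("\\.R$", files, ignore.case = TRUE))) return("scripts")
  "none"
}

# The easy case: load the package from source, then run its testthat tests.
run_source_tests <- function(pkg_dir) {
  framework <- detect_test_framework(pkg_dir)
  if (framework != "testthat") {
    stop("no generic recipe for '", framework, "' tests; adapt by hand")
  }
  devtools::load_all(pkg_dir)
  testthat::test_dir(file.path(pkg_dir, "tests", "testthat"))
}

# Example with a throw-away package skeleton
pkg <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg, "tests", "testthat"), recursive = TRUE, showWarnings = FALSE)
detect_test_framework(pkg)  # "testthat"
```

RUnit and plain-script packages are deliberately left to a manual setup, since their runner scripts usually assume an installed package.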



package        source files   covered lines   coverable lines   coverage (%)
colorspace                4               0               694              0
digest                    3              83                89             93
ggplot2                 163            1432              3769             38
plyr                     66             674               903             75
RColorBrewer              1               0                57              0
Rcpp                     18             394              1018             39
reshape2                 11             169               190             89
scales                   25             123               300             41
stringr                  18             169               202             84
zoo                      33             670              1614             42
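The coverage column is 100 × covered / coverable lines, rounded. As a quick consistency check, the table can be re-entered as a data frame (values copied from the table) and the column recomputed; the name cov_tbl is just an illustrative choice.

```r
# Values copied from the coverage table
cov_tbl <- data.frame(
  pkg     = c("colorspace", "digest", "ggplot2", "plyr", "RColorBrewer",
              "Rcpp", "reshape2", "scales", "stringr", "zoo"),
  nb_srcs = c(4, 3, 163, 66, 1, 18, 11, 25, 18, 33),     # number of source files
  covered = c(0, 83, 1432, 674, 0, 394, 169, 123, 169, 670),
  lines   = c(694, 89, 3769, 903, 57, 1018, 190, 300, 202, 1614)
)

# coverage (%) = covered lines / coverable lines, rounded to an integer
cov_tbl$coverage <- round(100 * cov_tbl$covered / cov_tbl$lines)

# packages from best to worst covered
cov_tbl[order(-cov_tbl$coverage), c("pkg", "coverage")]
```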

We can note that the packages digest, plyr, reshape2 and stringr are pretty well
covered by their tests (75% or more).
colorspace and RColorBrewer do not contain any tests.
Rcpp has a lot of tests, but there are two main problems:

  • most of the code is in C++, so it is (currently) not coverable.
  • I quickly hacked the RUnit test script to be able to run it from the source
    package, but my hack is far from correct and I noticed a lot of test failures,
    so the actual coverage of its R code is probably much higher.

getting deeper

Looking at digest, which has a high coverage of its R code (note that most of its code is
written in C and as such is neither covered nor coverable), one may wonder how to improve
its test coverage further.
The test coverage output (not shown) reveals that lines 13, 14, 29, 36, 56 and 59
of the source file digest.R are missed.
Here is an excerpt of lines 5 to 15:

 digest <- function(object, algo=c("md5", "sha1", "crc32", "sha256", "sha512"),
                    serialize=TRUE, file=FALSE, length=Inf,
                    skip="auto", ascii=FALSE, raw=FALSE) {
   algo <- match.arg(algo)
   if (is.infinite(length)) {
     length <- -1               # internally we use -1 for infinite len
   }
   if (is.character(file) && missing(object)) {
     object <- file  # line 13
     file <- TRUE    # line 14
   }

The missed (non-covered) lines correspond to the case where no object is given,
only a file name through the file argument. To improve the tests, we just have to
add a test for this use case.
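Such a test could look like the sketch below. It relies only on what the excerpt shows: when `object` is missing, `digest(file = f)` is rewritten into `digest(f, file = TRUE)`, so both calls should return the same hash. check_digest_file_only() is a hypothetical name; in the package itself this check would live in its tests/ directory, and it needs the digest package to be installed.

```r
# Test for the branch at lines 13-14 of digest.R: a file name is given
# through `file` while `object` is missing. Requires the digest package.
check_digest_file_only <- function() {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines("some content to hash", tmp)
  # First call exercises the previously missed branch; the second call
  # passes the file name as `object` and sets file = TRUE explicitly.
  identical(digest::digest(file = tmp), digest::digest(tmp, file = TRUE))
}

if (requireNamespace("digest", quietly = TRUE)) {
  stopifnot(check_digest_file_only())
}
```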

the current test coverage implementation

What I call the current test coverage implementation is actually just a proof of concept:
a patch for R-3.0.1 and R-3.0.2 that I submitted to the R-devel list. I must confess
that I got absolutely no feedback.

I have neither experience with nor understanding of the internals of R. I came up with
this hack while trying to understand how the R profiler is implemented.
It works by hooking into the R evaluation engine.
It is not perfect: for instance, functions whose body is not enclosed in braces are
not covered, i.e. this function:

not_covered <- function(x) x * x

will not be covered, but this one will:

covered <- function(x) { x * x }
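To find the functions that fall into the uncovered category, it is enough to check whether a function's body is a `{ ... }` call. This is a hypothetical base-R helper (has_braced_body() and unbraced_functions() are made-up names), assuming the limitation is exactly "the body is not a braced block".

```r
# Hypothetical helper: TRUE if a function's body is a `{ ... }` block.
has_braced_body <- function(f) {
  b <- body(f)
  is.call(b) && identical(b[[1]], as.name("{"))
}

not_covered <- function(x) x * x
covered <- function(x) { x * x }
has_braced_body(not_covered)  # FALSE
has_braced_body(covered)      # TRUE

# Names of the closures in an environment (e.g. a package namespace)
# whose body is not braced. Primitives have no R body and are skipped.
unbraced_functions <- function(env) {
  funs <- mget(ls(env, all.names = TRUE), envir = env)
  is_closure <- vapply(funs, function(f) is.function(f) && !is.primitive(f),
                       logical(1))
  names(Filter(Negate(has_braced_body), funs[is_closure]))
}
```

For instance, `unbraced_functions(asNamespace("digest"))` would list the functions of an installed digest that such a tool cannot see.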

The overhead should be negligible when coverage is not activated, and is very low
when it is (coverage is activated by calling Rcov_start()).
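Rcov_start() only exists in an R built with the patch, which hooks the C-level evaluator. To illustrate the idea of recording executed code without a patched R, here is a pure-R sketch based on a different technique, source rewriting: each statement inside a `{ ... }` block is preceded by a call that increments a counter. instrument(), instrument_expr() and tick() are hypothetical names, and the sketch counts statements rather than source lines.

```r
# Pure-R sketch (NOT the author's C patch): statement coverage by rewriting
# function bodies. Each statement inside a `{ ... }` block is preceded by a
# call that increments its counter in the environment `counts`.
tick <- function(counts, id) counts$hits[id] <- counts$hits[id] + 1L

instrument_expr <- function(e, counts) {
  if (!is.call(e)) return(e)                # symbols and constants: unchanged
  if (identical(e[[1]], as.name("{"))) {
    out <- list(as.name("{"))
    for (s in as.list(e)[-1]) {
      id <- length(counts$hits) + 1L         # register the statement
      counts$hits[id] <- 0L
      counts$code[id] <- paste(deparse(s), collapse = " ")
      # the counter call (closure inlined) goes before the statement
      out <- c(out, list(as.call(list(tick, counts, id)),
                         instrument_expr(s, counts)))
    }
    return(as.call(out))
  }
  # other calls (if, for, while, function, ...): recurse into sub-calls
  for (i in seq_along(e)[-1]) {
    if (is.call(e[[i]])) e[[i]] <- instrument_expr(e[[i]], counts)
  }
  e
}

instrument <- function(f, counts) {
  counts$hits <- integer(0)
  counts$code <- character(0)
  body(f) <- instrument_expr(body(f), counts)
  f
}

# Example
sign_label <- function(x) {
  if (x > 0) {
    label <- "positive"
  } else {
    label <- "non-positive"
  }
  label
}
counts <- new.env()
traced <- instrument(sign_label, counts)
traced(5)                                        # "positive"
data.frame(code = counts$code, hits = counts$hits)
100 * mean(counts$hits > 0)                      # statement coverage: 75
```

Because it only rewrites `{ ... }` blocks, this sketch shares the unbraced-body blind spot; unlike a hook in the evaluator, it also has to be applied to each function explicitly.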

The patch provides the listing (and frequency) of the lines of code executed.
But to compute the coverage rate we also need the coverable lines; this is currently
done using parse() and getParseData(), replicating the behaviour of the code
coverage patch.
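The exact rule the patch uses to decide which lines are coverable is not spelled out here. As an assumption, this sketch treats a line as coverable when a statement directly inside a `{ ... }` block starts on it, which mirrors the braces limitation described earlier. coverable_lines() is a hypothetical name.

```r
# Sketch: coverable lines of some R code, using parse() and getParseData().
# ASSUMPTION: a line is coverable if a statement directly inside a
# `{ ... }` block starts on it (the patch's exact rule may differ).
coverable_lines <- function(code) {
  exprs <- parse(text = code, keep.source = TRUE)
  pd <- utils::getParseData(exprs)
  # ids of the `{ ... }` expressions: parents of the '{' tokens
  block_ids <- pd$parent[pd$token == "'{'"]
  # statements are the expression children of those blocks
  stmt_tokens <- c("expr", "equal_assign", "expr_or_assign_or_help")
  is_stmt <- pd$parent %in% block_ids & pd$token %in% stmt_tokens
  sort(unique(pd$line1[is_stmt]))
}

code <- c(
  "not_covered <- function(x) x * x",
  "covered <- function(x) {",
  "  y <- x * x",
  "  y",
  "}"
)
coverable_lines(code)  # 3 4
```

Line 1 yields nothing because the unbraced body has no block statements, consistent with that function never being reported as covered.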

the future

We are successfully using this patch to improve the testing of our
internal packages, and thereby of our services.

But once again, this is only a proof of concept. I would like to contribute this feature to the
community if there is some interest. Having this feature as a patch is far from
optimal: it may break with any new version of R, and it has not really been tested or reviewed.
I would really appreciate advice and proper guidance to move forward.

I believe that test coverage is a useful tool to improve the quality of software.

