R code coverage support via docker

December 17, 2014

(This article was first published on R2D2, and kindly contributed to R-bloggers)

In my previous post, I used a patched version of R with built-in code coverage
support to compute the code coverage of some packages.

Today I will show you how to install and use this R with code coverage support.

I just created a public Docker image that provides this patched R.
If you do not know Docker, it is an incredibly powerful way of deploying
software and implementing reproducibility in software development.

Without Docker, in order to add code coverage support to R,
you would have to:

  • get the appropriate source code tarball from CRAN
  • install, for your platform, all the tools and libraries required to compile it
  • untar the R source code
  • get the appropriate patch
  • apply the patch to the source code
  • configure the R installation
  • compile R
  • install it

With Docker, you only need to install Docker (if needed) and type:

docker run -ti quartzbio/r-coverage

That's all folks.

The first time, this will download the image from the Docker Hub, so it may
take quite a while. Then it will start a patched R interpreter with built-in
code coverage support.

So for example to investigate the code coverage of package stringr:

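library(devtools)
# download the stringr tarball into the current directory;
# the second element of the result is the path of the downloaded file
pkg <- download.packages('stringr', '.')
untar(pkg[2], compressed=TRUE)
# start recording coverage, run the package tests, then collect the results
Rcov_start()
test('stringr')
res <- as.list(Rcov_stop())
print(res)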

Excerpt of the output:

$`/home/docker/stringr/R/vectorise.r`
      [,1] [,2]
 [1,]    5    1
 [2,]    6   65
 [3,]   10   65
 [4,]   11    6
 [5,]   12    6
 [6,]   15   65
 [7,]   18    1
 [8,]   19    5
 [9,]   26    5
[10,]   31    1
[11,]   32   94
[12,]   34   94
[13,]   35   94
[14,]   37   94
...

In each row, the first column is a line number and the second column is an
execution count. This means, for example, that line #6 of
stringr/R/vectorise.r has been executed (or exercised) 65 times while the
tests were running.
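To make the structure concrete, here is a minimal sketch that rebuilds the excerpt above by hand, so it runs in any R without the patched interpreter, and then queries it. The column names `line` and `count` are labels added here for readability (the printed matrix has no column names), and the matrix only contains the rows shown in the excerpt, not the full result.

```r
# Rows copied from the excerpt above: column 1 = line number,
# column 2 = number of times that line was executed.
cov <- cbind(
  c(5, 6, 10, 11, 12, 15, 18, 19, 26, 31, 32, 34, 35, 37),
  c(1, 65, 65, 6, 6, 65, 1, 5, 5, 1, 94, 94, 94, 94)
)
# Labels added for this example only; the patched R does not set them.
colnames(cov) <- c("line", "count")

# Mimic the shape of `res`: a list of matrices named by source file path.
res <- list("/home/docker/stringr/R/vectorise.r" = cov)

# How many times was line 6 executed?
hits_line_6 <- cov[cov[, "line"] == 6, "count"]

# Which lines were executed most often?
hottest <- cov[cov[, "count"] == max(cov[, "count"]), "line"]

# Number of covered lines recorded for each file.
covered_per_file <- sapply(res, nrow)

print(hits_line_6)
print(hottest)
print(covered_per_file)
```

The same indexing should work on the real `res` by position, for example `res[[1]][, 2]` for the execution counts of the first file.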

For more examples please see the r-coverage-docker
project page.

Motivation: why code coverage matters

Code coverage is a development tool that helps you figure out how well
your tests exercise the source code they target.

It does not measure whether your tests are good or make sense.
But measuring coverage is very important for code quality because,
by virtue of Murphy's law, every untested line of code is a potential point of failure.
Moreover, code coverage can help you detect and get rid of dead code.

At Quartz Bio we strongly believe in quality and reproducibility.
That is why we obtained an ISO 9001 certification, which I believe is quite
unusual for a bioinformatics startup!

Implementation

The patch itself is the first draft of a hack of the R interpreter that I
wrote as a proof of concept. It was the first time I dared to play with the R
internals, which are mostly black magic to me, so it is probably very naive and
imperfect, but amazingly it has so far worked perfectly for our needs.

r-coverage docker

Because I got no feedback, we just kept using it internally to improve the quality of
our code. But as we use Docker a lot internally for reproducibility reasons,
I thought that it would be an easy way to provide it to the community.
Ultimately, though, I would like this feature to be improved and incorporated into R.

The missing parts

As you probably realized, the patched interpreter only provides part of the code
coverage measure: the covered lines. But what is really interesting is:

  • the percentage of code coverage (by source file, by package)
  • the missed lines!

We have of course implemented these features, but they are currently well buried
in our internal packages. If there is some interest I could figure out a way
to publish them.
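To illustrate the idea only, here is a minimal sketch of how a percentage and a list of missed lines could be derived from one of the coverage matrices. It is not the internal implementation mentioned above: `coverage_summary` is a hypothetical helper, and it assumes that you already know which line numbers of the file are executable (`executable_lines`), which is the hard part and is not provided by the patched interpreter output shown in this post. The numbers in the usage example are made up for illustration.

```r
# Hypothetical helper, for illustration only.
# cov:              matrix with line numbers in column 1 (as returned per file)
# executable_lines: ASSUMED to be known; line numbers that can be executed
coverage_summary <- function(cov, executable_lines) {
  if (length(executable_lines) == 0) {
    return(list(percent = NA_real_, missed = executable_lines))
  }
  covered <- intersect(executable_lines, cov[, 1])
  missed  <- setdiff(executable_lines, cov[, 1])
  list(
    percent = 100 * length(covered) / length(executable_lines),
    missed  = missed
  )
}

# Made-up data: three covered lines out of four executable lines.
cov <- cbind(c(5, 6, 10), c(1, 65, 65))
executable <- c(5, 6, 7, 10)

s <- coverage_summary(cov, executable)
print(s$percent)  # 75
print(s$missed)   # 7
```

A package-level figure could be obtained the same way by summing the covered and executable line counts over all files before dividing.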

Another missing part that I am considering implementing is a simple GUI
(probably a Shiny app, based on Hadley's lineprof)
for browsing source files and seeing missed lines.
