R code coverage support via Docker
In my previous post, I used a patched R with built-in code coverage support to compute the code coverage of some packages.
Today I will show you how to install and use this patched R.
I have just created a public Docker image that provides it. If you do not know Docker, it is an incredibly powerful way of deploying software and making software development reproducible.
Without docker, in order to add code coverage support to R, you would have to:
- get the appropriate source code tarball from CRAN
- install all the tools and libraries required to compile it on your platform
- untar the R source code
- get the appropriate patch
- apply the patch on the source code
- configure the R installation
- compile R
- install it
With Docker, install Docker if needed and just type:
docker run -ti quartzbio/r-coverage
That's all folks.
The first time, this will download the image from the Docker Hub, so it may take quite a while. Then it will start a patched R interpreter with built-in code coverage support.
For example, to investigate the code coverage of the stringr package:
    library(devtools)
    pkg <- download.packages('stringr', '.')
    # download.packages() returns a matrix; column 2 holds the tarball path
    untar(pkg[1, 2], compressed = TRUE)
    Rcov_start()
    test('stringr')
    res <- as.list(Rcov_stop())
    print(res)
Excerpt of the output:
    $`/home/docker/stringr/R/vectorise.r`
          [,1] [,2]
     [1,]    5    1
     [2,]    6   65
     [3,]   10   65
     [4,]   11    6
     [5,]   12    6
     [6,]   15   65
     [7,]   18    1
     [8,]   19    5
     [9,]   26    5
    [10,]   31    1
    [11,]   32   94
    [12,]   34   94
    [13,]   35   94
    [14,]   37   94
    ...
The first column is the line number and the second is the execution count. This means, for example, that line #6 of stringr/R/vectorise.r was executed (or exercised) 65 times while the tests ran.
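To make that reading concrete, here is a minimal sketch of looking up the hit count for a given line. It assumes, as the excerpt suggests, that each element of res is a two-column matrix with the line number in column 1 and the execution count in column 2. The helper hits_for_line is hypothetical (it is not part of the patch), and the matrix simply copies the first rows of the excerpt.

```r
# Stand-in for one element of `res`, copied from the first rows of the excerpt.
# Column 1: line number, column 2: number of times the line was executed.
cov_mat <- matrix(c( 5,  1,
                     6, 65,
                    10, 65,
                    11,  6),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(NULL, c("line", "hits")))

# Hypothetical helper: hit count for one line, 0 if the line was never recorded.
hits_for_line <- function(cov_mat, line) {
  idx <- match(line, cov_mat[, "line"])
  if (is.na(idx)) 0 else cov_mat[idx, "hits"]
}

hits_for_line(cov_mat, 6)  # 65, as in the excerpt
hits_for_line(cov_mat, 7)  # 0: line 7 does not appear in the matrix
```

A line that is absent from the matrix may be either unexecuted code or a non-code line (blank, comment), so a count of 0 alone does not tell you that a test is missing.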
For more examples please see the r-coverage-docker project page.
Motivation: why does code coverage matter?
Code coverage is a development tool that helps you figure out how well your tests cover the source code they target.
It does not measure whether your tests are good or even make sense. But measuring coverage is very important for code quality because, by virtue of Murphy's law, every untested line of code is a potential point of failure. Moreover, code coverage can help you detect and get rid of dead code.
At Quartz Bio we strongly believe in quality and reproducibility. That is why we obtained ISO 9001 certification, something I believe is quite unusual for a bioinformatics startup!
The patch itself is the first draft of a hack of the R interpreter that I wrote as a proof of concept. It was the first time I dared to play with the R internals, which are mostly black magic to me, so it is probably very naive and imperfect, but amazingly it has so far worked perfectly for our needs.
Because I got no feedback, we just kept using it internally to improve the quality of our code. But since we use Docker a lot internally for reproducibility reasons, I thought it would be an easy way to provide the patch to the community. Ultimately, I would like this feature to be improved and incorporated into R.
The missing parts
As you probably realized, the patched interpreter only provides part of the code coverage measure: the covered lines. But what is really interesting is:
- the percentage of code coverage (by source file, package)
- the missed lines!

We have of course implemented these features, but they are currently well buried in our internal packages. If there is some interest, I could figure out a way to publish them.
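As a rough illustration of how these two measures could be derived from the covered lines, here is a minimal sketch. It is not our internal implementation: the functions executable_lines and coverage_summary are hypothetical, the hit counts are made up, and the set of "executable" lines is approximated (an assumption on my part) by every line on which a non-comment token starts, using getParseData() from the utils package.

```r
# Crude approximation (an assumption, not part of the patch): a line counts as
# "executable" if at least one non-comment token starts on it.
executable_lines <- function(path) {
  pd <- utils::getParseData(parse(path, keep.source = TRUE))
  sort(unique(pd$line1[pd$terminal & pd$token != "COMMENT"]))
}

# Hypothetical summary: percentage of executable lines covered, plus missed lines.
# `cov_mat` has line numbers in column 1 and hit counts in column 2.
coverage_summary <- function(path, cov_mat) {
  exec    <- executable_lines(path)
  covered <- intersect(exec, cov_mat[, 1])
  list(percent = 100 * length(covered) / length(exec),
       missed  = setdiff(exec, cov_mat[, 1]))
}

# Toy source file and made-up coverage data, for illustration only.
src <- tempfile(fileext = ".R")
writeLines(c("f <- function(x) {",
             "  # a comment",
             "  if (x > 0) {",
             "    x + 1",
             "  } else {",
             "    x - 1",
             "  }",
             "}"), src)
cov_mat <- matrix(c(1, 1,
                    3, 2,
                    4, 2), ncol = 2, byrow = TRUE)

print(coverage_summary(src, cov_mat))
```

Note that this approximation also counts lines holding only a closing brace as executable, so it will tend to underestimate coverage; a real implementation would need a definition of executable lines that matches what the patched interpreter actually records.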
Another missing part that I am considering implementing is a simple GUI (probably a Shiny app, based on Hadley's lineprof) for browsing source files and seeing the missed lines.
- My previous post: Test coverage of the 10 most downloaded R packages
- The docker hub r-coverage page: https://registry.hub.docker.com/u/quartzbio/r-coverage/
- The github r-coverage-docker project: https://github.com/quartzbio/r-coverage-docker
- The github r-coverage-patch project: https://github.com/quartzbio/r-coverage-patch