Testing, testing, testing!

[This article was first published on Learning Slowly » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R testthat unit tests with GitHub, Travis-CI continuous integration and the covr package for Coveralls code coverage.

I’ve been working pretty hard on getting the ggRandomForests package wrapped up so I can work on some other projects that have as much or more potential impact. This is my second CRAN package, and I’ve learned a lot about R programming, with loads of help from the hadleyverse.

I’ve come to statistics from an engineering background, and R is not my first language. I’m familiar with unit testing, what it is and how it’s supposed to work. The advantages of writing tests first seem to be enormous, but I have not really been able to get to the “nuts and bolts” of it. How do I apply this to my work specifically?

So, I started by starting! I went through my ggRandomForests package and wrote a series of testthat tests to just make sure objects belonged to the correct class. At JSM!2014, I cornered Hadley Wickham got him to help me get the tests to run on R CMD CHECK and with devtools test() function the way I expected. So I was off and running.

Except that I really didn’t get the part about “write a test, then code to the test” part. So my test framework languished.

On Monday, I learned about the covr package. I’ve tried code coverage in the past, also without much luck. Getting things instrumented and then figuring it all out was a huge amount of work in C/C++. I’ve also briefly tried some of the R code coverage tools. The covr package has the advantage of putting a bunch of tools into the mix to make the whole toolchain work. And I am benefiting from the process.

I’ll try to briefly describe what I’ve done to implement code coverage to improve my testthat tests and hopefully make my package more stable now and as I add more features in the future.

Code coverage using the covr will require three web based tools:

  • GitHub will host your R code. Hadley covers version control, and using GitHub in his R Packages book. Look at the Git and GitHub chapter.
  • I was already using Travis-CI for continuous integration. You set up this site to watch your GitHub repository, and test your package at every git commit. I’ve caught some silly dependency bugs with this, because I’m using development versions of R packages that are not widely available yet. I haven’t figured out PackRat yet, and I’m not convinced that is the correct solution to this particular problem.

  • Coveralls is the last piece. You set up Coveralls to watch the Travis-CI build, and it will generate a report showing what lines of code you’ve actually hit… with your testthat tests.


If you’re not using GitHub for your package development, I suggest you start. You’ll need this account to start. Create a repository for each package. Then Commit early and often. Sit back and watch the fireworks.

My workflow is to develop and write during the day, and commit changes only when I have a clean R CMD CHECK.

Travis CI

I don’t remember how I found Travis-CI, though I’m going to guess it was through either watching Hadley’s Github traffic or reading a retweet of his. Either way, setup was a breeze when I found this GitHub wiki (https://github.com/craigcitro/r-travis/wiki)

You create an account with your GitHub account. You’ll see a list of all your GitHub repos, and you select which ones you want to test continuously.

For R packages, you add .travis.yml file to your package root that tells Travis-CI how to test your package. This is (mostly) mine from the ggRandomForests package. I basically copy and pasted the default from the wiki page.

# Sample .travis.yml for R projects.
# See README.md for instructions, or for more configuration options,
# see the wiki:
#   https://github.com/craigcitro/r-travis/wiki

language: c

# For code coverage
  - curl -OL http://raw.github.com/craigcitro/r-travis/master/scripts/travis-tool.sh
  - chmod 755 ./travis-tool.sh
  - ./travis-tool.sh bootstrap
  - ./travis-tool.sh install_deps

script: ./travis-tool.sh run_tests

  - ./travis-tool.sh dump_logs

    on_success: change
    on_failure: change

If everything is in order, you’ll now get an email on git push to GitHub whenever the Travis-CI status changes.

You can also watch as Travis-CI does it’s testing by going to the website. You’ll see why your code doesn’t build as well as other diagnostics. And you can add a nice badge to your README.md file which will be displayed on your repo GitHub landing page. The badge is updated in real time… here’s my ggRandomForests badge:
Build Status

Clicking the badge will take you to the Travis-CI build page for the repo. But come back, there’s more!


I was happy with everything, until Jim Hester posted a devtools issue about a pull request for a use_covr() function (use_coveralls() maybe?).

So I clicked through to the covr GitHub page. The README.md file is really short, with simple instructions to get this up and running. So easy, I had to do it right then.

Basically, repeat the Travis-CI setup I did at the Coveralls website (https://coveralls.io/repos/new) and then add 2 lines into my .travis.yml file.

  - ./travis-tool.sh github_package jimhester/covr

  - Rscript -e 'covr::coveralls()'

This instructs Travis-CI to install the latest covr package from GitHub before running the tests, then run the covr::coveralls() function to get the data over to (https://coveralls.io/).

And then you can get a new badge, from coveralls!
Coverage Status

It’s like collecting stickers!

Now what?

OK, so at writing, the ggRandomForests code coverage was at 75%. Two days ago, it was at 43%, so I’m pretty pleased with this. How did I get this improvement? By writing better testthat scripts.

I thought about screen shots, but this is long enough already. So, click on the Coverage badge, and it will take you to my Coveralls stats page for the R package (It will direct you away from this page by default). What you see is a history of all the builds I’ve done since I started the Coveralls process. The coverage badge will have an icon indicating how the coverage changed, and what the current percentage of code coverage is.

It took me a little bit to figure out that if you click on the commit message, you’ll see the report that can really help. This page lists the code coverage for each file in your R code directory. If you click on a file, you can see which lines your testthat tests actually hit.

The hard part is to figure out what test you’ll need to add to get your Coveralls number to improve. My particular case required sending in some bad objects, removing some code that I no longer needed and just thinking about what the code was supposed to be doing. I spent about a day really working on tightening up my tests, and I gained a significant increase in test coverage.

If you look at my pages, you’ll see that I have one file that has 0 coverage. That file is probably taking my coverage down from somewhere close to 90%. I am consciously choosing to not test that particular function for two reasons.

1.The function is time intensive. I think it takes about 20-40 minutes to run on a reasonable machine.
2. The function doesn’t test my package, but could be used for a test of the randomForestSRC package I depend on.

I wrote the function to make my life easier. I distribute it in case users want to create their own cached versions of randomForestSRC objects. If I remove the function, my stat goes up, but for the time being I’m OK with the lower number. At least I know why the number is what it is.

I’ll add one more link to Hadley’s books: To really improve testthat tests, I’m going to be returning to (http://r-pkgs.had.co.nz/tests.html). Just because I’ve gotten this far doesn’t mean I’m really at “best practices.”

Filed under: R Tagged: Coveralls, covr, devtools, ggRandomForests, git, github, R, testhat, Travis-CI

To leave a comment for the author, please follow the link and comment on their blog: Learning Slowly » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)