Developing R Packages with usethis and GitLab CI: Part I

[This article was first published on Rstats on pi: predict/infer, and kindly contributed to R-bloggers].

The best way to share your R code with others is to create a package. Whether you want to share your functions with team members, clients, or all interested R users, bundling them into a package is the way to go. Luckily, there are great tools available that make this process relatively smooth and easy. This series of posts walks through the process of setting up an R package and sharing it on GitLab, a hosting service for version-controlled code repositories. This first post focuses solely on building an R package with usethis. The following posts will go into the details of sharing the package on GitLab and taking advantage of its built-in continuous integration services to automate testing of the package.

Setting up with usethis

Suppose we have written a function that calculates a p-value from either a Chi-squared or a Fisher exact test, depending on whether the Chi-squared test throws a warning due to small expected counts. We think this is a pretty useful function, so we would like to make it available for others to use. Let’s make a package for it. We’ll name the package chifishr.

To help with the setup, we will use the usethis package. usethis was spun out of the devtools package and was created specifically to automate the tasks required to set up the common components of R packages. It takes care of getting the infrastructure of the package in place, so you can focus your efforts on creating your functions, examples and tests.

Open RStudio and run usethis::create_package, usethis::use_package_doc, and usethis::use_roxygen_md to get the bare-bones structure and documentation of the package in place.

install.packages("usethis")

usethis::create_package("~/gitlab/chifishr")
#> Changing active project to chifishr
#> ✔ Creating 'R/'
#> ✔ Creating 'man/'
#> ✔ Writing 'DESCRIPTION'
#> ✔ Writing 'NAMESPACE'
#> ✔ Writing 'chifishr.Rproj'
#> ✔ Adding '.Rproj.user' to './.gitignore'
#> ✔ Adding '^chifishr\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
#> ✔ Opening project in RStudio

usethis::use_package_doc()
#> ✔ Writing 'R/chifishr-package.R'

usethis::use_roxygen_md()
#> ✔ Setting Roxygen field in DESCRIPTION to 'list(markdown = TRUE)'
#> ✔ Setting RoxygenNote field in DESCRIPTION to '6.0.1'
#> ● Re-document

Edit the DESCRIPTION file to add details about the package, including the title, description, author and R version dependency. To open the file for editing, run usethis::edit_file("DESCRIPTION").

Package: chifishr
Version: 0.0.0.9000
Title: Helpers for Calculating Chi-squared and Fisher Exact Test p-values
Description: This package contains helper functions for calculating p-values from Chi-squared or Fisher exact test, depending on if a warning is thrown from the Chi-Squared test due to small expected counts leading to poor p-value approximations.
Authors@R: person("Caleb", "Scheidel", , "[email protected]", c("aut", "cre"))
License: What license it uses
Encoding: UTF-8
LazyData: true
ByteCompile: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.0.1
Depends:
  R (>= 2.10)

It is good practice to add a software license to the package if you plan to share it. The MIT open source license is simple and permissive, and is very commonly used for R packages. Let’s use that here.

usethis::use_mit_license("Caleb Scheidel")
#> ✔ Setting License field in DESCRIPTION to 'MIT + file LICENSE'
#> ✔ Writing 'LICENSE.md'
#> ✔ Adding '^LICENSE\\.md$' to '.Rbuildignore'
#> ✔ Writing 'LICENSE'

The package we are creating will depend on the dplyr and purrr packages. We’ll need to add these dependencies to the DESCRIPTION file. This can easily be done with the usethis::use_package function.

usethis::use_package("dplyr")
#> ✔ Adding 'dplyr' to Imports field in DESCRIPTION
#> ● Refer to functions with `dplyr::fun()`

usethis::use_package("purrr")
#> ✔ Adding 'purrr' to Imports field in DESCRIPTION
#> ● Refer to functions with `purrr::fun()`

We will also use the pipe operator (%>%) from the magrittr package. We can add this dependency with usethis::use_pipe.

usethis::use_pipe()
#> ✔ Adding 'magrittr' to Imports field in DESCRIPTION
#> ✔ Writing 'R/utils-pipe.R'
#> ● Run `document()`
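
After these calls it can be worth confirming which packages DESCRIPTION actually declares. The sketch below is only an illustration: it writes a small stand-in DESCRIPTION to a temporary file (the field layout is an assumption about what usethis produces, not copied from a real run) and reads the Imports field back with base R's read.dcf(). In a real package you would point read.dcf() at the DESCRIPTION file in the package root instead.

```r
# Stand-in DESCRIPTION written to a temp file; the exact layout is assumed.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: chifishr",
  "Version: 0.0.0.9000",
  "Imports:",
  "    dplyr,",
  "    purrr,",
  "    magrittr"
), desc_file)

# DESCRIPTION uses the Debian control file (DCF) format, which base R can parse.
desc <- read.dcf(desc_file, fields = c("Package", "Imports"))

# Split the comma-separated Imports field into one package name per element.
imports <- trimws(strsplit(unname(desc[1, "Imports"]), ",")[[1]])
imports
```

Every package whose functions are called with pkg::fun() inside R/ should appear in this vector; otherwise the dependency check in R CMD check is likely to complain.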

Adding a function

Now we can add our function to the chi_fisher_p.R script in the R/ directory. First create the script with usethis::use_r.

usethis::use_r("chi_fisher_p")

Then add the function to that file, along with the necessary roxygen documentation. roxygen generates .Rd documentation files, which give users of the package the ability to view the arguments and returned value of the functions, among other details.

# chi_fisher_p.R

#' Function which calculates p-value via Chi-square or Fisher exact test.
#'
#' @param tbl (`tbl`) Dataframe that has variable and treatment columns of interest
#' @param var (`character`) Name of variable column
#' @param treatment (`character`) Name of treatment column
#'
#' @return (`numeric`) p-value
#'
#' @examples
#'
#' chi_fisher_p(treatment, "outcome", "treatment")
#' chi_fisher_p(treatment, "gender", "treatment")
#'
#' @export
chi_fisher_p <- function(tbl, var, treatment) {
  
  chisq_wrapper <- function(tbl, var, treatment) {
    
    var       <- tbl %>% dplyr::pull(var) %>% as.factor()
    treatment <- tbl %>% dplyr::pull(treatment) %>% as.factor()
    
    p <- stats::chisq.test(var, treatment)$p.value
    return(p)
  }

  fisher_wrapper <- function(tbl, var, treatment) {
    
    var       <- tbl %>% dplyr::pull(var) %>% as.factor()
    treatment <- tbl %>% dplyr::pull(treatment) %>% as.factor()
    
    p <- stats::fisher.test(var, treatment)$p.value
    return(p)
  }

  chisq_wrapper <- purrr::quietly(chisq_wrapper)
  chisq <- chisq_wrapper(tbl, var, treatment)

  if (length(chisq$warnings) == 0) {
    return(chisq$result)
  } else {
    return(fisher_wrapper(tbl, var, treatment))
  }

}
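
The function relies on purrr::quietly to turn a warning into data that can be inspected with length(chisq$warnings). To make that mechanism more concrete, here is a rough base-R sketch of the warning-capturing idea. The names capture_warnings and noisy_sqrt are made up for this illustration and are not part of chifishr or purrr, and the sketch does not claim to mirror how purrr implements quietly internally.

```r
# Hypothetical helper: a base-R stand-in for the warning-capturing part of
# purrr::quietly(). It is a sketch, not purrr's actual implementation.
capture_warnings <- function(f) {
  function(...) {
    msgs <- character()
    result <- withCallingHandlers(
      f(...),
      warning = function(w) {
        # Record the message, then stop the warning from reaching the console.
        msgs <<- c(msgs, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    )
    list(result = result, warnings = msgs)
  }
}

# Made-up function that warns for small inputs, standing in for chisq.test().
noisy_sqrt <- function(x) {
  if (x < 5) warning("input is small")
  sqrt(x)
}

quiet_sqrt <- capture_warnings(noisy_sqrt)

clean <- quiet_sqrt(16)
noisy <- quiet_sqrt(4)

length(clean$warnings) == 0  # TRUE: safe to use clean$result
length(noisy$warnings) == 0  # FALSE: this is where a fallback would run
```

In chi_fisher_p the same check decides whether to return the Chi-squared p-value or to fall back to the Fisher exact test.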

Adding test data

To test this function, we will create a fake data set. The data set will have 100 observations and 3 variables: treatment, gender, and outcome. The suggested practice is to include the data generating scripts in the package repository. To help set this up, run usethis::use_data_raw().

usethis::use_data_raw()
#> ✔ Creating 'data-raw/'
#> ✔ Adding '^data-raw$' to '.Rbuildignore'
#> Next:
#> ● Add data creation scripts in 'data-raw'
#> ● Use usethis::use_data() to add data to package

Then create the R script that will generate the data, save it in the data-raw/ directory and run it locally.

# treatment-data.R

treatment <- tibble::tibble(
  treatment = c(rep("old", 50), rep("new", 50)),
  gender    = c(rep("male", 30), rep("female", 20), rep("male", 20), rep("female", 30)),
  outcome   = c(rep("failure", 95), rep("success", 5))
)

Note that the outcome is rare (5% success). If outcome is used as a variable in chisq.test, a warning will result. To include this data set in the package, we can run usethis::use_data().

usethis::use_data(treatment)
#> ✔ Creating 'data/'
#> ✔ Saving treatment to data/treatment.rda
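
To see why the rare outcome leads to a warning, it helps to look at the expected cell counts under independence. The sketch below rebuilds the two relevant columns with a base R data.frame (an assumption made only to keep the example free of dependencies; the values match treatment-data.R) and computes the expected counts by hand.

```r
# Rebuild the two relevant columns with base R (same values as treatment-data.R).
treatment <- data.frame(
  treatment = c(rep("old", 50), rep("new", 50)),
  outcome   = c(rep("failure", 95), rep("success", 5))
)

observed <- table(treatment$outcome, treatment$treatment)

# Expected count for each cell = row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# chisq.test() warns when any expected count falls below 5.
any(expected < 5)

# The same expected counts are stored on the test object itself.
fit <- suppressWarnings(stats::chisq.test(observed))
all.equal(as.vector(expected), as.vector(fit$expected))
```

With only 5 successes split across two groups of 50, the expected count in each success cell is 2.5, which is below the threshold of 5 and triggers the "Chi-squared approximation may be incorrect" warning.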

Since this data will be accessible to users of the package, it must be documented. To do this, we will write roxygen documentation for the data set, referring to it by name, in the R/data.R script.

First create the script.

usethis::use_r("data")
#> ● Modify 'data.R'

Then add the documentation for the treatment data set to that script.

#' Outcomes of 100 patients by old and new treatments
#'
#' A dataset containing the genders and outcomes of 100 patients
#' split evenly between two treatment groups.
#'
#' @format A data frame with 100 rows and 3 variables:
#'  - *treatment*: treatment, old or new
#'  - *gender*: gender, male or female
#'  - *outcome*: outcome, failure or success
"treatment"

To ensure the function and the data set we just created have the proper .Rd documentation files within the package, run devtools::document().

devtools::document()
#> Updating chifishr documentation
#> Loading chifishr
#> Writing NAMESPACE
#> Writing chi_fisher_p.Rd
#> Writing chifishr-package.Rd
#> Writing treatment.Rd
#> Writing pipe.Rd

Adding tests

The function we just created needs to be tested to ensure that it performs as expected. To set up the file structure for writing and executing tests, run usethis::use_testthat. testthat is an extremely helpful toolset for setting up and running tests within a package.

usethis::use_testthat()
#> ✔ Adding 'testthat' to Suggests field in DESCRIPTION
#> ✔ Creating 'tests/testthat/'
#> ✔ Writing 'tests/testthat.R'

Now we can add some tests to the tests/testthat/ directory. If you have the chi_fisher_p.R script open in RStudio and run usethis::use_test(), it will create a test file corresponding to that script that you can put the related tests in.

usethis::use_test()
#> ✔ Writing 'tests/testthat/test-chi_fisher_p.R'
#> ● Modify 'test-chi_fisher_p.R'

Using known outcomes from chisq.test in our example treatment data, we can then write tests to check that chi_fisher_p returns a Chi-squared p-value when no warning is thrown from chisq.test, and returns a Fisher exact test p-value otherwise. This can be done using the expect_ family of functions from testthat.

# test-chi_fisher_p.R

context("test-chi_fisher_p.R")

test_that("returns chi-squared p value if no warnings are thrown", {
  expect_silent(chisq.test(treatment$gender, treatment$treatment))
  expect_equal(chi_fisher_p(treatment, "gender", "treatment"), chisq.test(treatment$gender, treatment$treatment)$p.value)
})

test_that("returns fisher p value if chi-squared warnings are thrown", {
  expect_warning(chisq.test(treatment$outcome, treatment$treatment))
  expect_equal(chi_fisher_p(treatment, "outcome", "treatment"), fisher.test(treatment$outcome, treatment$treatment)$p.value)
})

We know these tests will pass right now, but the tests are important to make sure any changes made to the package in the future do not break the basic functionality of chi_fisher_p. We can run these tests with devtools::test(), using the keyboard shortcut Cmd+Shift+T (Mac) or Ctrl+Shift+T (Windows/Linux), or using the RStudio Test button in the “Build” pane:

> devtools::test()

Loading chifishr
Loading required package: testthat
Testing chifishr
✔ | OK F W S | Context
✔ |  4       | test-chi_fisher_p.R [0.1 s]

══ Results ══════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 0.2 s

OK:       4
Failed:   0
Warnings: 0
Skipped:  0

This runs the test files in the tests/testthat/ directory, here test-chi_fisher_p.R, and reports the results. Any failing tests or warnings thrown from the tests would be shown in the output above.

Checking the package

Now that the first version of the package is nearly complete, we will want to “check” the package for any missing documentation or errors in file structures, as well as run the tests for the function. We can do all of this by running devtools::check(). Alternatively, you could use the keyboard shortcut Cmd+Shift+E (Mac) or Ctrl+Shift+E (Windows/Linux) or use the RStudio Check button in the “Build” pane:

> devtools::check()
Updating chifishr documentation
Loading chifishr
Setting env vars ---------------------------------------------------------------------------------------------------
CFLAGS  : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building chifishr --------------------------------------------------------------------------------------------------
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD  \
  build '/Users/calebscheidel/gitlab/chifishr' --no-resave-data --no-manual 

* checking for file ‘/Users/calebscheidel/gitlab/chifishr/DESCRIPTION’ ... OK
* preparing ‘chifishr’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* looking to see if a ‘data/datalist’ file should be added
* building ‘chifishr_0.0.0.9000.tar.gz’

Setting env vars ---------------------------------------------------------------------------------------------------
_R_CHECK_CRAN_INCOMING_ : FALSE
_R_CHECK_FORCE_SUGGESTS_: FALSE
Checking chifishr --------------------------------------------------------------------------------------------------
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD  \
  check '/var/folders/nb/q748g4nn7mvg0b3v73w7mq180000gq/T//Rtmp6x4G4e/chifishr_0.0.0.9000.tar.gz' --as-cran  \
  --timings --no-manual 

* using log directory ‘/private/var/folders/nb/q748g4nn7mvg0b3v73w7mq180000gq/T/Rtmp6x4G4e/chifishr.Rcheck’
* using R version 3.5.0 (2018-04-23)
* using platform: x86_64-apple-darwin15.6.0 (64-bit)
* using session charset: UTF-8
* using options ‘--no-manual --as-cran’
* checking for file ‘chifishr/DESCRIPTION’ ... OK
* this is package ‘chifishr’ version ‘0.0.0.9000’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking serialization versions ... OK
* checking whether package ‘chifishr’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘testthat.R’
 OK
* DONE

Status: OK

R CMD check results
0 errors | 0 warnings | 0 notes

You can see that devtools::check() does quite a bit. First it re-documents the package, checks the DESCRIPTION file and builds the package into a .tar.gz file. Then it runs through a number of checks, including whether the package can be installed, whether all of the required dependencies are listed, whether there are any syntax errors and whether all of the proper documentation files are in place, among a number of other things. Finally, it runs the testthat.R script, which executes the tests in the testthat directory. If any issues arose from any of the checks or tests, they would show up in the results as either an error, warning or note. More details on devtools::check() and the different types of messages returned from it can be found here.

If you have followed the above steps, you should see that everything has passed without errors, warnings or notes. The first version of the package is now complete and ready to be shared!

Up Next

Part II will demonstrate how to share the package on GitLab, as well as set up automated checking and testing with GitLab’s built-in CI services.
