checks and {tiny}testing – a quick primer

[This article was first published on Data By John, and kindly contributed to R-bloggers.]

This material was presented to a meeting of KIND (Knowledge and Information Network) in April this year.

checks

  • What assumptions are you making about your data (structure, names, types etc.)?
  • What assumptions are you making about function arguments?
  • What assumptions are you making about what users will and won’t do?

tests

Describe what you expect your functions to do, and how they should behave with regard to user inputs.

Checks : assertions
Tests : expectations

Let’s write a simple function that prints the name of a council area:

choose_council <- function(x){
  out <- paste("chosen council is", x)
  out
}

Now let’s try it out

choose_council("Highland")

[1] "chosen council is Highland"

choose_council("Argyll and Bute")

[1] "chosen council is Argyll and Bute"

choose_council("Bob")

[1] "chosen council is Bob"

choose_council(1)

[1] "chosen council is 1"

choose_council("Argyll & Bute")

[1] "chosen council is Argyll & Bute"

We can see the function works, but it accepts anything: a made-up name, a number and a misspelt council all pass straight through.

Base R functions

From the help:

match.arg matches a character arg against a table of candidate values as specified by choices.

To put that more simply, to use the function, we need to pass an argument, and a vector of possible choices. The function will then check that argument against the choices to see if there is a match.
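As a minimal sketch of that description, match.arg can be called directly with an argument and a vector of choices. The choices vector simply reuses the two council names from this post, and tryCatch is only there so the failing call does not halt the script.

```r
choices <- c("Highland", "Argyll and Bute")

# An exact match returns the matching choice
matched <- match.arg("Highland", choices)
print(matched)

# A value that is not among the choices raises an error;
# tryCatch captures the message instead of stopping
msg <- tryCatch(
  match.arg("Bob", choices),
  error = function(e) conditionMessage(e)
)
print(msg)
```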

Let’s assume we only want to accept two values: Highland, and Argyll and Bute.

How can we use match.arg?

choose_council <- function(council = c("Highland", 
                                       "Argyll and Bute")){

  council <-  match.arg(council)
  out <- paste("chosen council is", council)
  return(out)
}

choose_council("Highland")

[1] "chosen council is Highland"

choose_council("Argyll and Bute")

[1] "chosen council is Argyll and Bute"

choose_council("Bob")

Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"

choose_council(1)

Error in match.arg(council): 'arg' must be NULL or a character vector

choose_council("Argyll & Bute")

Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"

If no value is supplied, match.arg uses the first element of the default vector.

choose_council() # match.arg uses default arguments

[1] "chosen council is Highland"

Partial matching is also possible - you can be lazy and only type the first few letters of your argument. This is OK for this very simple example, but not for real-life code - certainly not any code where you care about the results. (As an aside, if you regularly use T or F instead of TRUE and FALSE - you need to sort your life out)

This works, but .. careful now!

down with this kind of thing

choose_council("A") # partial matching - can be risky

[1] "chosen council is Argyll and Bute"
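One way partial matching can bite: an abbreviation that works today can stop working once the list of choices grows. The sketch below adds a third council, Aberdeen City, purely for illustration, and then shows one possible exact-match alternative using %in%. The helper name choose_council_exact is hypothetical and not part of the original function.

```r
councils <- c("Highland", "Argyll and Bute", "Aberdeen City")

# "A" now matches two choices, so match.arg can no longer resolve it
ambiguous <- tryCatch(
  match.arg("A", councils),
  error = function(e) "error"
)
print(ambiguous)

# Exact matching: only a full, correctly spelled name is accepted
choose_council_exact <- function(council, valid = councils) {
  stopifnot(is.character(council), length(council) == 1)
  if (!council %in% valid) {
    stop("council must be one of: ", paste(valid, collapse = ", "))
  }
  paste("chosen council is", council)
}

print(choose_council_exact("Highland"))
```

The trade-off is that callers lose the convenience of abbreviations, which is usually what you want in code where the results matter.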

stopifnot

We saw that our function didn’t work when we supplied a number.

choose_council(1)

In this case, match.arg has its own checks in the background. But we can provide our own. We want to stop the function if a non-character argument is provided.

We use stopifnot to fail immediately if a non-character argument is passed.

If a character argument is passed, we use the choices argument of match.arg to validate that this is an acceptable value

choose_council <- function(council){

  stopifnot(is.character(council))

  council <- match.arg(council,
                       choices = c("Highland",
                                   "Argyll and Bute"))

  out <- paste("chosen council is", council)
  return(out)
}

choose_council(1)

Error in choose_council(1): is.character(council) is not TRUE

choose_council("Argyll & Bute")

Error in match.arg(council, choices = c("Highland", "Argyll and Bute")): 'arg' should be one of "Highland", "Argyll and Bute"

Yikes.

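We can add friendlier messages by naming the expression passed to stopifnot (this needs R 4.0.0 or later). The name becomes the error message.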

choose_council <- function(council){

  stopifnot("council must be character" = is.character(council))

  council <- match.arg(council,
                       choices = c("Highland",
                                   "Argyll and Bute"))

  out <- paste("chosen council is", council)
  return(out)
}

Partial matching works as before

choose_council("High")

[1] "chosen council is Highland"

But now we get a slightly more readable error message

choose_council(1)

Error in choose_council(1): council must be character
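The same naming trick extends to several conditions at once, each with its own message. This is a small sketch rather than a recommended pattern: the helper check_council_arg and the second condition (a single value) are assumptions added for illustration, and named conditions need R 4.0.0 or later.

```r
# Hypothetical helper: each name is the message shown when that condition fails
check_council_arg <- function(council) {
  stopifnot(
    "council must be character" = is.character(council),
    "council must be a single value" = length(council) == 1
  )
  invisible(council)
}

# Capture the messages rather than halting the script
msg_type <- tryCatch(
  check_council_arg(1),
  error = function(e) conditionMessage(e)
)
msg_length <- tryCatch(
  check_council_arg(c("Highland", "Argyll and Bute")),
  error = function(e) conditionMessage(e)
)

print(msg_type)
print(msg_length)
```

Conditions are evaluated in order, so the first one that fails decides which message the user sees.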

chi_check()

See phsmethods

What is a CHI number? The Community Health Index number is used in Scotland to uniquely identify patients.

What needs to be checked?

  • Does it contain only numeric characters?
  • Is it ten digits in length?
  • Do the first six digits denote a valid date?
  • Is the checksum digit correct?

We can deal with the first three quite quickly with the {checkmate} package.

checkmate

“Virtually every standard type of user error when passing arguments into function can be caught with a simple, readable line which produces an informative error message.

A substantial part of the package was written in C to minimize any worries about execution time overhead.”

example CHI

x <- "0101011237"

is this a character vector?

check_class(x, "character")
checkClass(x, "character")

[1] TRUE

[1] TRUE

check_class and checkClass are exactly the same; simply choose whether you prefer snake_case or camelCase.

Functions beginning with check return either TRUE (as above) or the error message as a string:

check_class(x, "integer")

[1] "Must inherit from class 'integer', but has class 'character'"

Functions beginning with assert either throw an error containing that message, or return the checked object invisibly:

assert_class(x, "integer")

Error in eval(expr, envir, enclos): Assertion on 'x' failed: Must inherit from class 'integer', but has class 'character'.

assert_class(x, "character")
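To see the difference between the families side by side, here is a short sketch. It also uses the test_* family, which is part of checkmate but not otherwise covered in this post; it returns a plain TRUE or FALSE, which can be handy inside an if statement.

```r
library(checkmate)

x <- "0101011237"

# test_* returns a plain TRUE or FALSE
is_chr <- test_character(x)
is_int <- test_integer(x)

# check_* returns TRUE, or the failure message as a string
chk <- check_integer(x)

# assert_* throws on failure and returns the object invisibly on success
same <- assert_character(x)

print(c(is_chr, is_int))
print(chk)
print(identical(same, x))
```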

Going back to the CHI example, we can use check_character for a more fine-grained series of checks

check_character(x, n.chars = 10, pattern = "\\d{10}") # 10 chars, numeric only

[1] TRUE

x2 <- "010101123A"
x3 <- c(x, x2, NA)
x4 <- c(x, NA)

check_character(x2, n.chars = 10, pattern = "[^A-Z]{10}")
check_character(x, n.chars = 10, pattern = "[^A-Z]{10}")

[1] "Must comply to pattern '[^A-Z]{10}'"

[1] TRUE

# final version
check_character(x,
                min.len = 1,
                n.chars = 10,
                any.missing = FALSE,
                pattern = "\\d{10}")

vals <- c(x, x2, x3, x4)
cat(vals)

0101011237 010101123A 0101011237 010101123A NA 0101011237 NA

purrr::map_chr(vals,
               check_character,
               min.len = 1,
               n.chars = 10,
               any.missing = FALSE,
               pattern = "\\d{10}")

[1] "TRUE"                                "Must comply to pattern '\\d{10}'"   
[3] "TRUE"                                "Must comply to pattern '\\d{10}'"   
[5] "Contains missing values (element 1)" "TRUE"                               
[7] "Contains missing values (element 1)"
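The element-wise check above relies on map_chr turning the logical TRUE into the string "TRUE", and newer versions of purrr may warn about that implicit conversion. A base R sketch that makes the conversion explicit with as.character is shown below. The shortened vals vector is just for illustration.

```r
library(checkmate)

vals <- c("0101011237", "010101123A", NA)

# check_character returns TRUE or a message, so convert to character explicitly
results <- vapply(
  vals,
  function(v) {
    as.character(
      check_character(v,
                      min.len = 1,
                      n.chars = 10,
                      any.missing = FALSE,
                      pattern = "\\d{10}")
    )
  },
  character(1),
  USE.NAMES = FALSE
)

print(results)
```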

# are first 6 elements a Date?
date_val <- substr(x,1,6)

cat(date_val)

checkDate(as.Date(strptime(date_val,"%d%m%y", "UTC")),
          lower = "1900-01-01",
          upper =  Sys.Date(),
          any.missing = FALSE,
          min.len = 1L)

010101

[1] TRUE
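Two edge cases are worth trying with the date check. An impossible date such as 31 February parses to NA, which any.missing = FALSE then catches. Also, %y carries no century: R documents that two-digit years 00 to 68 are read as 20xx, so a birth year such as 1950 would be parsed as 2050 and fail the upper bound. The helper chi_dob and the two extra ten-digit strings below are made up for illustration and are not real CHI numbers.

```r
library(checkmate)

# Hypothetical helper: parse the first six characters as ddmmyy
chi_dob <- function(chi) {
  as.Date(strptime(substr(chi, 1, 6), "%d%m%y", tz = "UTC"))
}

valid_dob <- chi_dob("0101011237")    # 1 January
impossible <- chi_dob("3102011234")   # 31 February does not exist, so this is NA

print(valid_dob)
print(check_date(impossible, any.missing = FALSE))

# %y has no century information; "50" is read as 2050, not 1950
future_dob <- chi_dob("0101501234")
print(future_dob)
```

Handling the century properly needs extra information beyond the six digits, which is one reason to lean on a tested implementation such as phsmethods.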

combine checks with the assert function

main_check <- function(x){
  assert(check_character(x,
                         min.len = 1,
                         n.chars = 10,
                         any.missing = FALSE,
                         pattern = "\\d{10}"),
         checkDate(as.Date(strptime(substr(x,1,6),"%d%m%y", "UTC")),
                   lower = "1900-01-01",
                   upper =  Sys.Date(),
                   any.missing = FALSE,
                   min.len = 1L),
         combine = "and")
}

out <- main_check(x)
out

[1] TRUE
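The fourth item on the list, the checksum digit, is not covered by the checks above. The sketch below assumes the commonly described modulus-11 scheme: the first nine digits are weighted 10 down to 2, summed, and the check digit is 11 minus the remainder, with 11 treated as 0. That scheme is an assumption on my part rather than something taken from this post, and chi_checksum_ok is a hypothetical helper; phsmethods::chi_check is the place to look for a maintained implementation.

```r
# Hypothetical helper, assuming a modulus-11 check digit with weights 10 to 2
chi_checksum_ok <- function(chi) {
  stopifnot(
    is.character(chi),
    length(chi) == 1,
    grepl("^[0-9]{10}$", chi)
  )

  digits <- as.integer(strsplit(chi, "")[[1]])

  # Weight the first nine digits by 10, 9, ..., 2 and sum
  weighted_sum <- sum(digits[1:9] * 10:2)

  # Expected check digit; a result of 11 is treated as 0
  expected <- 11 - (weighted_sum %% 11)
  if (expected == 11) expected <- 0

  # An expected value of 10 can never equal a single digit, so it fails
  expected == digits[10]
}

print(chi_checksum_ok("0101011237"))
print(chi_checksum_ok("0101011238"))
```

For the example CHI the weighted sum is 37, the remainder is 4, and 11 minus 4 gives 7, which matches the final digit.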

for the lazy

  • qassert: built-in data types
  • qassertr: lists and data frames

qassert(x,"S+[10,11)") # character, length at least 1, 10 or more and fewer than 11 characters
qassert(x,"S+[10,10]") # also works, between 10 and 10 (inclusive)
# note difference in closing brackets
# character denoted by `s`
# no missing values denoted by UPPER CASE
# at least one element denoted by `+`
# string length of exactly 10 denoted by the bounds [10,10]
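If you would rather get TRUE or FALSE back than an error, qtest takes the same rule strings as qassert. This sketch follows the reading of the rules used above, where the bounds on a string rule apply to the number of characters. The 8-character string is made up for illustration.

```r
library(checkmate)

x <- "0101011237"

# One non-missing string whose length is between 10 and 10 characters
ten_chars <- qtest(x, "S1[10,10]")

# Only 8 characters, so the same rule should not be satisfied
too_short <- qtest("01010112", "S1[10,10]")

# Upper-case S rejects missing values, lower-case s allows them
rejects_na <- qtest(NA_character_, "S1")
allows_na <- qtest(NA_character_, "s1")

print(c(ten_chars, too_short, rejects_na, allows_na))
```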

testing

We can also use {tinytest} for some checks.

tinytest::expect_inherits(x, "character")

----- PASSED      : <-->
 call| tinytest::expect_inherits(x, "character") 

Normally we’d list some expectations. Here’s a useless function that adds 2 to a given numerical value:

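add_two <- function(x) {

  # give character input its own, friendlier error
  if (is.character(x)) {
    stop("You've passed a character vector.\nGonnae no' dae that? \nIt should be an integer or double")
  }

  # x must be a single, non-negative whole number
  checkmate::assert_count(x)
  # any.missing = FALSE makes the missing-value check an actual assertion
  checkmate::assert_integerish(x, any.missing = FALSE)

  x <- x + 2
  message("ya wee beauty!")
  return(x)

}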

library(tinytest)
using("checkmate") # load the tinytest extensions provided by checkmate
# test add_two works

expect_equal(1 + 2, add_two(1))

ya wee beauty!

----- PASSED      : <-->
 call| expect_equal(1 + 2, add_two(1)) 

add_two("one")

Error in add_two("one"): You've passed a character vector.
Gonnae no' dae that? 
It should be an integer or double

expect_error(add_two("one"))

----- PASSED      : <-->
 call| expect_error(add_two("one")) 
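Expectations can be made a little stricter by also checking the text of the error or message. The sketch below uses the pattern argument of expect_error and expect_message. To keep it self-contained it defines a simplified stand-in for add_two that uses base R checks only, so it is not identical to the function above.

```r
library(tinytest)

# Simplified stand-in for add_two, using only base R checks
add_two <- function(x) {
  if (is.character(x)) stop("You've passed a character vector.")
  stopifnot(is.numeric(x), length(x) == 1, !is.na(x))
  message("ya wee beauty!")
  x + 2
}

# pattern is a regular expression matched against the error or message text
res_error <- expect_error(add_two("one"), pattern = "character vector")
res_message <- expect_message(add_two(1), pattern = "beauty")

# suppressMessages keeps the console quiet while checking the value
res_value <- expect_equal(suppressMessages(add_two(1)), 3)

print(res_error)
print(res_message)
print(res_value)
```

Checking the text guards against a test passing because of some unrelated error, at the cost of having to update the test if the wording changes.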
