checks and {tiny}testing – a quick primer
This material was presented to a meeting of KIND (Knowledge and Information Network) in April this year.
checks
- What assumptions are you making about your data? (structure, names, types etc.)
- function arguments
- what users will and won’t do
tests
Describe what you expect your functions to do, and how they should behave with regard to user inputs.
Checks : assertions
Tests : expectations
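As a rough illustration of the difference, here is a minimal base R sketch. The function `safe_sqrt` is a made-up example and not part of the original material: the check lives inside the function, while the test sits outside it and states what we expect.

```r
# Hypothetical example function (not from the original material)
safe_sqrt <- function(x) {
  # Check (assertion): stop straight away if the input breaks our assumptions
  stopifnot(is.numeric(x), all(x >= 0))
  sqrt(x)
}

# Test (expectation): describe the behaviour we expect, from the outside
result <- safe_sqrt(4)
stopifnot(identical(result, 2))

# We also expect bad input to raise an error rather than return a value
bad <- tryCatch(safe_sqrt(-1), error = function(e) "error raised")
stopifnot(identical(bad, "error raised"))
```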
Let’s write a simple function that prints the name of a council area:
choose_council <- function(x){
  out <- paste("chosen council is", x)
  out
}
Now let’s try it out
choose_council("Highland")
[1] "chosen council is Highland"
choose_council("Argyll and Bute")
[1] "chosen council is Argyll and Bute"
choose_council("Bob")
[1] "chosen council is Bob"
choose_council(1)
[1] "chosen council is 1"
choose_council("Argyll & Bute")
[1] "chosen council is Argyll & Bute"
We can see the function works, but it accepts anything, including values that are not council areas.
Base R functions
From the help:
match.arg matches a character arg against a table of candidate values
as specified by choices.
To put that more simply, to use the function, we need to pass an argument, and a vector of possible choices. The function will then check that argument against the choices to see if there is a match.
Let’s assume we only want to print Highland and Argyll and Bute
How can we use match.arg?
choose_council <- function(council = c("Highland",
                                       "Argyll and Bute")){
  council <- match.arg(council)
  out <- paste("chosen council is", council)
  return(out)
}
choose_council("Highland")
[1] "chosen council is Highland"
choose_council("Argyll and Bute")
[1] "chosen council is Argyll and Bute"
choose_council("Bob")
Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"
choose_council(1)
Error in match.arg(council): 'arg' must be NULL or a character vector
choose_council("Argyll & Bute")
Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"
If no value is supplied, match.arg uses the first element:
choose_council() # match.arg uses default arguments
[1] "chosen council is Highland"
Partial matching is also possible - you can be lazy and only type the first few letters of your argument.
This is OK for this very simple example, but not for real-life code - certainly not any code where you care about the results.
(As an aside, if you regularly use T or F instead of TRUE and FALSE - you need to sort your life out)
This works, but .. careful now!

choose_council("A") # partial matching - can be risky
[1] "chosen council is Argyll and Bute"
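If partial matching is too risky for your use case, one option is to skip match.arg and compare against the allowed values directly. The sketch below is an assumption about how you might do that in base R; `choose_council_exact` is a hypothetical name, not something from the original talk.

```r
# Hypothetical variant that only accepts exact matches
choose_council_exact <- function(council) {
  choices <- c("Highland", "Argyll and Bute")
  stopifnot("council must be a single character value" =
              is.character(council) && length(council) == 1)
  # %in% compares whole strings, so "A" no longer matches "Argyll and Bute"
  if (!council %in% choices) {
    stop("council must be one of: ", paste(choices, collapse = ", "))
  }
  paste("chosen council is", council)
}

choose_council_exact("Highland")   # exact match is accepted
# choose_council_exact("A")        # would stop: "A" is not an exact match
```

The trade-off is that you lose match.arg's default-to-first-element behaviour and have to write the error message yourself.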
stopifnot
We saw that our function didn’t work when we supplied a number.
choose_council(1)
In this case, match.arg has its own checks in the background, but we can provide our own. We want to stop the function if a non-character argument is provided.
We use stopifnot to stop immediately if a non-character argument is passed.
If a character argument is passed, we use the choices argument of match.arg to check that it is an acceptable value.
choose_council <- function(council){
  stopifnot(is.character(council))
  council <- match.arg(council,
                       choices = c("Highland",
                                   "Argyll and Bute"))
  out <- paste("chosen council is", council)
  return(out)
}
choose_council(1)
Error in choose_council(1): is.character(council) is not TRUE
choose_council("Argyll & Bute")
Error in match.arg(council, choices = c("Highland", "Argyll and Bute")): 'arg' should be one of "Highland", "Argyll and Bute"
Yikes.
We can add friendlier messages
choose_council <- function(council){
  stopifnot("council must be character" = is.character(council))
  council <- match.arg(council,
                       choices = c("Highland",
                                   "Argyll and Bute"))
  out <- paste("chosen council is", council)
  return(out)
}
Partial matching works as before
choose_council("High")
[1] "chosen council is Highland"
But now we get a slightly more readable error message
choose_council(1)
Error in choose_council(1): council must be character
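stopifnot accepts several conditions in one call, and each can carry its own message. They are evaluated in order and the first one that fails is reported. A small sketch, where `check_council_input` and the extra conditions are illustrative assumptions rather than part of the original function:

```r
# Hypothetical helper: several named conditions in one stopifnot() call
check_council_input <- function(council) {
  stopifnot(
    "council must be character" = is.character(council),
    "council must be a single value" = length(council) == 1,
    "council must not be missing" = !is.na(council)
  )
  invisible(council)
}

# Capture the error message instead of stopping the script
msg <- tryCatch(check_council_input(c("Highland", "Fife")),
                error = function(e) conditionMessage(e))
msg  # the message attached to the first failing condition
```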
chi_check()
See phsmethods
What is a CHI number? The Community Health Index number is used in Scotland to uniquely identify patients.
What needs to be checked?
- Does it contain no non-numeric characters?
- Is it ten digits in length?
- Do the first six digits denote a valid date?
- Is the checksum digit correct?
We can deal with the first three quite quickly with the {checkmate} package
checkmate
“Virtually every standard type of user error when passing arguments into function can be caught with a simple, readable line which produces an informative error message.
A substantial part of the package was written in C to minimize any worries about execution time overhead.”
example CHI
x <- "0101011237"
is this a character vector?
check_class(x, "character")
[1] TRUE
checkClass(x, "character")
[1] TRUE
check_class and checkClass are exactly the same; simply choose whether you prefer snake_case or camelCase.
Functions beginning with check return either TRUE (as above) or the error message as a string:
check_class(x, "integer")
[1] "Must inherit from class 'integer', but has class 'character'"
Functions beginning with assert either throw an error, or return the checked object invisibly:
assert_class(x, "integer")
Error in eval(expr, envir, enclos): Assertion on 'x' failed: Must inherit from class 'integer', but has class 'character'.
assert_class(x, "character")
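To make the two return conventions concrete, here is a base R sketch of the same idea. The helpers `my_check_class` and `my_assert_class` are hypothetical and only mimic the pattern; they are not how {checkmate} is implemented.

```r
# Hypothetical helpers illustrating the convention (not checkmate's own code)
my_check_class <- function(x, classes) {
  if (inherits(x, classes)) {
    return(TRUE)
  }
  # On failure a "check" returns the message as a string; it does not stop
  sprintf("Must inherit from class '%s', but has class '%s'",
          paste(classes, collapse = "'/'"),
          paste(class(x), collapse = "'/'"))
}

my_assert_class <- function(x, classes) {
  res <- my_check_class(x, classes)
  # An "assert" turns the same message into an error
  if (!isTRUE(res)) {
    stop("Assertion failed: ", res, call. = FALSE)
  }
  invisible(x)
}

my_check_class("0101011237", "character")  # passes
my_check_class("0101011237", "integer")    # returns a message string
```

The check form is handy when you want to collect messages, and the assert form when you want the function to stop.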
Going back to the CHI example, we can use check_character for a more fine-grained series of checks
check_character(x, n.chars = 10, pattern = "\\d{10}") # 10 chars, numeric only
[1] TRUE
x2 <- "010101123A"
x3 <- c(x, x2, NA)
x4 <- c(x, NA)
check_character(x2, n.chars = 10, pattern = "[^A-Z]{10}")
[1] "Must comply to pattern '[^A-Z]{10}'"
check_character(x, n.chars = 10, pattern = "[^A-Z]{10}")
[1] TRUE
# final version
check_character(x,
min.len = 1,
n.chars = 10,
any.missing = FALSE,
pattern = "\\d{10}")
vals <- c(x, x2, x3, x4)
cat(vals)
0101011237 010101123A 0101011237 010101123A NA 0101011237 NA
purrr::map_chr(vals,
check_character,
min.len = 1,
n.chars = 10,
any.missing = FALSE,
pattern = "\\d{10}")
[1] "TRUE" "Must comply to pattern '\\d{10}'"
[3] "TRUE" "Must comply to pattern '\\d{10}'"
[5] "Contains missing values (element 1)" "TRUE"
[7] "Contains missing values (element 1)"
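If all you need is a TRUE or FALSE per element rather than a message, a vectorised base R check is one alternative. This is a sketch under that assumption; `chi_format_ok` is a hypothetical name and the fourth value is an extra illustrative case.

```r
# Hypothetical helper: TRUE only for non-missing strings of exactly ten digits
chi_format_ok <- function(chi) {
  # ^ and $ anchor the pattern, so longer strings containing ten digits fail
  !is.na(chi) & grepl("^[0-9]{10}$", chi)
}

vals <- c("0101011237", "010101123A", NA, "01010112377")
chi_format_ok(vals)
```

Because the pattern is anchored, the length does not need to be checked separately.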
# are first 6 elements a Date?
date_val <- substr(x,1,6)
cat(date_val)
010101
checkDate(as.Date(strptime(date_val,"%d%m%y", "UTC")),
          lower = "1900-01-01",
          upper = Sys.Date(),
          any.missing = FALSE,
          min.len = 1L)
[1] TRUE
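Two details of the date conversion are worth seeing in isolation: impossible dates become NA, and R reads two-digit years 00 to 68 as 20xx and 69 to 99 as 19xx. The sketch below uses a hypothetical helper, `chi_date`, and made-up CHI-like strings purely to show that behaviour.

```r
# Hypothetical helper: parse the first six characters of a CHI as ddmmyy
chi_date <- function(chi) {
  as.Date(substr(chi, 1, 6), format = "%d%m%y")
}

chi_date("0101011237")  # a real calendar date
chi_date("3102011237")  # 31 February does not exist, so this is NA
chi_date("0101501237")  # year "50" is read as 2050
chi_date("0101691237")  # year "69" is read as 1969
```

This is why the upper bound of Sys.Date() matters: a two-digit year that lands in the future is rejected rather than silently accepted.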
combine checks with the assert function
main_check <- function(x){
  assert(check_character(x,
                         min.len = 1,
                         n.chars = 10,
                         any.missing = FALSE,
                         pattern = "\\d{10}"),
         checkDate(as.Date(strptime(substr(x,1,6),"%d%m%y", "UTC")),
                   lower = "1900-01-01",
                   upper = Sys.Date(),
                   any.missing = FALSE,
                   min.len = 1L),
         combine = "and")
}
out <- main_check(x)
out
[1] TRUE
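The fourth item on the list, the checksum digit, is not covered by the checks combined above. Here is a minimal base R sketch, assuming the usual modulus-11 scheme: the first nine digits are weighted 10 down to 2, and the check digit is 11 minus the remainder of the weighted sum divided by 11, with 11 treated as 0. The function name `chi_checksum_ok` is hypothetical; phsmethods has its own implementation.

```r
# Hypothetical helper, assuming a modulus-11 check digit
chi_checksum_ok <- function(chi) {
  stopifnot(is.character(chi), length(chi) == 1, grepl("^[0-9]{10}$", chi))
  digits <- as.integer(strsplit(chi, "")[[1]])
  # Weight the first nine digits by 10, 9, ..., 2 and add them up
  total <- sum(digits[1:9] * 10:2)
  expected <- 11 - (total %% 11)
  if (expected == 11) {
    expected <- 0
  }
  # An expected value of 10 can never equal a single digit, so it fails
  digits[10] == expected
}

chi_checksum_ok("0101011237")  # the example CHI used in this post
```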
for the lazy
qassert: built-in data types
qassertr: lists and data frames
qassert(x,"S+[10,11)") # character vector, length one or more, at least 10 and fewer than 11 characters
qassert(x,"S+[10,10]") # also works, between 10 and 10 (inclusive)
# note difference in closing brackets
# character denoted by `s`
# no missing values denoted by UPPER CASE
# exact length of string 10 denoted by [10,10]
testing
we can use {tinytest} for some checks also
tinytest::expect_inherits(x, "character")
----- PASSED : <-->
call| tinytest::expect_inherits(x, "character")
Normally we’d list some expectations. Here’s a useless function that adds 2 to a given numerical value:
add_two <- function(x) {
  if (is.character(x)) {
    stop("You've passed a character vector.\nGonnae no' dae that? \nIt should be an integer or double")
  }
  checkmate::assert_count(x)
  checkmate::assert_integerish(x)
  # stop if x contains missing values
  stopifnot(!checkmate::anyMissing(x))
  x <- x + 2
  message("ya wee beauty!")
  return(x)
}
using("checkmate")
# test add_two works
expect_equal(1 + 2, add_two(1))
ya wee beauty!
----- PASSED : <-->
call| expect_equal(1 + 2, add_two(1))
add_two("one")
Error in add_two("one"): You've passed a character vector.
Gonnae no' dae that?
It should be an integer or double
expect_error(add_two("one"))
----- PASSED : <-->
call| expect_error(add_two("one"))
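expect_error passes for any error at all, so it can help to pin down which error you expect. {tinytest} lets you supply a pattern that the error message must match. The sketch below assumes {tinytest} is installed; `add_two_simple` is a hypothetical, cut-down function used only for this illustration.

```r
library(tinytest)

# Hypothetical, simplified function used only for this illustration
add_two_simple <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  x + 2
}

# Expect the right answer for valid input
res_value <- expect_equal(add_two_simple(1), 3)

# Expect an error whose message matches the pattern
res_error <- expect_error(add_two_simple("one"), pattern = "must be numeric")

# Each expectation returns a tinytest object that behaves like TRUE or FALSE
isTRUE(as.logical(res_value))
isTRUE(as.logical(res_error))
```

In a package or project these expectations would normally live in their own test file rather than alongside the function.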
See also
- defensive programming, covered in this excellent text by Gillespie and Lovelace
- purrr (use of possibly/safely)