# Handling errors using purrr’s possibly() and safely()

**Very statisticious on Very statisticious**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One topic I haven’t discussed in my previous posts about automating tasks with loops or doing simulations is how to deal with errors. If we have unanticipated errors a `map()`

or `lapply()`

loop will come to a screeching halt with no output to show for the time spent. When your task is time-consuming, this can feel pretty frustrating, since the whole process has to be restarted.

How to deal with errors? Using functions `try()`

or `tryCatch()`

when building a function is the traditional way to catch and address potential errors. In the past I’ve struggled to remember how to use these, though, and functions `possibly()`

and `safely()`

from package **purrr** are convenient alternatives that I find a little easier to use.

In this post I’ll show examples on how to use these two functions for handling errors. I’ll also demonstrate the use of the related function `quietly()`

to capture other types of output, such as warnings and messages.

## Table of Contents

# R packages

The functions I’m highlighting today are from package **purrr**. I’ll also use **lme4** for fitting simulated data to linear mixed models.

library(purrr) # v. 0.3.4 library(lme4) # v. 1.1-23

# Using possibly() to return values instead of errors

When doing a repetitive task like fitting many models with a `map()`

loop, an error in one of the models will shut down the whole process. We can anticipate this issue and bypass it by defining a value to return if a model errors out via `possibly()`

.

I created the very small dataset below to demonstrate the issue. The goal is to fit a linear model of `y`

vs `x`

for each `group`

. I made exactly two groups here, *a* and *b*, to make it easy to see what goes wrong and why. Usually we have many more groups and potential problems can be harder to spot.

dat = structure(list(group = c("a", "a", "a", "a", "a", "a", "b", "b", "b"), x = c("A", "A", "A", "B", "B", "B", "A", "A", "A"), y = c(10.9, 11.1, 10.5, 9.7, 10.5, 10.9, 13, 9.9, 10.3)), class = "data.frame", row.names = c(NA, -9L)) dat # group x y # 1 a A 10.9 # 2 a A 11.1 # 3 a A 10.5 # 4 a B 9.7 # 5 a B 10.5 # 6 a B 10.9 # 7 b A 13.0 # 8 b A 9.9 # 9 b A 10.3

I’ll first split the dataset by `group`

to get a list of data.frames to loop through.

dat_split = split(dat, dat$group)

Then I’ll loop through each dataset in the list with `map()`

and fit a linear model with `lm()`

. Instead of getting output, though, I get an error.

map(dat_split, ~lm(y ~ x, data = .x) ) Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels

What’s going on? If you look at the dataset again, you’ll see that `x`

from group *b* contains only a single value. Once you know that you can see the error actually is telling us what the problem is: we can’t use a factor with only one level.

Model *a* fits fine, since `x`

has two values.

lm(y ~ x, data = dat, subset = group == "a") # # Call: # lm(formula = y ~ x, data = dat, subset = group == "a") # # Coefficients: # (Intercept) xB # 10.8333 -0.4667

It is the *b* model that fails.

lm(y ~ x, data = dat, subset = group == "b") Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels

You can imagine that the problem of having only a single value for the factor in some groups could be easy to miss if you working with a large number groups. This is where `possibly()`

can help, allowing us to keep going through all groups regardless of errors. We can then find and explore problem groups.

## Wrapping a function with possibly()

The `possibly()`

function is a *wrapper* function. It wraps around an existing function. Other than defining the function to wrap, the main argument of interest is `otherwise`

. In `otherwise`

we define what value to return if we get an error from the function we are wrapping.

I make a new wrapped function called `posslm1()`

, which wraps `lm()`

and returns “Error” if an error occurs when fitting the model.

posslm1 = possibly(.f = lm, otherwise = "Error")

When I use `posslm1()`

in my model fitting loop, you can see that loop now finishes. Model *b* contains the string “Error” instead of a model.

map(dat_split, ~posslm1(y ~ x, data = .x) ) # $a # # Call: # .f(formula = ..1, data = ..2) # # Coefficients: # (Intercept) xB # 10.8333 -0.4667 # # # $b # [1] "Error"

Here’s another example of `possibly()`

wrapped around `lm()`

, this time using `otherwise = NULL`

. Depending on what we plan to do with the output, using `NULL`

or `NA`

as the return value can be useful when using `possibly()`

.

Now group *b* is `NULL`

in the output.

posslm2 = possibly(.f = lm, otherwise = NULL) ( mods = map(dat_split, ~posslm2(y ~ x, data = .x) ) ) # $a # # Call: # .f(formula = ..1, data = ..2) # # Coefficients: # (Intercept) xB # 10.8333 -0.4667 # # # $b # NULL

## Finding the groups with errors

Once the loop is done, we can examine the groups that had errors when fitting models. For example, I can use `purrr::keep()`

to keep only the results that are `NULL`

.

mods %>% keep(~is.null(.x) ) # $b # NULL

This allows me to pull out the names for the groups that had errors. Getting the names in this way is one reason I like that `split()`

returns named lists.

group_errs = mods %>% keep(~is.null(.x) ) %>% names() group_errs # [1] "b"

Once I have the names of the groups with errors, I can pull any problematic groups out of the original dataset or the split list to examine them more closely. (I use `%in%`

here in case `group_errs`

is a vector.)

dat[dat$group %in% group_errs, ] # group x y # 7 b A 13.0 # 8 b A 9.9 # 9 b A 10.3 dat_split[group_errs] # $b # group x y # 7 b A 13.0 # 8 b A 9.9 # 9 b A 10.3

## Using compact() to remove empty elements

You may come to a point where you’ve looked at the problem groups and decide that the models with errors shouldn’t be used in further analysis. In that case, if all the groups with errors are `NULL`

, you can use `purrr::compact()`

to remove the empty elements from the list. This can make subsequent loops to get output more straightforward in some cases.

compact(mods) # $a # # Call: # .f(formula = ..1, data = ..2) # # Coefficients: # (Intercept) xB # 10.8333 -0.4667

# Using safely() to capture results and errors

Rather than replacing the errors with values, `safely()`

returns both the results and the errors in a list. This function is also a wrapper function. It defaults to using `otherwise = NULL`

, and I generally haven’t had reason to change away from that default.

Here’s an example, wrapping `lm()`

in `safely()`

and then using the wrapped function `safelm()`

to fit the models.

safelm = safely(.f = lm) mods2 = map(dat_split, ~safelm(y ~ x, data = .x) )

The output for each group is now a list with two elements, one for results (if there was no error) and the other for the error (if there was an error).

Here’s what this looks like for model *a*, which doesn’t have an error. The output contains a result but no error.

mods2[[1]] # $result # # Call: # .f(formula = ..1, data = ..2) # # Coefficients: # (Intercept) xB # 10.8333 -0.4667 # # # $error # NULL

Model *b* didn’t work, of course, so the results are `NULL`

but the error was captured in `error`

.

mods2[[2]] # $result # NULL # # $error #

## Exploring the errors

One reason to save the errors using `safely()`

is so we can take a look at what the errors were for each group. This is most useful with informative errors like the one in my example.

Errors can be extracted with a `map()`

loop, pulling out the “error” element from each group.

map(mods2, "error") # $a # NULL # # $b #

## Extracting results

Results can be extracted similarly, and, if relevant, `NULL`

results can be removed via `compact()`

.

mods2 %>% map("result") %>% compact() # $a # # Call: # .f(formula = ..1, data = ..2) # # Coefficients: # (Intercept) xB # 10.8333 -0.4667

# Using quietly() to capture messages

The `quietly()`

function doesn’t handle errors, but instead captures other types of output such as warnings and messages along with any results. This is useful for exploring what kinds of warnings come up when doing simulations, for example.

A few years ago I wrote a post showing a simulation for a linear mixed model. I use the following function, pulled from that earlier post.

twolevel_fun = function(nstand = 5, nplot = 4, mu = 10, sigma_s = 1, sigma = 1) { standeff = rep( rnorm(nstand, 0, sigma_s), each = nplot) stand = rep(LETTERS[1:nstand], each = nplot) ploteff = rnorm(nstand*nplot, 0, sigma) resp = mu + standeff + ploteff dat = data.frame(stand, resp) lmer(resp ~ 1 + (1|stand), data = dat) }

One thing I skipped discussing in that post were the messages returned for some simulations. However, I can certainly picture scenarios where it would be interesting and important to capture warnings and messages to see, e.g., how often they occur even when we know the data comes from the model.

Here I’ll set the seed so the results are reproducible and then run the function 10 times. You see I get two messages, indicating that two of the ten models returned a message. In this case, the message indicates that the random effect variance is estimated to be exactly 0 in the model.

set.seed(16) sims = replicate(10, twolevel_fun(), simplify = FALSE ) # boundary (singular) fit: see ?isSingular # boundary (singular) fit: see ?isSingular

It turns out that the second model in the output list is one with a message. You can see at the bottom of the model output below that there is 1 lme4 warning.

sims[[2]] # Linear mixed model fit by REML ['lmerMod'] # Formula: resp ~ 1 + (1 | stand) # Data: dat # REML criterion at convergence: 45.8277 # Random effects: # Groups Name Std.Dev. # stand (Intercept) 0.0000 # Residual 0.7469 # Number of obs: 20, groups: stand, 5 # Fixed Effects: # (Intercept) # 10.92 # convergence code 0; 0 optimizer warnings; 1 lme4 warnings

The **lme4** package stores warnings and messages in the model object, so I can pull the message out of the model object.

sims[[2]]@optinfo$conv$lme4 # $messages # [1] "boundary (singular) fit: see ?isSingular"

But I think `quietly()`

is more convenient for this task. This is another wrapper function, and I’m going to wrap it around `lmer()`

. I do this because I’m focusing specifically on messages that happen when I fit the model. However, I could have wrapped `twolevel_fun()`

and captured any messages from the entire simulation process.

I use my new function `qlmer()`

inside my simulation function.

qlmer = quietly(.f = lmer) qtwolevel_fun = function(nstand = 5, nplot = 4, mu = 10, sigma_s = 1, sigma = 1) { standeff = rep( rnorm(nstand, 0, sigma_s), each = nplot) stand = rep(LETTERS[1:nstand], each = nplot) ploteff = rnorm(nstand*nplot, 0, sigma) resp = mu + standeff + ploteff dat = data.frame(stand, resp) qlmer(resp ~ 1 + (1|stand), data = dat) }

I set the seed back to 16 so I get the same models and then run the function using `qlmer()`

10 times. Note this is considered *quiet* because the messages are now captured in the output by `quietly()`

instead of printed.

The wrapped function returns a list with 4 elements, including the results, any printed output, warnings, and messages. You can see this for the second model here.

set.seed(16) sims2 = replicate(10, qtwolevel_fun(), simplify = FALSE) sims2[[2]] # $result # Linear mixed model fit by REML ['lmerMod'] # Formula: resp ~ 1 + (1 | stand) # Data: ..2 # REML criterion at convergence: 45.8277 # Random effects: # Groups Name Std.Dev. # stand (Intercept) 0.0000 # Residual 0.7469 # Number of obs: 20, groups: stand, 5 # Fixed Effects: # (Intercept) # 10.92 # convergence code 0; 0 optimizer warnings; 1 lme4 warnings # # $output # [1] "" # # $warnings # character(0) # # $messages # [1] "boundary (singular) fit: see ?isSingular\n"

In a simulation setting, I think seeing how many times different messages and warnings come up could be pretty interesting. It might inform how problematic a message is. If a message is common in simulation we may feel more confident that such a message from a model fit to our real data is not a big issue.

For example, I could pull out all the `messages`

and then put the results into a vector with `unlist()`

.

sims2 %>% map("messages") %>% unlist() # [1] "boundary (singular) fit: see ?isSingular\n" # [2] "boundary (singular) fit: see ?isSingular\n"

If I wanted to extract multiple parts of the output, such as keeping both messages and warnings, I can use the extract brackets in `map()`

.

These results don’t look much different compared to the output above since there are no warnings in my example. However, note the result is now in a named vector so I could potentially keep track of which are `messages`

and which are `warnings`

if I needed to.

sims2 %>% map(`[`, c("messages", "warnings") ) %>% unlist() # messages # "boundary (singular) fit: see ?isSingular\n" # messages # "boundary (singular) fit: see ?isSingular\n"

I showed only fairly simple way to use these three functions. However, you certainly may find yourself using them for more complex tasks. For example, I’ve been in situations in the past where I wanted to keep only models that didn’t have errors when building parametric bootstrap confidence intervals. If they had existed at the time, I could have used `possibly()`

or `safely()`

in a `while()`

loop, where the bootstrap data would be redrawn until a model fit without error. Very useful! 😉

# Just the code, please

Here’s the code without all the discussion. Copy and paste the code below or you can download an R script of uncommented code from here.

library(purrr) # v. 0.3.4 library(lme4) # v. 1.1-23 dat = structure(list(group = c("a", "a", "a", "a", "a", "a", "b", "b", "b"), x = c("A", "A", "A", "B", "B", "B", "A", "A", "A"), y = c(10.9, 11.1, 10.5, 9.7, 10.5, 10.9, 13, 9.9, 10.3)), class = "data.frame", row.names = c(NA, -9L)) dat dat_split = split(dat, dat$group) map(dat_split, ~lm(y ~ x, data = .x) ) lm(y ~ x, data = dat, subset = group == "a") lm(y ~ x, data = dat, subset = group == "b") posslm1 = possibly(.f = lm, otherwise = "Error") map(dat_split, ~posslm1(y ~ x, data = .x) ) posslm2 = possibly(.f = lm, otherwise = NULL) ( mods = map(dat_split, ~posslm2(y ~ x, data = .x) ) ) mods %>% keep(~is.null(.x) ) group_errs = mods %>% keep(~is.null(.x) ) %>% names() group_errs dat[dat$group %in% group_errs, ] dat_split[group_errs] compact(mods) safelm = safely(.f = lm) mods2 = map(dat_split, ~safelm(y ~ x, data = .x) ) mods2[[1]] mods2[[2]] map(mods2, "error") mods2 %>% map("result") %>% compact() twolevel_fun = function(nstand = 5, nplot = 4, mu = 10, sigma_s = 1, sigma = 1) { standeff = rep( rnorm(nstand, 0, sigma_s), each = nplot) stand = rep(LETTERS[1:nstand], each = nplot) ploteff = rnorm(nstand*nplot, 0, sigma) resp = mu + standeff + ploteff dat = data.frame(stand, resp) lmer(resp ~ 1 + (1|stand), data = dat) } set.seed(16) sims = replicate(10, twolevel_fun(), simplify = FALSE ) sims[[2]] sims[[2]]@optinfo$conv$lme4 qlmer = quietly(.f = lmer) qtwolevel_fun = function(nstand = 5, nplot = 4, mu = 10, sigma_s = 1, sigma = 1) { standeff = rep( rnorm(nstand, 0, sigma_s), each = nplot) stand = rep(LETTERS[1:nstand], each = nplot) ploteff = rnorm(nstand*nplot, 0, sigma) resp = mu + standeff + ploteff dat = data.frame(stand, resp) qlmer(resp ~ 1 + (1|stand), data = dat) } set.seed(16) sims2 = replicate(10, qtwolevel_fun(), simplify = FALSE) sims2[[2]] sims2 %>% map("messages") %>% unlist() sims2 %>% map(`[`, c("messages", "warnings") ) %>% unlist()

**leave a comment**for the author, please follow the link and comment on their blog:

**Very statisticious on Very statisticious**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.