By number of questions asked, R passed MATLAB for the first time on Stack Overflow today. Thus it seems an appropriate time to write my first R-based post.
This post concerns what to do when your R code goes pear-shaped. Back in June, a couple of very good videos on R debugging came out of an R meetup in New York. Jay Emerson talked about basic debugging functions like browser, and Harlan Harris talked about more advanced techniques like debug. These R meetups sound like a great idea, but I suspect that we don’t have the critical mass of R users here in Buxton, UK. I digress …
There are two obvious cases where you need to debug things. If you are lucky, you have the case of an error being thrown. It sounds like it should be the worst case, but at least you get to know where the problem is occurring; the more difficult situation is simply getting the wrong answer.
When I get an error, traceback() is my instinctive response for finding its location. If the ggplot2 package is loaded, then tr() provides a way to save a few keystrokes. traceback isn’t infallible though, and seems to have particular trouble with code in try blocks. To see this, compare
throw_error <- function() stop("!!!")
throw_error_in_try <- function() try(stop("!!!"))
throw_error()
traceback() #Shows the call to throw_error
throw_error_in_try()
traceback() #Same as before; new error did not register
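One way around this limitation is to record the call stack yourself at the moment the error is signalled. The following is only a minimal sketch using withCallingHandlers and sys.calls from base R; the helper name try_with_calls is made up for this illustration and is not part of any package.

```r
# Hypothetical helper: behaves like try(), but keeps the call stack
# as it was when the error was signalled.
try_with_calls <- function(expr) {
  calls <- NULL
  result <- try(
    withCallingHandlers(
      expr,
      error = function(e) {
        # This handler runs before the stack unwinds,
        # so sys.calls() still includes the failing calls.
        calls <<- sys.calls()
      }
    ),
    silent = TRUE
  )
  list(result = result, calls = calls)
}

throw_error <- function() stop("!!!")
out <- try_with_calls(throw_error())

# Deparse each recorded call so the stack can be printed or searched
stack <- vapply(
  as.list(out$calls),
  function(cl) paste(deparse(cl), collapse = " "),
  character(1)
)
print(stack)
```

The recorded stack also contains the internal calls made by try and withCallingHandlers, so expect some noise around the lines you actually care about.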
In the ‘hard’ case, where you have a wrong answer rather than an error, or where traceback has let you down, you’ll have to step through your code to hunt down the problem. This entails using the debug function, or its graphical equivalent mtrace (from the debug package). I don’t want to spend time on those functions here (another post perhaps), so if you’re desperate to know how they work I recommend Harlan’s video tutorial.
After I’ve found the function that is causing the problem, my next step is usually to stick a call to browser in my code and rerun it. This lets me explore my environment and test out alternative code. The following example is just a wrapper for sum. The nesting gives us a call stack to look at.
first <- function(...) second(...)
second <- function(...) sum(...)
first(2, 4, "6")
The call to traceback tells us where the error is.
2: second(...)
1: first(2, 4, "6")
Let’s suppose that the error wasn’t as obvious as this. (That’s the character input, for those of you asleep at the back.) The traceback output has shown us that the error occurred in the function second. Technically, it occurred in sum, but the contents of that are C code rather than R code and traceback is smart enough to know it is of no use to us. As I described above, the next step is to call browser, just before the problem occurs.
second <- function(...) { browser(); sum(...) }
Now when we rerun the code, execution halts at the browser() line, and we get the chance to dig about into what is going on. In this case, since all the arguments are contained in the ellipsis, we can see everything with list(...). The more common circumstance is to have at least some named arguments in your function call. In those cases ls.str() is the quickest way to see what is going on.
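The same inspection commands can be tried outside the browser. The sketch below uses a made-up function, scaled_sum, with one named argument plus an ellipsis; it exists only to illustrate what list(...) and ls.str() show you, and is not from the original example.

```r
# Hypothetical function with a named argument and an ellipsis
scaled_sum <- function(scale, ...) {
  dots <- list(...)  # everything passed through the ellipsis
  # The class of each ellipsis argument; a stray character value stands out here
  classes <- vapply(dots, function(x) class(x)[1], character(1))
  print(classes)
  # Structure of each named variable in this function's environment
  print(ls.str())
  invisible(classes)
}

classes <- scaled_sum(10, 2, 4, "6")
```

At a browser prompt you would type list(...) and ls.str() directly rather than wrapping them in print.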
The real strength of the browser function is that if the problem wasn’t obvious now, we could go on to execute additional code from within the function environment. Nevertheless, there are some limitations with the approach. The first issue is that we need to be able to get at the source code. There are two main cases where we can’t do this: firstly, when external code is called, and secondly, when the code is tucked away in a package. The external case is sadly quite insoluble. I don’t know of any easy ways to debug C or Fortran code from R. For debugging in packages, if you have an error being thrown, setting options(error = recover) is an excellent alternative to adding browser statements. If there is no error thrown, then tools such as trace are the best solutions, but they are beyond the scope of this post.
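As a rough sketch only, and not a full treatment: options(error = recover) is set once per session, while trace can insert code into a function without editing its source. The example reuses first and second from earlier. Interactively you would pass browser as the tracer; here a counter (n_calls, a name chosen for this illustration) stands in for it so that the code runs without stopping.

```r
first <- function(...) second(...)
second <- function(...) sum(...)

# In an interactive session, these lines would offer a menu of the
# active calls to browse whenever an error is thrown:
# old <- options(error = recover)
# first(2, 4, "6")
# options(old)  # restore the previous error handler

# trace() temporarily inserts code at the start of a function.
n_calls <- 0
trace(second, tracer = quote(n_calls <<- n_calls + 1), print = FALSE)
total <- first(2, 4, 6)  # the tracer runs when second is entered
untrace(second)          # remove the inserted code again
```

Because trace modifies the function in place, remember to call untrace once you have finished, otherwise the inserted code keeps running on every call.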
Another issue is that the problem with our code may not be caused in the same function that the error is thrown in. It could occur higher up the call stack. In that sort of situation, you need a way to see what is happening in all the functions that you’ve called. And that is what I’m going to talk about in part two of this post.