Oh (de)bugger!
By number of questions asked, R passed MATLAB for the first time on Stack Overflow today. Thus it seems an appropriate time to write my first R-based post.
This post concerns what to do when your R code goes pear-shaped. Back in June there were a couple of very good videos on R debugging that came out of an R meetup in New York. Jay Emerson talked about basic debugging functions like print and browser, and Harlan Harris talked about more advanced techniques like trace and debug. These R meetups sound like a great idea, but I suspect that we don't have the critical mass of R users here in Buxton, UK. I digress …
There are two obvious cases where you need to debug things. If you are lucky, you have the case of an error being thrown. It sounds like it should be the worst case, but at least you get to know where the problem is occurring; the more difficult situation is simply getting the wrong answer.
When I get an error, calling traceback() is my usual instinctive response to find the location of the error. If the ggplot2 package is loaded, then tr() provides a way to save a few keystrokes. traceback() isn't infallible though, and seems to have particular trouble with code in try blocks: an error caught by try never reaches the top level, so traceback() carries on reporting the last uncaught error instead. To see this, compare
throw_error <- function() stop("!!!")
throw_error()
traceback()
with
throw_error_in_try <- function() try(stop("!!!"))
throw_error_in_try()
traceback() #Same as before; new error did not register
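If you can wrap the failing call yourself, there are a couple of ways to get at an error that try has swallowed. The following is a minimal sketch rather than a recommended recipe: it first inspects the "try-error" object that try returns, and then uses a calling handler to record the call stack at the moment the error is signalled. The names result, captured_calls and stack_text are my own, made up for illustration.

```r
throw_error <- function() stop("!!!")

# try() hands back an object of class "try-error" rather than failing
result <- try(throw_error(), silent = TRUE)
inherits(result, "try-error")                # TRUE
conditionMessage(attr(result, "condition"))  # "!!!"

# A calling handler runs before the stack is unwound, so sys.calls()
# still sees the frames that led to the error
captured_calls <- NULL
result <- try(
  withCallingHandlers(
    throw_error(),
    error = function(e) captured_calls <<- as.list(sys.calls())
  ),
  silent = TRUE
)

# One line of text per call; the first line of each deparsed call is enough here
stack_text <- vapply(captured_calls, function(cl) deparse(cl)[1], character(1))
print(stack_text)
```

The printed stack also contains the wrapper calls (try, tryCatch, withCallingHandlers and the handler itself), so expect to skim past those to find throw_error().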
In the ‘hard’ case, where you have a wrong answer rather than an error, or where traceback has let you down, you’ll have to step through your code to hunt down the problem. This entails using the debug function, or its graphical equivalent mtrace (from the debug package). I don’t want to spend time on those functions here (another post perhaps), so if you’re desperate to know how they work I recommend Harlan’s video tutorial.
After I’ve found the function that is causing the problem, my next step is usually to stick a call to browser in my code and rerun it.  This lets me explore my environment and test out alternative code.  The following example is just a wrapper for sum.  The nesting gives us a call stack to look at.
first <- function(...) second(...)
second <- function(...) sum(...)
first(2, 4, "6")
The call to traceback tells us where the error is.
2: second(...)
1: first(2, 4, "6")
Let’s suppose that the error wasn’t as obvious as this.  (That’s the character input, for those of you asleep at the back.)  The traceback output has shown us that the error occurred in the function second.   Technically, it occurred in sum, but the contents of that are C code rather than R code and traceback is smart enough to know it is of no use to us.  As I described above, the next step is to call browser, just before the problem occurs.
second <- function(...)
{
   browser()
   sum(...)
}
Now when we rerun the code, execution halts at the browser() line, and we get the chance to dig about into what is going on. In this case, since all the arguments are contained in the ellipsis, we can see everything with list(...).  The more common circumstance is to have at least some named arguments in your function call.  In those cases ls.str() is the quickest way to see what is going on.  list(...) returns
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] "6"
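For the named-argument case, here is a small sketch of what ls.str() shows. The function scale_values and its arguments are hypothetical, invented purely for illustration. At a browser prompt you would simply type ls.str(); here it is called from inside the function body (with the environment passed explicitly) so that the example also runs non-interactively.

```r
# Hypothetical function with named arguments, for illustration only
scale_values <- function(x, multiplier = 2, label = "demo") {
  # Summarise every variable in this function's environment
  ls.str(envir = environment())
}

# Printing the result gives one line per variable: its name, type and value
overview <- capture.output(print(scale_values(c(1, 2, 3))))
print(overview)
```

Each line pairs a variable name with a compact str-style summary, which is usually enough to spot an argument of the wrong type.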
The real strength of the browser function is that if the problem wasn’t obvious now, we could go on to execute additional code from within the function environment.  Nevertheless, there are some limitations with the approach.  The first issue is that we need to be able to get at the source code.  There are two main cases where we can’t do this: firstly, when external code is called and secondly, when the code is tucked away in a package.  The external case is sadly quite insoluble.  I don’t know of any easy ways to debug C or Fortran code from R.  For debugging in packages, if you have an error being thrown, setting options(error = recover) is an excellent alternative to adding browser statements.  If there is no error thrown, then debug and trace are the best solutions, but they are beyond the scope of this post.
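As a rough sketch of the options(error = recover) approach, using the first and second wrappers from earlier: once the option is set, an uncaught error in an interactive session offers a menu of the active frames to browse. The failing call is guarded with interactive() here only so that the snippet can also be run as a script without halting; old_options is just a name I have chosen for the saved settings.

```r
first <- function(...) second(...)
second <- function(...) sum(...)

# Swap the error handler, remembering the previous setting
old_options <- options(error = recover)

if (interactive()) {
  # The error raised inside sum() now offers a frame menu;
  # choose a frame number to browse it, or 0 to exit
  first(2, 4, "6")
}

# Put the previous error handler back
# (if the call above was run from a script, do this by hand afterwards)
options(old_options)
```

Calling options(error = NULL) at the prompt also restores the default behaviour.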
Another issue is that the problem with our code may not be caused in the same function that the error is thrown in. It could occur higher up the call stack. In that sort of situation, you need a way to see what is happening in all the functions that you’ve called. And that is what I’m going to talk about in part two of this post.
Tagged: debugging, r