How to become a better R code detective?
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Huge thanks to Hannah Frick for her useful feedback on this post! Vielen Dank!
When trying to fix a bug or add a feature to an R package, how do you go from viewing the code as a big messy ball of wool to a logical diagram that you can bend to your will? In this post, I will share some resources and tips on getting better at debugging and at reading code written by someone else (or by yourself, but long enough ago to feel foreign!).
After step 1, there’s not really an order for applying the techniques, but you definitely want to acquire enough knowledge through research before you tinker; otherwise you will be tinkering quite randomly.
Apart from the idea of adding tests, most tips could apply to non-package code. If your motivation is to contribute to R itself, please refer to A Guide to Contributing to R Core by Saranjeet Kaur and colleagues.
Step 0: Only deal with well-designed code
This is obviously not an actual tip, but more of an encouragement to try and write better and better code ourselves.
The Tidyverse style guide is a good read, both for learning tips and for seeing which aspects of code such a style guide covers. Of particular interest are, e.g., the sections on function comments and file organization.
Regarding code comments, I was intrigued and impressed by the notion of “explaining variables” mentioned in the tweet below, which refers to a great short post.
Great post on why “explaining variables” and “explaining functions or methods” are better than code comments. I found that my enthusiasm for writing/maintaining comments peaked many years ago and I'm better off using other techniques to make the non-obvious (more) obvious. https://t.co/yIIOnd8qHz
— Jenny Bryan (@JennyBryan) July 5, 2021
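To make the idea concrete, here is a small R sketch of the same filter written twice: once as an opaque expression that needs a comment, and once with explaining variables wrapped in an explaining function. The version strings and the is_dev_version() helper are made up for illustration.

```r
versions <- c("1.0.0", "0.9.1", "2.1.0.9000")

# Comment-driven style: the comment has to explain an opaque expression.
# Keep development versions (four components, the last one >= 9000).
dev_opaque <- versions[vapply(
  strsplit(versions, ".", fixed = TRUE),
  function(p) length(p) == 4 && as.integer(p[4]) >= 9000,
  logical(1)
)]

# Explaining-function style: the names carry the explanation.
is_dev_version <- function(version) {
  components <- strsplit(version, ".", fixed = TRUE)[[1]]
  has_fourth_component <- length(components) == 4
  has_fourth_component && as.integer(components[4]) >= 9000
}
dev_explained <- versions[vapply(versions, is_dev_version, logical(1))]

dev_explained
```

Both versions select the same element; the second one can be read without any comment.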
Beyond being well designed, code is easier to work with if it is well tested: tests can help you understand the goals behind functions, and will help you refactor without breaking features.
Now, we don’t always choose what code we get, and even if it was well-designed, we might still need detective skills anyway!
Step 1: Clone and build
In sharing this life-changing tip, I am merely repeating the talk “Reading other people’s code” by Patricia Aas, which I actually listened to as an episode of the All Things Git podcast. The techniques presented by Patricia Aas are not specific to R, but many of them are relevant for R codebases.
Now to this tip… Instead of being overwhelmed by the idea of starting to tinker with a codebase, create a local version-controlled project with the codebase in it! E.g. fork a GitHub repo and use usethis::create_from_github(). Then open it, install the dependencies via remotes::install_deps(dependencies = TRUE), and build or load it. Before amending things, create a new branch via e.g. gert::git_branch_create("tinkering"). I suppose that if I were fancy I’d say this step, preparing a local codebase to play with, is your mise en place.
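Put together, the mise en place could look like the sketch below. The prepare_tinkering() wrapper is hypothetical: it assumes the repository is given as "OWNER/REPO" (so the clone ends up in a folder named after the repository) and it uses pkgload::load_all() for the loading step. The other calls are the ones mentioned above.

```r
# Hypothetical helper: nothing touches the network or your disk until you call it.
prepare_tinkering <- function(repo_spec, destdir = getwd(), branch = "tinkering") {
  # Clone the repository, forking it first if you can't push to it
  # (the default fork = NA); open = FALSE keeps you in the current project.
  usethis::create_from_github(repo_spec, destdir = destdir, open = FALSE)
  # Assumption: repo_spec is "OWNER/REPO", so the clone lives in destdir/REPO
  repo_path <- file.path(destdir, basename(repo_spec))
  # Install everything the package needs, Suggests included
  remotes::install_deps(repo_path, dependencies = TRUE)
  # Tinker on a separate branch so the default branch stays pristine
  gert::git_branch_create(branch, repo = repo_path)
  # Load the package code from source
  pkgload::load_all(repo_path)
  invisible(repo_path)
}

# Usage (needs network access and, for forking, a GitHub token):
# prepare_tinkering("OWNER/REPO")
```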
Obviously, to reach that stage you’ll need to know which codebase to work on. However, you’ll probably start from some code in any case, e.g. your currently buggy code.
Make your problem smaller
In case of a bug, you’ll often be advised to turn it into a minimal reproducible example. While you’ll mostly hear this when you try to communicate your bug to someone else, it is also great practice to do it for yourself! An important tool here is reprex. A reprex is both a concept (reprex stands for reproducible example) and a package for communicating such examples, respectively promoted and maintained by Jenny Bryan. Why use reprex?
- The isolated bug is easier to solve, or will even get solved while you create the reprex! In Jenny Bryan’s talk “Object of type ‘closure’ is not subsettable” there’s an example of original code and its minified version.
- As it is run in an isolated session, you can be more confident that it’s reproducible.
- You can send your bug in a format ideal for experts! But you might also be writing a reprex just for yourself, to accompany some notes in a GitHub issue for instance.
How does reprex work?
- You write some code in an editor (including loading libraries, creating toy data, etc.).
- You copy the code to your clipboard.
- You run reprex::reprex() and reprex runs your code in an isolated session!
- You get the rendered code on the clipboard (and a preview in the RStudio Viewer pane)! Error messages are rendered and images are uploaded to imgur.
- You paste the rendered code somewhere, potentially to show it to someone.
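reprex::reprex() can also take the code directly as an expression instead of reading it from the clipboard. Here is a sketch with made-up toy data. It assumes the reprex package and pandoc are installed (pandoc ships with RStudio); the if () guard simply skips the example otherwise.

```r
if (requireNamespace("reprex", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rendered <- reprex::reprex(
    {
      # Toy data created inside the reprex, so it is fully self-contained
      x <- c(1, 2, NA)
      mean(x)
      mean(x, na.rm = TRUE)
    },
    venue = "gh",         # GitHub-flavoured Markdown output
    html_preview = FALSE  # skip the RStudio Viewer preview
  )
}
```

reprex() returns the rendered lines invisibly, so rendered holds the same text that would land on your clipboard, output lines prefixed with #>.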
To learn more about reprex and adopt it, I’d recommend watching the RStudio webinar about reprex and reading the reprex vignettes, in particular “Reprex do’s and don’ts”.
Also in the case of a bug, maybe you don’t need to read this post further if your problem is in the bingo below. Often, you’ll only notice “obvious” mistakes after making the problem smaller (or after taking a break!).
Debugging can be frustrating for our first time #RStats students… but maybe it can lead to a spot on Debugging Bingo! DeBug-o? DeBingo? Here's a copy if you want to include the common errors you see in your class! https://t.co/mI3g2QfvR2 pic.twitter.com/kyOCO3jjzL
— Dr. Ji Son (@cogscimom) January 27, 2021
If you are amending the features of a package, it’ll be important to clearly define the scope of what you’re after. This makes your work as a code detective easier, and helps for many other reasons too; see Sarah Drasner’s post How to Scope Down PRs and Yihui Xie’s post “Bite-sized” Pull Requests.
Pull an end / Follow the trails
As you are not going to read code from cover to cover, you’ll need to find a logical way to explore the code.
I like the phrase follow the trails by Kara Woo in her excellent RStudio conference talk “Box plots – A case study in debugging and perseverance” as well as the phrase pull an end by Patricia Aas in her also excellent talk “Reading Other People’s Code” already mentioned in this post.
Find where to start
Easy case: there’s a message on screen telling you where an error occurred, or you know what function you want to amend.
Alternatively,
- You can put the error / warning in a search engine.
- If there’s an unclear error, you can try to see the traceback, i.e. which functions were called leading to that error. In her talk “Object of type ‘closure’ is not subsettable”, Jenny Bryan explains very well what a traceback is. In my .Rprofile I have options(error = rlang::entrace, rlang_backtrace_on_error = "branch") thanks to a tweet by Noam Ross reporting a tip by Jenny Bryan: “It gives trimmed tracebacks when using pipes.”
- If there’s no error but a warning, you could try to convert the warning to an error, e.g. via options(warn = 2), so that you get a traceback.
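Here is a sketch of two ways to get from a warning to its origin, using hypothetical parse_counts() and inner_parse() functions. Setting options(warn = 2) turns every warning into an error, so traceback() works as usual; withCallingHandlers() instead lets you capture the call stack at the moment the warning is signalled, without changing any global option.

```r
# Hypothetical functions: the warning comes from deep inside the call stack
parse_counts <- function(x) inner_parse(x)
inner_parse <- function(x) as.integer(x)

# Option 1: turn every warning into an error, then restore the old setting.
old_options <- options(warn = 2)
converted <- tryCatch(
  parse_counts(c("1", "two")),
  error = function(e) conditionMessage(e)
)
options(old_options)
# Interactively, you would skip tryCatch() and call traceback() after the error.

# Option 2: keep global options untouched and record the call stack
# at the moment the warning is signalled.
calls_at_warning <- NULL
counts <- withCallingHandlers(
  parse_counts(c("1", "two")),
  warning = function(w) {
    calls_at_warning <<- sys.calls()
    invokeRestart("muffleWarning")  # silence the warning once recorded
  }
)

converted
```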
Explore from that starting point
- You can use “grepping” as said by Patricia Aas: look for the occurrences of a function or variable name in a local folder, or via GitHub (advanced) search. You can limit the hits to some types of files, e.g. R scripts in R/.
- In your IDE, e.g. RStudio, there might be a way to go directly to the definition of a function. With the lookup package you can also easily look up the source of functions locally and on GitHub.
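If you’d rather grep from within R, a few lines of base R are enough. The grep_code() helper below is a hypothetical sketch (RStudio’s Find in Files or your shell’s grep do the same job): it searches .R files under a folder and returns the file, line number and matching code. It is demonstrated on a throwaway toy package in a temporary directory.

```r
grep_code <- function(pattern, dir = "R", file_pattern = "\\.[Rr]$") {
  files <- list.files(dir, pattern = file_pattern, recursive = TRUE, full.names = TRUE)
  hits <- lapply(files, function(file) {
    lines <- readLines(file, warn = FALSE)
    matched <- grep(pattern, lines)
    if (length(matched) == 0) return(NULL)
    data.frame(file = basename(file), line = matched, code = lines[matched])
  })
  # Drop files without hits, then stack the rest (NULL if nothing matched)
  do.call(rbind, Filter(Negate(is.null), hits))
}

# Usage on a throwaway toy package
toy_r_dir <- file.path(tempdir(), "toypkg", "R")
dir.create(toy_r_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(c("helper <- function(x) x + 1", "main <- function(x) helper(x) * 2"),
           file.path(toy_r_dir, "main.R"))
writeLines("other <- function() 1", file.path(toy_r_dir, "other.R"))

hits <- grep_code("helper", dir = toy_r_dir)
hits
```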
How to read code: space… and time
- Hopefully the code makes sense on its own. If not, fear not: the next item, the other sections of this post, and patience will help.
- Sometimes using git blame or looking at the git history might help you understand the context of some aspects of the code, if there’s no code comment referring to an issue. Do not actually blame people, though. To make your own git history more informative for such later code archaeology, use branches and squash and merge.
squash-and-merge is awesome! It's like apartment cleaning but for your git commits https://t.co/X8OTGW808U
— Garrick Aden-Buie (@grrrck) March 19, 2019
Build your mental model of the code
That’s what Patricia Aas calls a “mental machine”. You might want to draw some sort of diagram by hand (or programmatically). Patricia Aas remarks that such diagrams might even be contributed to the codebase as developer documentation.
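For the programmatic route, one option is to extract “who calls whom” from the code itself. The sketch below uses codetools::findGlobals() (codetools ships with R) on a made-up toy codebase and writes the edges as Graphviz DOT text. The call_edges() helper is hypothetical; for a real package you might pass asNamespace("pkgname") instead of the toy environment. It only sees direct calls by name, so functions passed around as arguments won’t show up.

```r
# Hypothetical toy codebase, standing in for a package namespace
toy <- new.env()
local({
  read_data <- function(path) clean_data(utils::read.csv(path))
  clean_data <- function(df) df[stats::complete.cases(df), ]
  summarise_data <- function(path) summary(read_data(path))
}, envir = toy)

call_edges <- function(env) {
  # Only closures can be inspected by codetools
  fun_names <- Filter(function(n) typeof(get(n, envir = env)) == "closure", ls(env))
  edges <- lapply(fun_names, function(name) {
    called <- codetools::findGlobals(get(name, envir = env), merge = FALSE)$functions
    called <- intersect(called, fun_names)  # keep calls within the codebase only
    if (length(called) == 0) return(NULL)
    data.frame(from = name, to = called)
  })
  do.call(rbind, Filter(Negate(is.null), edges))
}

edges <- call_edges(toy)
# Graphviz DOT text you can paste into any DOT renderer
dot <- paste0("digraph calls {\n",
              paste0("  ", edges$from, " -> ", edges$to, ";", collapse = "\n"),
              "\n}")
cat(dot)
```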
Browse code by others
The life hack below by Julia Silge for fixing Travis CI builds[1], looking at other people’s configuration files, applies to any code endeavour.
LIFE HACK: My go-to strategy for getting Travis builds to work is snooping on *other* people's .travis.yml files. Shoutout today to the tidyr .travis.yml for solving my problem! #rstats
— Julia Silge (@juliasilge) December 12, 2019
How to find good examples?
- The lookup package can help you look up usage of a function on GitHub and in general GitHub Advanced search is really useful;
- You might look at the reverse dependencies of a package you are using;
- You might try to think of packages doing something similar to yours (e.g. another package munging XML data from an API; another package wrapping a C library).
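Base R’s tools::package_dependencies() can list reverse dependencies. To keep the sketch runnable offline, it uses a tiny made-up package database; with a real CRAN mirror you would pass db = available.packages() instead, which needs an internet connection.

```r
# A tiny, made-up package database with the columns that
# tools::package_dependencies() reads for "strong" dependencies
toy_db <- rbind(
  c(Package = "xml2", Depends = NA,     Imports = "cli",        LinkingTo = NA),
  c(Package = "pkgA", Depends = NA,     Imports = "xml2, httr", LinkingTo = NA),
  c(Package = "pkgB", Depends = "xml2", Imports = NA,           LinkingTo = NA),
  c(Package = "pkgC", Depends = NA,     Imports = "jsonlite",   LinkingTo = NA)
)

# Which packages depend on xml2? Their code is a source of usage examples.
revdeps <- tools::package_dependencies("xml2", db = toy_db, reverse = TRUE)
revdeps$xml2
```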
Beyond browsing files, browser()
Reading code and imagining what it does only gets you so far. You can edit the code and see whether, from the outside, it does what you want it to. Sometimes you might also make do with print-debugging, i.e. for instance writing print("coucou !") to check whether a part of the code was run, or print(class(x)) to check an assumption about a thing. Sometimes print-debugging is the only technique you can use, when debugging non-interactively. It can also be perfect for finding where a loop breaks, which motivated the tweet below by Sharla Gelfand:
there's no debugging tool quite like it pic.twitter.com/ZyHMupmkcn
— Sharla Gelfand (@sharlagelfand) April 13, 2021
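A sketch of that loop situation, with made-up inputs: a message() at the top of each iteration tells you which element was being processed when things broke. The tryCatch() wrapper is only there so the script runs to the end. After the error, the loop variable i still holds the failing index.

```r
# Made-up inputs: one element will break the loop
inputs <- list(4, 9, "sixteen", 25)
results <- numeric(length(inputs))

outcome <- tryCatch({
  for (i in seq_along(inputs)) {
    message("Processing element ", i)  # the last message shows where it broke
    results[i] <- sqrt(inputs[[i]])
  }
  "finished"
}, error = function(e) conditionMessage(e))

outcome
i  # still the index of the failing element
```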
But often you will have to go experiment under the hood. To do that efficiently, you will need to learn about browser() and friends! Or just browser() to start with!
The basic idea is that you replace the print() command you were about to write with browser(), run the code and voilà! You enter the debugger and can run code line by line, explore options and environment variables, etc. Over time it’ll become a habit, at least that’s what happened to me once I saw the light.
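Here is what that can look like, with a hypothetical summarise_scores() function. The browser() call is made conditional so it only pauses when something looks off, and wrapped in interactive() so scripts never hang on it. Once in the browser, n runs the next line, c continues and Q quits.

```r
summarise_scores <- function(scores) {
  cleaned <- scores[!is.na(scores)]
  # Conditional breakpoint: pause only when values were dropped, and only
  # in interactive sessions so scripts and CI never hang here.
  if (interactive() && length(cleaned) < length(scores)) browser()
  mean(cleaned)
}

summarise_scores(c(10, NA, 20))

# Alternatives that don't require editing the function:
# debugonce(summarise_scores)   # step through the next call only
# options(error = recover)      # pick a frame to browse after an error
```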
Here are some good resources to learn about debugging tools. These resources also overlap with some of the objectives of this very blog post.
- Debugging advice in the Hands-On Programming with R book by Garrett Grolemund;
- Debugging advice in the materials of the course “What they forgot to teach you about R” by Jenny Bryan and Jim Hester – it even includes tips for R Markdown debugging;
- Jenny Bryan’s talk “Object of type ‘closure’ is not subsettable”.
Beyond R
Sometimes the bug or element to tweak will live outside of R. Maybe in some C code you are wrapping, maybe in a CSS file. You will therefore have to learn debugging tools for these things too!
Read tests? Write some for sure
Patricia Aas’ techniques include writing and running tests to see what the code is supposed to do. They especially mention integration tests, whereas in R packages you’ll mostly find unit tests. Those can also be useful to read, especially when they start breaking after your experiments.
In any case, once you have amended a codebase to fix a bug or add a feature, add tests! In Kara Woo’s talk “Box plots – A case study in debugging and perseverance”, she explains that she added tests. In Jenny Bryan’s talk “Object of type ‘closure’ is not subsettable”, she uses the word “deter” in the part of the talk where she gives such advice: adding tests and assertions, but also other tips such as running those on continuous integration, “using mind bendy stuff in moderation”, leaving access panels (e.g. verbose modes), and writing error messages for humans.
You could even write a failing test at the beginning of your code exploration, and leave it failing for an easier restart when you come back to the codebase (better than a sticky note for sure!).
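A minimal sketch with testthat, using a hypothetical average_score() function: the expectations below are the kind of test you could write before the fix (when the function still used mean(scores) and returned NA), and they pass once the bug is fixed. In a package, usethis::use_test() creates the matching test file for you.

```r
# Hypothetical function whose bug you were chasing: it used to be
# mean(scores), which returns NA as soon as one score is missing.
average_score <- function(scores) {
  mean(scores, na.rm = TRUE)
}

if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("average_score() ignores missing values", {
    # Written before the fix, this expectation failed and marked where to resume
    testthat::expect_equal(average_score(c(10, NA, 20)), 15)
    # Edge case made explicit: all values missing gives NaN rather than an error
    testthat::expect_true(is.nan(average_score(c(NA_real_, NA_real_))))
  })
}
```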
Rubberducking to a person
Another technique you will often see mentioned is rubberducking i.e. explaining your problem to a rubber duck. The simple act of phrasing your issue might help you solve it.
However, you might prefer to speak to an actual person, or to pretend you are, as described in the tweet by Julia Evans below:
yea i don't really like to talk about rubber duck debugging because personally I find it hard to get into the right mindset if I don't have a specific person in mind who I'm talking to, even if I'm just writing them an email / Slack message which I later delete without sending it
— Julia Evans (@b0rk) June 28, 2021
I liked seeing that, as I sometimes open a Slack conversation with someone as if I were about to ask for their help, and whilst preparing my notes for them, I solve my issue.
Refactoring
Another tip by Patricia Aas is refactoring the code, as it might improve your understanding of it. They underline that you should not contribute the results of your refactoring, especially as a first PR, as people might hate you! It’s an exercise for you.
That said, I remember receiving nice PRs from someone who had just read the Clean Code book by Robert C. Martin. They started with a small one, and were very polite. Since then I’ve seen some criticism of the book, but these PRs made perfect sense. I can however easily imagine that a big refactoring PR would not be happily received!
So, in a nutshell, as said by Patricia Aas, refactor to learn, in your own local copy or your own fork.
Asking for help
As nice as solving a problem on one’s own is, asking for help might be the solution! It is also a skill, or rather a bunch of skills: knowing how and where to ask for help, but also deciding when to ask for help, i.e. when it’s no longer worth anyone’s money for you to keep working alone on a problem.
How to ask for help?
In the section of this post about making your problem smaller, we mentioned creating a reprex and having a clear scope. Both belong in your plea for help.
It also helps to have tracked your progress, i.e. the various things you have tried.
Where to ask for help?
Where to ask for help depends on your question and the codebase you are working on. If you are working on a pull request in a package where you already have a good working relationship with other developers, or where you were encouraged to open a PR (directly or via the contributing guide), you might get help from other contributors.
I wrote a blog post on where to get help with your R questions.
See also my post on the R-hub blog, How to get help with R package development? R-package-devel and beyond, for general venues where you can ask for help with package development. My favorite ones are the rOpenSci forum and the package development category of the RStudio community forum.
Reading other people’s debugging journeys, document yours
Sadly but understandably, people will often only take the time to document their debugging journey when the bug is especially tricky or weird. Besides, few people write actual debugging games.
In the meantime, you might enjoy watching or listening to some debugging journeys. You will notice how these programmers make and invalidate hypotheses.
- Kara Woo’s talk “Box plots – A case study in debugging and perseverance”;
- “Debugging and Fixing CRAN’s ‘Additional Checks’ errors” by Rich FitzJohn;
- “Debugging memory errors with valgrind and gdb” by Rich FitzJohn.
If you end up documenting your own code detective story, please tell me, I’d like to read it!
Conclusion
In this post I presented various techniques useful for code detectives. Getting better at them will help you debug and amend codebases. Now, while I was able to summarize these tips, I can’t say I never get stuck.
Where next?
If you have no personal opinion on what to “study” next from the numerous links of this post or elsewhere, my recommendation would be:
- Start with Kara Woo’s talk “Box plots – A case study in debugging and perseverance” as it’s both a real and engaging story (with closure i.e. the bug was fixed! happy end!) and covers many useful debugging tools.
- Continue with the Debugging advice in the Advanced R book by Hadley Wickham, which is very clear and well organized. “Advanced R” might be a frightening title, but really, don’t be afraid.
- If you are an RStudio IDE user, you’ll get great use out of Amanda Gadrow’s webinar about debugging techniques in RStudio and the corresponding official RStudio documentation.
- Then you could watch Jenny Bryan’s talk “Object of type ‘closure’ is not subsettable” as it covers a lot of ground around problem solving in R, with ideas extremely well conveyed (and if you want to binge watch more talks of Jenny Bryan’s, continue with “Code feels, code smells” that’ll help you write better code to start with).
- Lastly, you might be interested in drawing your own lessons from non R specific resources such as the All Things Git podcast episode with Patricia Aas, Julia Evans’ blog posts about debugging and Julia Evans’ comics about debugging.
And then, just wait for the next problem to tackle in your coding practice… One never has to wait very long.
Last words
Last but not least I want to emphasize that there are also human aspects to this process.
we think about debugging as a technical skill (and it absolutely is!!) but a huge amount of it is managing your feelings so you don't get discouraged and being self-aware so you can recognize your incorrect assumptions
— Julia Evans (@b0rk) June 11, 2021
I love waking up and jumping back into solving a bug and immediately solving it with a fresh mind. Sleep is my favorite coding tool.
— Kelly Vaughn (@kvlly) April 23, 2021
What are your favourite tips and resources? Are you too eagerly awaiting Julia Evans’ zine about debugging? Please tell me in the comments below!
[1] Travis CI itself is no longer recommended, for instance by rOpenSci.