The need for documenting functions

[This article was first published on Stat Bandit » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

My current work usually requires me to work on a project until we can submit a research paper, and then move on to a new project. However, 3-6 months down the road, when the reviews for the paper return, it is quite common to have to do some new analyses or re-analyses of the data. At that time, I have to re-visit my code!

One of the common problems I (and I’m sure many of us) have is that we tend to hack code and functions with the end in mind, just getting the job done. However, when we have to re-visit code when it’s no longer fresh in our memory, it takes significant time to decipher what some code snippet or function is doing, or why we did it in the first place. Then, when our paper gets published and a reader wants our code to try, it’s a bear getting it into any kind of shareable form. Recently I’ve had both issues, and decided, enough was enough!!

R has a fantastic package roxygen2 that makes documenting functions quite easy. The documentation sits just above the function code, so it is there front and center. Taking 2-5 minutes to write even a bare-bones documentation, that includes

  • what the function does
  • what are the inputs (in English) and their required R class
  • what is the output and its R class
  • maybe one example

makes the grief of re-discovering the function and trying to decipher it go away. What does this look like? Here’s a recent example from my files:

#' Find column name corresponding to a particular functional
#'
#' The original data set contains very long column headers. This function
#' does a keyword search over the headers to find those column headers that
#' match a particular keyword, e.g., mean, median, etc.
#' @param x The data we are querying (data.frame)
#' @param v The keyword we are searching for (character)
#' @param ignorecase Should case be ignored (logical)
#' @return A vector of column names matching the keyword
#' @export
findvar <- function(x,v, ignorecase=TRUE) {
  if(!is.character(v)) stop('name must be character')
  if(!is.data.frame(x)) stop('x must be a data.frame')
  v <- grep(v,names(x),value=T, ignore.case=ignorecase)
  if(length(v)==0) v <- NA
  return(v)
}

My code above might not meet best practices, but it achieves two things for me. It reminds me of why I wrote this function, and tells me what I need to run it. This particular snippet is not part of any R package (though I could, with my new directory structure for projects, easily create a project-specific package if I need to). Of course this type of documentation is required if you are indeed writing packages.  

Resources

There are several resources for learning about roxygen2. First and foremost is the chapter Documenting functions from Hadley Wickham’s online book. roxygen2 also has its own tag on StackOverflow.

On the software side, RStudio supports roxygen2; see here. Emacs/ESS also has extensive roxygen2 support. The Rtools package for Sublime Text provides a template for roxygen2 documentation. So getting started in the editor of your choice is not a problem.


To leave a comment for the author, please follow the link and comment on their blog: Stat Bandit » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)