Writing a Personal R Package

[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If you’ve been using R for a while, you’ve likely accumulated a hodgepodge of useful code along the way. Said hodgepodge might include functions you source into multiple projects; bits and bobs that you copy and paste where needed; or code that solved a particularly esoteric problem and will never be applicable elsewhere, but you still enjoy revisiting sometimes. We all do it.

If you’re anything like me, your personal library of code has grown gradually and haphazardly. It’s spread across multiple locations, wherever was easiest at the time: a directory on a laptop; a pen drive; line 243 of the script in which it was originally conceived. It’s entirely untested, poorly documented and takes a while to decipher every time you return to it. You’re proud of some of it, and ashamed of the mess that is the rest. If you’re nodding along so far, this blogpost is for you.

A better way

Writing an R package can seem daunting at first. If you’ve browsed the source code of a popular CRAN package, then you can be forgiven for feeling overwhelmed. But a package doesn’t need to be the next data.table or have the audience of dplyr to be worthwhile. And thanks to the wonderful usethis package, creating a personal R package could scarcely be easier.

Plenty of prominent statisticians and data scientists have personal R packages in which they store their miscellaneous functions: Karl Broman, Yihui Xie and David Robinson, to name a few. But nobodies can do the same: I now have one. The primary audience for a personal R package is yourself, and it doesn’t matter if no one else uses or cares about it.

Why bother?

I first created my personal R package because I had the following function saved in a directory called “useful-code”:

## See discussion below for an improved version
days_of_week = function(abbr = FALSE) {
  days = weekdays(as.Date(seq(7), origin = "1950-01-01"))
  if (isFALSE(abbr)) {
    return(days)
  } else {
    return(substr(days, 1, 3))
  }
}

At the time I was working on a project that required aggregating data by day of the week. I was tired of typing out “Monday, Tuesday, Wednesday, …” by hand every time, so I “borrowed” the best code I could find on the topic from Stack Overflow and turned it into a function that did nothing other than return a vector of the days of the week in the formats I required:

days_of_week()
#> [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday"  "Sunday"
days_of_week(abbr = TRUE)
#> [1] "Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "Sun"

Every time I wanted to use days_of_week() in a new project (it was more often than you might think), I copied it from my “useful-code” directory, pasted it into a “functions” sub-directory within my project, and used source() to load it into the relevant script(s). It was laborious and unsustainable; I periodically made (and make) changes to the function but, inevitably, I could never remember all the places I’d used it. I ended up using different versions of the same function across multiple projects, which is risky territory to be in even for a function as basic as this one. After reading Hilary Parker’s blogpost on personal R packages, I decided to give it a try.

How to do it

The aforementioned usethis package takes care of the setup grunt work:

path = file.path(tempdir(), "jafun")
usethis::create_package(path)
#> ✓ Creating '/data/ncsg3/R_tmp/Rtmp8c0BJt/jafun/'
#> ✓ Setting active project to '/data/ncsg3/R_tmp/Rtmp8c0BJt/jafun'
#> ✓ Creating 'R/'
#> ✓ Writing 'DESCRIPTION'
#> Package: jafun
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.0.0.9000
#> Authors@R (parsed):
#>     * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
#> Description: What the package does (one paragraph).
#> License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
#>     license
#> Encoding: UTF-8
#> LazyData: true
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.1.1
#> ✓ Writing 'NAMESPACE'
#> ✓ Setting active project to '<no active project>'

Move your function(s) into the “R” directory, add some documentation, build and install the package (Ctrl + Shift + B in RStudio), and voilà:

jafun::days_of_week()
jafun::days_of_week(abbr = TRUE)
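For the “add some documentation” step, a package created by usethis is set up for roxygen2 (hence the Roxygen field in the DESCRIPTION output above): documentation lives in #' comments directly above each function. As a minimal sketch, a file such as R/days_of_week.R might look like this; the file name and the documentation text are illustrative rather than copied from the jafun package.

```r
# R/days_of_week.R (hypothetical file name inside the package's R/ directory)
# Lines starting with #' are roxygen2 comments; roxygen2 turns them into a
# help page (man/days_of_week.Rd) and an export() entry in NAMESPACE.

#' Days of the week
#'
#' Return the names of the days of the week, Monday first, in the
#' current locale.
#'
#' @param abbr Logical. If TRUE, return three-letter abbreviations.
#' @return A character vector of length 7.
#' @export
#' @examples
#' days_of_week()
#' days_of_week(abbr = TRUE)
days_of_week = function(abbr = FALSE) {
  # 1950-01-01 was a Sunday, so days 1 to 7 after it run Monday to Sunday
  days = weekdays(as.Date(seq(7), origin = "1950-01-01"))
  if (isFALSE(abbr)) {
    return(days)
  } else {
    return(substr(days, 1, 3))
  }
}
```

Running devtools::document() from the package directory then generates the help page in man/ and adds export(days_of_week) to NAMESPACE; without the @export tag, days_of_week() would not be reachable as jafun::days_of_week().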

Next steps

Once you’ve developed the basis of your personal R package, you can spruce it up with as many or as few additional features as you’d like. Adding your name and some basic details about the package to the DESCRIPTION file is helpful. Unit testing, releases and automated checking are present in most packages. Putting your package on GitHub allows you (and others) to install your package remotely using remotes::install_github() without requiring a local copy of the source code. None of that is compulsory though. Your package only has to be as detailed as you want it to be.
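Of those extras, unit tests are arguably the quickest win. The conventional tooling is the testthat package: usethis::use_testthat() creates the tests/testthat/ directory and usethis::use_test("days_of_week") opens a matching test file. A minimal sketch of what that file might contain is below; the expectations are illustrative, and the function is defined inline (with the tests guarded by requireNamespace()) only so that the snippet runs on its own.

```r
# tests/testthat/test-days_of_week.R (hypothetical file, as created by
# usethis::use_test("days_of_week")).
# The function is defined inline only so this snippet is self-contained;
# in the package it would live in R/days_of_week.R.
days_of_week = function(abbr = FALSE) {
  days = weekdays(as.Date(seq(7), origin = "1950-01-01"))
  if (isFALSE(abbr)) {
    return(days)
  } else {
    return(substr(days, 1, 3))
  }
}

# The requireNamespace() guard is only for running this outside a package;
# a real test file would call test_that() directly.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("days_of_week() returns seven distinct day names", {
    days = days_of_week()
    testthat::expect_type(days, "character")
    testthat::expect_length(days, 7)
    testthat::expect_length(unique(days), 7)
  })

  testthat::test_that("abbreviated names are at most three characters", {
    abbr = days_of_week(abbr = TRUE)
    testthat::expect_length(abbr, 7)
    testthat::expect_true(all(nchar(abbr) <= 3))
  })
}
```

In a real package you would drop the inline definition and the guard, call test_that() and the expect_*() functions unqualified, and run the suite with devtools::test().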

Your package will, inevitably, evolve over time. You’ll add new functions and improve upon existing ones. By way of example, a more efficient implementation of the aforementioned days_of_week() would be:

days_of_week = function(abbreviate = FALSE) {
  dates = as.Date(1:7, origin = "1950-01-01")
  weekdays(dates, abbreviate = abbreviate)
}
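Before swapping one implementation for another, it can be reassuring to check that the two agree. A minimal sketch, keeping the original under the hypothetical name days_of_week_v1 so both versions can be loaded at once:

```r
# Original version, renamed (hypothetically) so both can coexist
days_of_week_v1 = function(abbr = FALSE) {
  days = weekdays(as.Date(seq(7), origin = "1950-01-01"))
  if (isFALSE(abbr)) {
    return(days)
  } else {
    return(substr(days, 1, 3))
  }
}

# Revised version: weekdays() does the abbreviating itself
days_of_week = function(abbreviate = FALSE) {
  dates = as.Date(1:7, origin = "1950-01-01")
  weekdays(dates, abbreviate = abbreviate)
}

# Full day names should be identical in any locale
stopifnot(identical(days_of_week_v1(), days_of_week()))

# Abbreviations are locale-dependent: the old version keeps the first three
# characters, the new one uses the locale's own abbreviation (the %a format),
# so compare them side by side rather than assuming they match
data.frame(
  old = days_of_week_v1(abbr = TRUE),
  new = days_of_week(abbreviate = TRUE)
)
```

The full names match exactly because both versions call weekdays() on the same seven dates (Monday 2 January to Sunday 8 January 1950). The argument has also been renamed from abbr to abbreviate; because R partially matches argument names, an old call such as days_of_week(abbr = TRUE) still reaches the new argument, though updating such calls is safer than relying on partial matching.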

This removes the need for an if statement, since weekdays() can abbreviate the names itself, and it would fix a bug in the existing version. There are undoubtedly plenty of implementations more efficient than my original one. But I wrote days_of_week() for myself, with the skills I had at the time, and it did what I needed it to do. Any package maintainer would stress the importance of refining your code over time, but code doesn’t need to be at its best from the outset to be worth putting in a package.

The longer you use R, the more miscellaneous code you’ll amass. And the more you amass, the harder it’ll be to keep track of. Creating a personal R package provides a sustainable and pain-free method of storing, growing and re-using your unique library of code. It might even provide a safe incubator to learn the ropes of package development before making open source contributions elsewhere. But at the very least, it’ll stop you from dipping into that “useful-code” directory every time you want a vector of the days of the week.

Notes and thanks

Most of the links in this blogpost are from the R Packages book by Hadley Wickham and Jenny Bryan.
