The DRY Principle and Knowing When to Make a Package
Don’t Repeat Yourself (DRY)
Probably everyone who has done some kind of programming has heard of the
“Don’t Repeat Yourself” (DRY) principle. In a nutshell, it’s about reducing
code redundancy in order to reduce errors and improve readability.
Undoubtedly the most common manifestation of the DRY principle is the
creation of a function for re-used logic. The “rule of 3” is a good
shorthand for identifying when you might want to rethink how your code
is organized: “You should consider writing a function whenever you’ve
copied and pasted a block of code more than twice (i.e. you now have
three copies of the same code)”, per the R for Data Science book.
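As a minimal sketch of the rule of 3 in practice (the data frame and the rescale01() helper below are made up for illustration and do not come from any of my packages), three pasted copies of the same rescaling logic collapse into a single function:

```r
# Hypothetical data, used only for illustration.
df <- data.frame(a = c(1, 5, 9), b = c(10, 20, 40), c = c(0, 2, 4))

# Before: the same logic pasted three times. Each copy is a chance to
# forget to update a column name.
a_scaled <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
b_scaled <- (df$b - min(df$b)) / (max(df$b) - min(df$b))
c_scaled <- (df$c - min(df$c)) / (max(df$c) - min(df$c))

# After: the logic lives in one place and is applied to every column.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df_scaled <- as.data.frame(lapply(df, rescale01))
```

If the rescaling rule ever changes, only rescale01() needs to be edited.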
The DRY principle applies in other settings as well. For
example, data scientist David Robinson once
remarked (probably only half jokingly) that one should write a blog post
after giving the same piece of advice three times. [1]
DRY and R Packages
I’d like to suggest applying the DRY principle to package creation: if
you use a set of 3 functions at least 3 times, then you should put them
in a package. In fact, even a single function or a single use case can
justify starting a package, provided that it is significant enough.
Well-known R community member Bob Rudis, who is among the most active R
developers, has made creating minimal, useful R packages commonplace. In his
description of his Stack Overflow Driven Development, he suggests that such
a practice is not only useful for abstracting functionality but can also be
a great way to enhance one’s skills while helping others.
If one is fearful of creating a package (perhaps because of the potential
responsibility of maintaining it if others start using it), then I
would suggest creating a “personal” package that you don’t really
intend anyone else to use. In fact, I believe it is fairly
common practice for active R users to have their own personal packages. [2]
For example,
- Bob Rudis’s {hrbrmisc} and {hrbrthemes}
- David Robinson’s {drlib}
- Julia Silge’s {silgelib}
- strengejacke’s (Daniel Lüdecke’s) {sjmisc}, {sjPlot}, {sjstats}, and {sjlabelled}
Following the examples of others, I have created several personal
packages for atomic purposes. [3]
- {tetext} for Tidy Text Mining principles.
- {teplot} for plotting functions.
- {teproj} for project-related functions.
Examples
To give an example of how such a package can be useful, I’ll describe
some recent additions to my {teplot}
package to assist with
geo-spatial visualization of single U.S. states.
I was working on something related to high schools in Texas (look out
for a blog post in the future) and was beginning to copy-paste some
functions that I had used for another project for visualizing
geo-spatial data in the state. The moment I thought about reusing the
code was the moment that I realized that I should put it in a package.
Now, visualizing geo-spatial data in Texas is as easy as follows.
library("ggplot2")
library("teplot")
library("ggmap")
library("dplyr")

# School-level data, including county names and coordinates.
path <- "https://raw.githubusercontent.com/tonyelhabr/uil-v02/master/data/schools-nces-join.csv"
schools_geo <- readr::read_csv(path)

# Base map of Texas without county borders.
viz_map_base <-
  teplot::create_map_state(state = "texas", show_county = FALSE) +
  teplot::theme_te() +
  teplot::theme_map()

# Fill each county by its number of schools (log-scaled gradient).
viz_map_bycnts <-
  viz_map_base +
  geom_polygon(
    data =
      schools_geo %>%
      count(county) %>%
      inner_join(
        teplot::get_map_data_county_tx(),
        by = c("county" = "subregion")
      ),
    aes(fill = n)
  ) +
  scale_fill_gradient(
    trans = "log",
    low = "white",
    high = "red"
  ) +
  theme(legend.position = "none")
viz_map_bycnts
I also added a Stamen map to the package data so that I can use it easily
as a base layer.
viz_map_bycnts_stamen <-
  teplot::ggmap_stamen_tx +
  geom_point(
    data = schools_geo,
    aes(x = lon, y = lat),
    color = "red",
    size = 1
  )
viz_map_bycnts_stamen
To give a separate example, I used my {tetext} package for nearly all
of the code in the flexdashboard that I created for analyzing the
Twitter accounts of NBA teams. There, I simply called the function
tetext::visualize_time_facet() to generate a fairly illustrative visual.
viz_time_facet_all <-
  data_facet_trim %>%
  tetext::visualize_time_facet(
    timebin = timestamp,
    color = id,
    facet = id,
    scale_manual_config = list(values = colors_filt),
    facet_config = list(scales = "fixed"),
    labs_config = list(title = NULL)
  )
viz_time_facet_all
Getting Started
If one has no experience with creating packages and does not know where
to get started, there are plenty of awesome resources out there to learn
more about it. To name a few:
- Hilary Parker’s “famous” blog post
- R Packages book by Hadley Wickham
- Karl Broman’s primer
- Jenny Bryan’s class tutorial
(There’s a good reason why these resources show up at the top of a
Google search for “R packages”.)
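To get a feel for what a package actually consists of before reading any of those, here is a rough sketch that uses only utils::package.skeleton() from base R. The package name temisc and the add_commas() helper are hypothetical, and this is just one low-ceremony starting point, not the workflow that the resources above recommend.

```r
# A hypothetical helper that we want to stop copy-pasting between projects.
add_commas <- function(x) {
  format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
}

# Put the function in its own environment so that only it ends up in the package.
pkg_env <- new.env()
assign("add_commas", add_commas, envir = pkg_env)

# Build the skeleton in a temporary directory so the working directory is untouched.
parent_dir <- tempfile("pkgs")
dir.create(parent_dir)
utils::package.skeleton(
  name = "temisc",          # hypothetical package name
  list = "add_commas",
  environment = pkg_env,
  path = parent_dir
)

# Inspect what was generated (DESCRIPTION, NAMESPACE, R/, man/, ...).
pkg_dir <- file.path(parent_dir, "temisc")
list.files(pkg_dir)
```

The generated DESCRIPTION and man/ files contain placeholders that need to be filled in before the package can be checked or shared.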
Although it may seem daunting at first, one should realize that the
pay-off will be great. (Just think about how much time, effort, and
debugging you save when writing a function. Now scale that feeling by
the number of functions that you include in your package!) I created my
first package to assist with using my company’s color scheme in plots.
Up until that point, I had been needlessly copy-pasting the same hex
values into each separate project where I wanted to use the color
palette. (If this happens to be your use case, then check out Dr. Simon
J’s blog post on exactly this topic!)
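A sketch of the kind of function that can replace those pasted hex values is shown here. The object names, color names, and hex codes are invented for illustration; they are not my company’s palette, nor the code from the blog post mentioned above.

```r
# Hypothetical brand colors; substitute your organization's real hex values.
company_colors <- c(
  blue   = "#1F77B4",
  orange = "#FF7F0E",
  green  = "#2CA02C",
  grey   = "#7F7F7F"
)

# Return hex codes by name, or the whole palette when no names are given.
company_cols <- function(...) {
  cols <- c(...)
  if (is.null(cols)) {
    return(company_colors)
  }
  unknown <- setdiff(cols, names(company_colors))
  if (length(unknown) > 0) {
    stop("Unknown color(s): ", paste(unknown, collapse = ", "))
  }
  company_colors[cols]
}

# Look up a subset of the palette by name.
company_cols("blue", "grey")
```

Once a function like this lives in a package, the result can be passed anywhere a vector of colors is accepted, for example as the values argument of ggplot2::scale_color_manual().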
Conclusion
Even if you don’t program much (or at all), the DRY principle will
undoubtedly be applicable to you at some point in time. If you’re
working with R, then I suggest using packages as a solution to your
DRY problems.
[1] Hadley Wickham suggested that a book might be even better.
[2] Although several of these have actually been published on CRAN (suggesting that they are really more than just personal packages), each started out as just a package for the individual.
[3] It seems common to include one’s initials in the name of a personal package, so I have copied that format.