R Tip: Use let() to Re-Map Names

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

Another R tip. Need to replace a name in some R code or make R code re-usable? Use wrapr::let().



Here is an example involving dplyr.

Let’s look at some example data:

library("dplyr")
library("wrapr")

starwars %>%
  select(., name, homeworld, species) %>%
  head(.)

# # A tibble: 6 x 3
#   name           homeworld species
#   <chr>          <chr>     <chr>
# 1 Luke Skywalker Tatooine  Human
# 2 C-3PO          Tatooine  Droid
# 3 R2-D2          Naboo     Droid
# 4 Darth Vader    Tatooine  Human
# 5 Leia Organa    Alderaan  Human
# 6 Owen Lars      Tatooine  Human

For the “%>%/.” notation, please see R Tip: Make Arguments Explicit in magrittr/dplyr Pipelines. Also, though we will not use it here, we feel that separating argument types (data versus columns) in select() is much more comprehensible, and that qc() notation makes this easy, as in “select(., qc(name, homeworld, species))”.
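As a small aside on the qc() notation mentioned above, here is a minimal sketch (assuming the wrapr package is installed) of what qc() produces. The variable name cols is ours, chosen only for illustration.

```r
library("wrapr")

# qc() is "quoting c()": it turns bare names into a character vector,
# so column names can be handled as ordinary values.
cols <- qc(name, homeworld, species)
print(cols)
```

The resulting character vector can then be passed to any interface that takes column names as strings.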

Now let’s change the name of one column. The challenge will be: the name of the old column and the new name will not be known at the time of writing the code (a common problem when writing re-usable functions or code).

Suppose the remapping is specified in variables, as below.

newname <- "genus"
oldname <- "species"

We can prepare to work with column names as values using wrapr::let(), as we show here.

let(
  alias = c(NEWNAME = newname, 
            OLDNAME = oldname),
  starwars %>%
    rename(., NEWNAME = OLDNAME) %>%
    select(., name, homeworld, NEWNAME) %>%
    head(.)
)

#   name           homeworld genus
#   <chr>          <chr>     <chr>
# 1 Luke Skywalker Tatooine  Human
# 2 C-3PO          Tatooine  Droid
# 3 R2-D2          Naboo     Droid
# 4 Darth Vader    Tatooine  Human
# 5 Leia Organa    Alderaan  Human
# 6 Owen Lars      Tatooine  Human
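
Since the motivation is re-usable code, the same pattern can be wrapped in a function. The following is a minimal sketch, assuming dplyr and wrapr are installed. The helper name rename_column() and its arguments are hypothetical and are not part of either package.

```r
library("dplyr")
library("wrapr")

# Hypothetical helper (not part of wrapr or dplyr): rename one column,
# with the old and new names supplied as strings at call time.
rename_column <- function(d, oldname, newname) {
  let(
    alias = c(NEWNAME = newname,
              OLDNAME = oldname),
    d %>%
      rename(., NEWNAME = OLDNAME)
  )
}

renamed <- rename_column(starwars, oldname = "species", newname = "genus")
print(colnames(renamed))
```

The body of the function is written once in terms of the stand-in names NEWNAME and OLDNAME, and the caller decides which concrete columns they refer to.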

The merit of the above notation is that the exact names "species" and "genus" can come from variables, and do not need to be known to the programmer writing the let()-block. There are other methods to attempt such substitution (these were publicly pre-announced only after let() had already been publicly announced and distributed on CRAN, so let() is in fact known prior art despite apparently not being cited). In our experience (and opinion) wrapr::let() is by far the most legible, teachable, and reliable code-rewriting (or meta-programming) tool for this task in R. It is a good choice for part-time R users, and we are working on formal documentation for expert users.
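
To make the "code-rewriting" description concrete, here is a minimal sketch that asks let() to return the re-written code rather than run it. It assumes wrapr is installed and relies on let()'s eval argument. The data frame name d is a placeholder and is never evaluated here.

```r
library("wrapr")

newname <- "genus"
oldname <- "species"

# With eval = FALSE, let() performs the name substitution and returns
# the resulting code instead of executing it.
rewritten <- let(
  alias = c(NEWNAME = newname,
            OLDNAME = oldname),
  rename(d, NEWNAME = OLDNAME),
  eval = FALSE
)
print(rewritten)
```

Inspecting the re-written code this way can help when debugging a let()-block, since it shows the code that would actually be executed.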

An alternative is to use the seplyr package, which wraps dplyr operators in more standard value-oriented notation. The above example in seplyr is as follows.

library("seplyr")

starwars %>%
  rename_se(., newname := oldname) %>%
  select_se(., c("name", "homeworld", newname)) %>%
  head(.)
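
The “newname := oldname” argument in the seplyr example is an ordinary value. As a minimal sketch (assuming wrapr is installed, which supplies the := named-map builder used here), it can be built and inspected on its own. The variable name mapping is ours.

```r
library("wrapr")

newname <- "genus"
oldname <- "species"

# := builds a named vector: the left side supplies the name,
# the right side supplies the value.
mapping <- (newname := oldname)
print(mapping)
```

Because the mapping is plain data, it can be computed, stored, or passed between functions before being handed to rename_se().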

Let’s finish with an example from the dplyr 0.7.0 announcement. The following is code from that announcement:

my_var <- "homeworld"

starwars %>%
  group_by(.data[[my_var]]) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
  
# # A tibble: 49 x 3
#    my_var         height  mass
#    <chr>           <dbl> <dbl>
#  1 Alderaan        176.   64.0
#  2 Aleen Minor      79.0  15.0
#...

Notice the grouping column is incorrectly named “my_var” (this has also been noticed in other places). This is not harmless, as code attempting to refer to the original name will fail. The above is possibly not the currently preferred rlang notation, which has been iterating through “!!” and “UQ()” (though I think UQ() is already “soft deprecated”). My theory is that the correct form may be the even more cumbersome “.data[[!!my_var]]”, even though this is not commonly taught. However, even if the original code is indeed “malformed rlang/dplyr” (that is, outside the intended variations of the grammar), notice that the problem was not caught or signaled. And at least at some point recently the shorter notation was being taught by the package authors. So it is hard to consider the rlang notation and its teaching quite settled.

The equivalent let() notation is easy and works correctly.

let(
  c(MY_VAR = my_var),
  starwars %>%
    group_by(MY_VAR) %>%
    summarise_at(vars(height:mass), mean, na.rm = TRUE)
)
# # A tibble: 49 x 3
#    homeworld      height  mass
#    <chr>           <dbl> <dbl>
#  1 Alderaan        176.   64.0
#  2 Aleen Minor      79.0  15.0
#...
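
As a check on the point that downstream code needs the original column name, the following minimal sketch (assuming dplyr and wrapr are installed) stores the let() result and then refers to the grouping column through my_var. The names res and res_tatooine are ours, and summarise() on a single column is used here only to keep the example short.

```r
library("dplyr")
library("wrapr")

my_var <- "homeworld"

res <- let(
  c(MY_VAR = my_var),
  starwars %>%
    group_by(MY_VAR) %>%
    summarise(mean_height = mean(height, na.rm = TRUE))
)

# Later code can look the grouping column up by its real name.
res_tatooine <- res[res[[my_var]] %in% "Tatooine", ]
print(res_tatooine)
```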

The seplyr equivalent is the following:

starwars %>%
  group_by_se(., my_var) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)

If you absolutely must have “data pronouns” (such as the “.data” notation), those are actually fairly easy to add to classic base-R pipe-enhanced functions. However, we feel most R users avoid the need for such pronouns through proper use of common R structured environment nesting conventions (just as many programmers do not feel the need for a “goto” statement when they stick to structured coding conventions).
