[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers].

I have written about referential transparency before. In this article I would like to discuss “leaky abstractions” and why wrapr::let() supplies a useful (but leaky) abstraction for R programmers.

## Abstractions

A common definition of an abstraction is (from the OSX dictionary):

the process of considering something independently of its associations, attributes, or concrete accompaniments.

In computer science this is commonly taken to mean “what something can be thought to do independent of caveats and implementation details.”

### The magrittr abstraction

In R one traditionally thinks of the magrittr "%>%" pipe abstractly in the following way:

 Once "library(magrittr)" is loaded we can treat the expression:

7 %>% sqrt()

as if the programmer had written:

sqrt(7)


That is the abstraction of magrittr into terms one can reason about and plan over. You think of x %>% f() as a synonym for f(x). This is an abstraction because magrittr is not in fact implemented as a macro source-code re-write, but in terms of function argument capture and delayed evaluation. And as Joel Spolsky famously wrote:

All non-trivial abstractions, to some degree, are leaky.

The magrittr pipe is non-trivial (in the sense of doing interesting work) because it works as if it were a syntax replacement even though you can use it in more places than you could ask for such a syntax replacement. The upside is: magrittr makes two statements behave nearly equivalently. The downside is: we expect this to fail in some corner cases. This is not a criticism; it is as Bjarne Stroustrup wrote:

There are only two kinds of languages: the ones people complain about and the ones nobody uses.
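To make the difference between a source-code re-write and argument capture concrete, here is a minimal base R sketch of a toy pipe. The names `%then%`, `.lhs`, and `show_arg` are invented for this illustration, and the code is not how magrittr is actually implemented; it only shows how a pipe built on captured arguments can agree with f(x) for ordinary functions and still leak for a function that inspects its own argument expression.

```r
# Toy pipe for illustration only; `%then%` and `.lhs` are made-up names and
# this is not how magrittr is implemented.
`%then%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)  # capture the right-hand side without evaluating it
  if (!is.call(rhs_call)) stop("right-hand side must be a call such as f()")
  # Build f(.lhs, <original arguments>) from f(<original arguments>).
  new_call <- as.call(c(list(rhs_call[[1]], quote(.lhs)), as.list(rhs_call)[-1]))
  # Evaluate with .lhs bound to the left-hand value; other names resolve in the caller.
  eval(new_call, list(.lhs = lhs), parent.frame())
}

# For an ordinary function the abstraction holds.
piped  <- 7 %then% sqrt()
direct <- sqrt(7)
isTRUE(all.equal(piped, direct))

# A function that looks at its argument expression can tell the two forms apart.
show_arg <- function(x) deparse(substitute(x))
leak_direct <- show_arg(7)           # sees the literal 7
leak_piped  <- 7 %then% show_arg()   # sees the placeholder symbol instead
```

The two `leak_*` values differ, which is the kind of corner case the "x %>% f() is f(x)" reading does not predict.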

### The tidyeval/rlang abstraction

The package dplyr 0.6.* brings in a new package called rlang to supply a capability called tidyeval. Among the abstractions it supplies are operators for quoting and un-quoting variable names. This allows code like the following, where a dplyr::select() takes a variable name from a user-supplied variable (instead of the usual explicit take from the text of the dplyr::select() statement).

library("dplyr")
packageVersion("dplyr")
# [1] ‘0.5.0.9004’
varName = quo(disp)
mtcars %>% select(!!varName) %>% head()
#                   disp
# Mazda RX4          160
# Mazda RX4 Wag      160
# Datsun 710         108
# Hornet 4 Drive     258
# Hornet Sportabout  360
# Valiant            225


Notice in the above example we had to specify the abstract varName by calling quo() on a free variable name (disp) and did not take the value from a string. tidyeval is working hard to supply a parametrizable non-standard interface, and it doesn’t look like a standard interface is the central goal. That is: the following is not intended to work:

varName <- quo(colnames(mtcars)[[1]])
mtcars %>% select(!!varName) %>% head()
# Error: colnames(mtcars)[[1]]: must resolve to integer column positions, not string


This is unfortunate as the main reason you want to parameterize over variable names is that the names are coming from somewhere else, and likely supplied as strings not as quosures (which themselves carry details of environment, meaning they are more like bound variables than free variables). I am sure you can convert a string into a column reference in rlang/tidyeval but it doesn’t seem to be the central use case (or is at least not held out as such in the help and examples).
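For reference, one such conversion is rlang::sym(), which turns a string into a symbol that can then be unquoted. The sketch below assumes a released dplyr (0.7.0 or later) with rlang installed; it was not checked against the 0.5.0.9004 development version shown above, where the behavior may differ.

```r
library("dplyr")   # assumption: a released dplyr (0.7.0 or later) with rlang installed

# The string "mpg" (first column name of mtcars) becomes the symbol mpg.
varName <- rlang::sym(colnames(mtcars)[[1]])
res <- mtcars %>% select(!!varName) %>% head()
colnames(res)
```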

### The wrapr::let() abstraction

Our wrapr package can abstract the recent example (working over strings instead of “quosure” classes) as follows.

The (leaky) abstraction is:

“varName <- 'var'; wrapr::let(c(VAR=varName), expr(VAR))” is treated as if the user had written expr(var).

This can also be thought of as a form of unquoting, as you do see one set of quotes disappear.

Let’s try it:

library("wrapr")
x <- 5
varName <- 'x'
let(c(VAR=varName), VAR)
# [1] 5


Or moving back to our dplyr::select() example:

varName <- 'disp'
let(
c(VARNAME = varName),
mtcars %>% select(VARNAME) %>% head()
)
#                    disp
# Mazda RX4          160
# Mazda RX4 Wag      160
# Datsun 710         108
# Hornet 4 Drive     258
# Hornet Sportabout  360
# Valiant            225


And wrapr::let() can also conveniently handle the “varName <- colnames(mtcars)[[1]]” case.
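A short sketch of that case, reusing the pattern from the previous example (it assumes the wrapr and dplyr packages are installed; `res` is just a name chosen here to hold the result):

```r
library("dplyr")
library("wrapr")

varName <- colnames(mtcars)[[1]]   # a plain string, here "mpg"
res <- let(
  c(VARNAME = varName),
  mtcars %>% select(VARNAME) %>% head()
)
colnames(res)
```

The column name arrives as an ordinary character value computed at run time, which is the situation the string-based interface is meant for.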

## An issue

dplyr issue 2726 (reproduced below) discusses a very important and interesting issue.

At a cursory glance the two discussed expressions and the work-around may seem alien, artificial, or even silly:

1. (function(x) select(mtcars, !!enquo(x)))(disp)
2. (function(x) mtcars %>% select(!!enquo(x)))(disp)
3. (function(x) { x <- enquo(x); mtcars %>% select(!!x)})(disp) 

However, this is actually a very crisp and incisive example. In fact, if rlang/tidyeval were a system up for public revision (such as an RFC or some such proposal) you would expect the equivalence of the above to be part of an acceptance suite.

The first expression looks very much like rlang/tidyeval package examples and is the “right way” in rlang/tidyeval to send in a column name parametrically. It is in the style preferred by the new package, so by the package standards it cannot be considered complicated, perverse, or verbose. The second expression differs from the first only by the application of the “magrittr invariant” of “x %>% f() is to be considered equivalent to f(x)”.

The outcome is that the first expression currently executes as expected, and the second expression errors out. This can be considered surprising as this is not something anticipated in the documentation or recipes for building up tidy expressions. This is a leak in the combined abstractions, something we are told to back away from as it doesn’t work.

The proposed work-around (expression 3) is helpful, but itself demonstrates another leak in the mutual abstractions. Think of it this way: suppose we had started with expression 3 as working code. We would by referential transparency expect to be able to refactor the code and replace x with its value and move from this third working example to the second expression (which happens to fail).

To summarize: expressions 1 and 3 are equivalent. They differ by two refactoring steps (introduction/removal of pipes, and introduction/removal of a temporary variable). But we cannot demonstrate the equivalence by interpolating in 2 named transformations (going from 1 to 2 to 3, or from 3 to 2 to 1) as the intermediate expression 2 is apparently not valid.

The wrapr::let version of the issue author’s desired expression 2 is:

  (function(x) let(c(X = x), mtcars %>% select(X)))('disp')


## Conclusion

wrapr::let() is a useful abstraction:

• It directly takes strings as variable names (the most common source of parametric variable names).
• It is a macro-like replacement and easy to teach as a code re-writing abstraction.
• It has a small interaction surface, and plays well with delayed evaluation packages such as magrittr and dplyr 0.5.0.
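The code re-writing view in the second bullet can be sketched in a few lines of base R. The function `toy_let` below is a hypothetical stand-in written for this illustration; it is not wrapr's implementation and skips the checks a real package would need. It only shows the idea: replace the placeholder symbols in the captured expression, then evaluate the rewritten expression where the caller wrote it.

```r
# Hypothetical stand-in for illustration; not wrapr's implementation.
toy_let <- function(alias, expr) {
  expr <- substitute(expr)            # capture the expression unevaluated
  subs <- lapply(alias, as.name)      # "x" becomes the symbol x, keyed by placeholder name
  rewritten <- do.call(substitute, list(expr, subs))  # swap placeholders for the symbols
  eval(rewritten, parent.frame())     # run the rewritten code in the caller's environment
}

x <- 5
varName <- "x"
toy_let(c(VAR = varName), VAR)             # evaluated as if the user had written x

toy_let(c(COL = "disp"), head(mtcars$COL)) # evaluated as if written head(mtcars$disp)
```

Because the rewrite happens before evaluation, the substituted code is what the downstream functions see, which is why this style tends to compose with packages that capture their arguments.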
