Programming over R

April 21, 2017
By

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

R is a very fluid language amenable to meta-programming, or alterations of the language itself. This has allowed the late user-driven introduction of a number of powerful features such as magrittr pipes, the foreach system, futures, data.table, and dplyr. Please read on for some small meta-programming effects we have been experimenting with.

NewImage

Meta-Programming

Meta-programming is a powerful tool that allows one to re-shape a programming language or write programs that automate parts of working with a programming language.

Meta-programming itself has the central contradiction that one hopes nobody else is doing meta-programming, but that they are instead dutifully writing referentially transparent code that is safe to perform transformations over, so that one can safely introduce their own clever meta-programming. For example: one would hate to lose the ability to use a powerful package such as future because we already “used up all the referential transparency” for some minor notational effect or convenience.

That being said, R is an open system and it is fun to play with the notation. I have been experimenting with different notations for programming over R for a while, and thought I would demonstrate a few of them here.

Let Blocks

We have been using let to code over non-standard evaluation (NSE) packages in R for a while now. This allows code such as the following:

library("dplyr")
library("wrapr")

d <- data.frame(x = c(1, NA))

cname <- 'x'
rname <- paste(cname, 'isNA', sep = '_')

let(list(COL = cname, RES = rname),
    d %>% mutate(RES = is.na(COL))
)

 #    x x_isNA
 # 1  1  FALSE
 # 2 NA   TRUE

let is in fact quite handy notation that will work in a non-deprecated manner with both dplyr 0.5 and dplyr 0.6. It is how we are future-proofing our current dplyr workflows.

Unquoting

dplyr 0.6 is introducing a new execution system (alternately called rlang or tidyeval, see here) which uses a notation more like the following (but fewer parenthesis, and with the ability to control left-hand side of an in-argument assignment):

beval(d %>% mutate(x_isNA = is.na((!!cname))))

The inability to re-map the right-hand side of the apparent assignment is because the “(!! )” notation doesn’t successfully masquerade as a lexical token valid on the left-hand side of assignments or function argument bindings.

And there was an R language proposal for a notation like the following (but without the quotes, and with some care to keep it syntactically distinct from other uses of “@”):

ateval('d %>% mutate(@rname = is.na(@cname))')

beval and ateval are just curiosities implemented to try and get a taste of the new dplyr notation, and we don’t recommend using them in production — their ad-hoc demonstration implementations are just not powerful enough to supply a uniform interface. dplyr itself seems to be replacing a lot of R‘s execution framework to achieve stronger effects.

Write Arrow

We are experimenting with “write arrow” (a deliberate homophone of “right arrow”). It allows the convenient storing of a pipe result into a variable chosen by name.

library("dplyr")
library("replyr")

'x' -> whereToStoreResult

7 %>% sin %>% cos %->_% whereToStoreResult

print(x)
 ## [1] 0.7918362

Notice, the value “7” is stored in the variable “x” not in a variable named “whereToStoreResult”. “whereToStoreResult” was able to name where to store the value parametrically.

This allows code such as the following:

for(i in 1:3) { 
  i %->_% paste0('x',i)
}

(Please run the above to see the automatic creation of variables named “x1”, “x2”, and “x3”, storing values 1,2, and 3 respectively.)

We know left to right assignment is heterodox; but the notation is very slick if you are consistent with it, and add in some formatting rules (such as insisting on a line break after each pipe stage).

Conclusion

One wants to use meta-programming with care. In addition to bringing in desired convenience it can have unexpected effects and interactions deeper in a language or when exposed to other meta-programming systems. This is one reason why a “seemingly harmless” proposal such as “user defined unary functions” or “at unquoting” takes so long to consider. This is also why new language features are best tried in small packages first (so users can easily chose to include them or not in their larger workflow) to drive public request for comments (RFC) processes or allow the ideas to evolve (and not be frozen at their first good idea, a great example of community accepted change being Haskel’s switch from request chaining IO to monadic IO; the first IO system “seemed inevitable” until it was completely replaced).

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)