R Tip: Make Arguments Explicit in magrittr/dplyr Pipelines

March 1, 2018

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

I think this is the R Tip that is going to be the most controversial yet. Its potential pitfalls include: it is a style prescription (which makes it different from, and less immediately useful than, something like R Tip: Force Named Arguments), and it is heterodox (this is not how magrittr/dplyr is taught by the original authors, and not how it is commonly used). However, I have not been at all good at anticipating which tips get which sort of reception (and this valuable feedback, public and private, is part of what I get out of this series).

On to the tip (which only applies if you are a magrittr pipeline user).

R tip: when using magrittr pipelines, consider making them more explicit and more readable (especially to novices) by using explicit dot-arguments throughout.

The advice is: write pipelines that look like this:

suppressPackageStartupMessages(library("dplyr"))

starwars %>%
  filter(., height > 200) %>%
  select(., height, mass) %>%
  head(.)

And avoid overly concise pipelines such as this:

starwars %>%
  filter(height > 200) %>%
  select(height, mass) %>%
  head

The guidance is: each step in a simple magrittr pipeline is a function call that has at least one of its arguments directly written as “.”. Example: “atan2(3, .)” is a simple step, but neither “atan” (a bare function name, not a call) nor “atan2(abs(.), 5)” (the dot appears only inside a nested call) is a simple step.
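
As a rough illustration of this rule, here is a minimal sketch of a checker for a single pipeline step. The helper name is_simple_step is hypothetical (it is not part of magrittr or dplyr), and it only inspects the quoted expression; it does not run the step.

```r
# Hypothetical helper (not part of magrittr): TRUE when `step` is a function
# call with a bare `.` among its direct arguments.
is_simple_step <- function(step) {
  is.call(step) &&
    any(vapply(
      as.list(step)[-1],                        # the call's direct arguments
      function(arg) identical(arg, quote(.)),   # is this argument exactly `.`?
      logical(1)
    ))
}

is_simple_step(quote(atan2(3, .)))       # TRUE: the dot is a direct argument
is_simple_step(quote(atan))              # FALSE: a bare name, not a call
is_simple_step(quote(atan2(abs(.), 5)))  # FALSE: the dot is only nested inside abs()
```

The check is deliberately shallow: it looks only at the top-level arguments of the call, which is what "directly written" means in the guidance above.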

The intended point is: the first pipeline is more explicit and regular. This makes it easier to explain and easier for newcomers to read. For pipelines limited to this style, each step is (approximately) run in sequence as if the value of the previous step were in a variable named “.”.
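
To make that mental model concrete, the sketch below runs the explicit-dot pipeline once with %>% and once as plain sequential assignments to a variable named “.”. The names piped and stepwise are illustrative only, and local() is used just to keep the temporary “.” out of the global environment. This is a picture of the simplified model, not a description of how magrittr is implemented.

```r
suppressPackageStartupMessages(library("dplyr"))

# The explicit-dot pipeline from above.
piped <- starwars %>%
  filter(., height > 200) %>%
  select(., height, mass) %>%
  head(.)

# The same steps, one at a time, with the previous result held in `.`.
stepwise <- local({
  . <- starwars
  . <- filter(., height > 200)
  . <- select(., height, mass)
  . <- head(.)
  .
})

identical(piped, stepwise)  # TRUE
```

Because every step already names “.” explicitly, each pipeline line can be copied unchanged into the step-by-step form, which is part of what makes this style easy to explain.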

Note: the exact magrittr semantics are in fact more detailed than what I just said. The idea is to start newcomers in a sub-dialect of magrittr that has a simpler correct mental model before (or if ever) moving to the full details. The full details are perhaps more than a part-time R user should be expected to remember. It is a bit much to expect someone outside the cognoscenti to always remember that “5 %>% atan2(3, .)” is completely different from “5 %>% atan2(3, abs(.))”, and that “5 %>% {. + 1}” is completely different from “5 %>% (. + 1)”.
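
For readers who want to see these differences, here is a small sketch that evaluates all four expressions. The helper try_step is hypothetical and simply returns the string "error" when a step fails. The comments assume a fresh session in which no variable named “.” has been defined.

```r
library("magrittr")

# Hypothetical helper: return the value of `expr`, or "error" if it fails.
try_step <- function(expr) tryCatch(expr, error = function(e) "error")

# Dot is a direct argument, so this is just atan2(3, 5).
direct <- try_step(5 %>% atan2(3, .))

# Dot is only nested, so magrittr also inserts it as the first argument,
# giving atan2(., 3, abs(.)): three arguments to a two-argument function.
nested <- try_step(5 %>% atan2(3, abs(.)))

# Braces: the block is evaluated with `.` bound to 5.
braces <- try_step(5 %>% {. + 1})

# Parentheses: the expression is evaluated first, in the hope that it yields
# a function to call; with no `.` defined, that evaluation fails.
parens <- try_step(5 %>% (. + 1))

direct  # same value as atan2(3, 5)
nested  # "error"
braces  # 6
parens  # "error"
```

None of these four cases needs to be memorized by someone who stays inside the explicit-dot sub-dialect; only the first form is a simple step.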
