(This article was first published on **R – Win-Vector Blog**, and kindly contributed to R-bloggers)

R picked up a nifty way to organize sequential calculations in May of 2014: `magrittr` by Stefan Milton Bache and Hadley Wickham. `magrittr` is now quite popular and has also become the backbone of current `dplyr` practice.

If you read my last article on assignment carefully you may have noticed I wrote some code that was equivalent to a `magrittr` pipeline without using the “`%>%`” operator. This note will expand (tongue in cheek) that notation into an alternative to `magrittr` that you should never use.

What follows is a joke (though everything does work as I state it does, nothing is faked).

`magrittr`

[`magrittr`] Provides a mechanism for chaining commands with a new forward-pipe operator, `%>%`. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see package vignette. To quote Rene Magritte, “Ceci n’est pas un pipe.”

Once you read up on `magrittr` and try some examples you tend to be sold. `magrittr` is a graceful notation for chaining multiple calculations and managing intermediate results. For our example consider in `R` the following chain of function applications:

```
sqrt(tan(cos(sin(7))))
# [1] 1.006459
library("magrittr")
7 %>% sin() %>% cos() %>% tan() %>% sqrt()
# [1] 1.006459
```

Both are artificial examples, but the `magrittr` notation is much easier to read. The pipe notation removes some of the pain of chaining so many functions and is a good realization of the mathematical function composition operator traditionally written as “`(g ⚬ f)(x) = g(f(x))`”. Replacing nesting with composition allows us to read left to right instead of right to left.
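The relation between nesting, composition, and left-to-right reading can be sketched in base `R`. The `compose` helper below is hypothetical (it is not part of `magrittr` or base `R`) and is only meant to illustrate the composition identity quoted above.

```r
# Hypothetical helper, not part of magrittr or base R:
# compose(g, f) returns the function x -> g(f(x)).
compose <- function(g, f) function(x) g(f(x))

# Build x -> sqrt(tan(cos(sin(x)))) by composing right to left.
h <- compose(sqrt, compose(tan, compose(cos, sin)))

nested   <- sqrt(tan(cos(sin(7))))  # nested call, read inside-out
composed <- h(7)                    # same calculation via composition

# Reduce() applies the functions in left-to-right (pipe-like) order,
# starting from the initial value 7.
piped <- Reduce(function(value, f) f(value), list(sin, cos, tan, sqrt), 7)

print(all.equal(nested, composed))
print(all.equal(nested, piped))
```

All three spellings compute the same value; only the reading order differs, which is the point of the pipe notation.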

## Bizarro `magrittr`

`magrittr` itself is largely what is called “syntactic sugar” (though if you look at the code, say by “`` print(magrittr::`%>%`) ``” you will see `magrittr` commands some fairly heroic control of the evaluation order to achieve its effect). If we didn’t care about syntax we could write processing pipelines without `` magrittr::`%>%` `` as follows.

```
# "Piping" without magrittr.
7 ->.; sin(.) ->.; cos(.) ->.; tan(.) ->.; sqrt(.)
# [1] 1.006459
```

The above is essentially the same pipeline (modulo some issues regarding printing, and the visibility and lifetime of “`.`”). We could even write it with the industry-preferred left arrow by using “`;.<-`” throughout (though we would need to use “`->.;.<-`” to start such a pipeline). What I am saying is that if we thought of “`->.;`” as an atomic (indivisible plus non-mixable) glyph (as we are already encouraged to think of “`<-`” as) then that glyph is pretty much a piping operator. In a perverse sense “`->.;`” is a poor man’s “`%>%`”. Oddly enough we can think of the semicolon as doing the heavy lifting, as it is a statement sequencer (and functional programming monads can be thought of as “programmable semicolons”).
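For concreteness, here is a minimal sketch of the left-arrow spelling and of the “lifetime of `.`” issue mentioned above. The variable names (`left_result`, `dot_lingers`, and so on) are illustrative only, and wrapping the pipeline in `local()` is just one possible way to confine `.`, not something the original notation does.

```r
# Left-arrow spelling: start with "->.;.<-", then chain with ";.<-".
7 ->.;.<- sin(.);.<- cos(.);.<- tan(.);.<- sqrt(.)
left_result <- .   # the final value is left behind in `.`

# The semicolon version leaves `.` in the calling environment
# (it is hidden from ls() unless all.names = TRUE).
dot_lingers <- exists(".", inherits = FALSE)

# One way to limit the lifetime of `.`: run the pipeline inside local().
rm(list = ".")
local_result <- local({ 7 ->.; sin(.) ->.; cos(.) ->.; tan(.) ->.; sqrt(.) })
dot_gone <- !exists(".", inherits = FALSE)

print(c(left_result, local_result))
print(c(dot_lingers, dot_gone))
```

Note that the all-left-arrow form ends in an assignment, so nothing is printed automatically; the result has to be read back out of `.`.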

## Things Get Worse

“`->.;`” may be slightly faster than “`%>%`”. It makes sense, as the semicolon-hack is doing a lot less for us than a true `magrittr` pipe. This difference (which is not important) is only going to show up when we have a tiny amount of data, where the expression control remains a significant portion of the processing time (which it never is in practice!). `magrittr` is in fact fast; it is just that doing nothing is a tiny bit faster.

Everything below is a correct calculation; it is just a deliberate example of going too far in measuring something that does not matter. The sensible conclusion is: use `magrittr`, despite the following silliness.

```
library("microbenchmark")
library("magrittr")
library("ggplot2")
set.seed(234634)

# Note: the two magrittr variants pipe the literal 7 and ignore d.
fmagrittr <- function(d) {
  7 %>% sin() %>% cos() %>% tan() %>% sqrt()
}
fmagrittrdot <- function(d) {
  7 %>% sin(.) %>% cos(.) %>% tan(.) %>% sqrt(.)
}
fsemicolon <- function(d) {
  d ->.; sin(.) ->.; cos(.) ->.; tan(.) ->.; sqrt(.)
}

bm <- microbenchmark(
  fmagrittr(7),
  fmagrittrdot(7),
  fsemicolon(7),
  control=list(warmup=100L,order='random'),
  times=10000L
)
print(bm)
# Unit: nanoseconds
#            expr    min       lq       mean   median       uq      max neval
#    fmagrittr(7) 133447 140496.5 161386.520 146348.5 151024.0 35967729 10000
# fmagrittrdot(7) 123223 130531.0 146873.208 135918.5 140309.5  2759953 10000
#   fsemicolon(7)    910   1347.5   1585.334   1535.0   1711.0    41647 10000

# Compare both magrittr variants (pooled) against the semicolon version.
t.test(bm$time[bm$expr!='fsemicolon(7)'],
       bm$time[bm$expr=='fsemicolon(7)'])
# Welch Two Sample t-test
#
# data: bm$time[bm$expr != "fsemicolon(7)"] and bm$time[bm$expr == "fsemicolon(7)"]
# t = 79.106, df = 19999, p-value < 2.2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# 148764.8 156324.3
# sample estimates:
# mean of x mean of y
# 154129.864 1585.334

# Count which expressions land in the slowest 5% of all timings.
highcut <- quantile(bm$time,probs=0.95)
table(bm$expr[bm$time>=highcut])
#    fmagrittr(7) fmagrittrdot(7)   fsemicolon(7)
#            1007             493               0

# Density of timings per expression, truncated at the 95% cut.
ggplot(data=as.data.frame(bm),aes(x=time,color=expr)) +
  geom_density(adjust=0.3) +
  facet_wrap(~expr,ncol=1,scales = 'free_y') +
  scale_x_continuous(limits = c(min(bm$time),highcut))
```
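As a rough sanity check on how little this matters, the arithmetic on the reported medians can be worked through directly. The numbers below are copied from the `microbenchmark` output above; the variable names (`median_ns`, `ratio`, `overhead_ms`) are illustrative and not part of the original benchmark.

```r
# Median times in nanoseconds, copied from the microbenchmark output above.
median_ns <- c(fmagrittr = 146348.5, fmagrittrdot = 135918.5, fsemicolon = 1535.0)

# How many times slower each variant is than the semicolon version.
ratio <- median_ns / median_ns[["fsemicolon"]]
print(round(ratio, 1))

# Absolute overhead per call, converted from nanoseconds to milliseconds.
overhead_ms <- (median_ns - median_ns[["fsemicolon"]]) / 1e6
print(round(overhead_ms, 3))
```

In relative terms the `magrittr` versions are roughly 90 to 95 times slower on this toy input, but in absolute terms that is under 0.15 milliseconds per pipeline call, which is negligible next to any real data processing.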

## Conclusion

I am most emphatically *not* suggesting use of “`->.;`” as a poor man’s “`%>%`”! But there is a relation: both “`%>%`” and semicolon are about sequencing statements.

Again, everything above was a joke (though nothing was fake, everything does run as I claimed it did).
