magrittr’s Doppelgänger

December 13, 2016

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

R picked up a nifty way to organize sequential calculations in May of 2014: magrittr by Stefan Milton Bache and Hadley Wickham. magrittr is now quite popular and also has become the backbone of current dplyr practice.

If you read my last article on assignment carefully you may have noticed I wrote some code that was equivalent to a magrittr pipeline without using the “%>%” operator. This note will expand (tongue in cheek) that notation into an alternative to magrittr that you should never use.


[Image: Superman and Bizarro]

What follows is a joke (though everything does work as I state it does, nothing is faked).

magrittr

[magrittr] Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see package vignette. To quote Rene Magritte, “Ceci n’est pas un pipe.”

(from the package description)

Once you read up on magrittr and try some examples you tend to be sold. magrittr is a graceful notation for chaining multiple calculations and managing intermediate results. For our example consider in R the following chain of function applications:

sqrt(tan(cos(sin(7))))

 # [1] 1.006459

library("magrittr")
7 %>% sin() %>% cos() %>% tan() %>% sqrt()

 # [1] 1.006459

Both are artificial examples, but the magrittr notation is much easier to read. The pipe notation removes some of the pain of chaining so many functions and is a good realization of the mathematical function composition operator traditionally written as “(g ⚬ f)(x) = g(f(x))”. Replacing nesting with composition allows us to read left to right instead of right to left.
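To make the composition idea concrete, here is a minimal base-R sketch. The helper name compose_left_to_right is hypothetical (it is not part of magrittr or base R); it simply applies a list of functions in reading order.

```r
# Hypothetical helper: build one function that applies its arguments
# left to right, i.e. compose_left_to_right(f, g)(x) is g(f(x)).
compose_left_to_right <- function(...) {
  fs <- list(...)
  function(x) Reduce(function(acc, f) f(acc), fs, x)
}

pipeline <- compose_left_to_right(sin, cos, tan, sqrt)

nested   <- sqrt(tan(cos(sin(7))))  # read right to left
composed <- pipeline(7)             # read left to right

print(all.equal(nested, composed))
```

The composed form lists the steps in the same order as the pipe does, which is the readability gain described above.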

Bizarro magrittr

magrittr itself is largely what is called “syntactic sugar” (though if you look at the code, say by “print(magrittr::`%>%`)” you will see magrittr commands some fairly heroic control of the evaluation order to achieve its effect). If we didn’t care about syntax we could write processing pipelines without magrittr::`%>%` as follows.

# "Piping" without magrittr.

7 ->.; sin(.) ->.; cos(.) ->.; tan(.) ->.; sqrt(.)

 # [1] 1.006459

The above is essentially the same pipeline (modulo some issues regarding printing, and the visibility and lifetime of “.”). We could even write it with the industry-preferred left arrow by using “;.<-” throughout (though we would need to use “->.;.<-” to start such a pipeline). What I am saying is that if we thought of “->.;” as an atomic (indivisible and non-mixable) glyph (as we are already encouraged to think of “<-”), then that glyph is pretty much a piping operator. In a perverse sense “->.;” is a poor man’s “%>%”. Oddly enough, we can think of the semicolon as doing the heavy lifting, as it is a statement sequencer (and functional programming monads can be thought of as “programmable semicolons”).
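The caveats about “.” can be seen directly. The following is a small sketch in base R; the variable names direct, leftover, left_arrow, contained, and dot_leaked are only for illustration, and the use of local() to contain “.” is my own suggestion rather than something the semicolon notation requires.

```r
# Right-arrow form; the final value is stored instead of auto-printed.
7 ->.; sin(.) ->.; cos(.) ->.; tan(.) ->.; sqrt(.) -> direct

# "." outlives the pipeline and holds the last value assigned to it
# (the input to sqrt, not the final result).
leftover <- .

# Left-arrow form: "->.;.<-" to start, then ";.<-" throughout.
# The result ends up in "." and nothing is printed (assignment is invisible).
7 ->.;.<- sin(.) ;.<- cos(.) ;.<- tan(.) ;.<- sqrt(.)
left_arrow <- .

# Wrapping the pipeline in local() keeps "." out of the calling environment.
rm(list = ".")
contained <- local({ 7 ->.; sin(.) ->.; cos(.) ->.; tan(.) ->.; sqrt(.) })
dot_leaked <- exists(".", inherits = FALSE)
```

A real magrittr pipe does not leave a “.” behind in your workspace, which is one of the things the semicolon hack is not doing for us.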

Things Get Worse

“->.;” may be slightly faster than “%>%”. It makes sense, as the semicolon-hack is doing a lot less for us than a true magrittr pipe. This difference (which is not important) is only going to show up when we have a tiny amount of data, where the expression control remains a significant portion of the processing time (which it never is in practice!). magrittr is in fact fast; it is just that doing nothing is a tiny bit faster.

Everything below is a correct calculation; it is just a deliberate example of going too far in measuring something that does not matter. The sensible conclusion is: use magrittr, despite the following silliness.

library("microbenchmark")
library("magrittr")
library("ggplot2")
set.seed(234634)


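# Pipe the argument d (rather than a hard-coded 7) so all three
# functions do the same work on the same input.
fmagrittr <- function(d) {
  d %>% sin() %>% cos() %>% tan() %>% sqrt()
}

fmagrittrdot <- function(d) {
  d %>% sin(.) %>% cos(.) %>% tan(.) %>% sqrt(.)
}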

fsemicolon <- function(d) {
  d ->.; sin(.) ->.; cos(.) ->.; tan(.) ->.; sqrt(.)
}

bm <- microbenchmark(
  fmagrittr(7),
  fmagrittrdot(7),
  fsemicolon(7),
  control=list(warmup=100L,order='random'),
  times=10000L
)

print(bm)

 # Unit: nanoseconds
 #             expr    min       lq       mean   median       uq      max neval
 #     fmagrittr(7) 133447 140496.5 161386.520 146348.5 151024.0 35967729 10000
 #  fmagrittrdot(7) 123223 130531.0 146873.208 135918.5 140309.5  2759953 10000
 #    fsemicolon(7)    910   1347.5   1585.334   1535.0   1711.0    41647 10000
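
To put the table above on a single scale, the medians can be compared directly. The two figures below are copied from the microbenchmark output; the variable names are only for illustration.

```r
# Medians in nanoseconds, copied from the benchmark table.
median_magrittr_ns  <- 146348.5  # fmagrittr(7)
median_semicolon_ns <- 1535.0    # fsemicolon(7)

# Relative cost and absolute per-call overhead (in microseconds).
ratio       <- median_magrittr_ns / median_semicolon_ns
overhead_us <- (median_magrittr_ns - median_semicolon_ns) / 1000

print(round(ratio, 1))        # 95.3
print(round(overhead_us, 1))  # 144.8
```

So the ratio looks dramatic, but the absolute cost is on the order of 145 microseconds per four-stage pipeline, which is why this difference does not matter once the functions do real work.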

t.test(bm$time[bm$expr!='fsemicolon(7)'],
       bm$time[bm$expr=='fsemicolon(7)'])

 # 	Welch Two Sample t-test
 # 
 # data:  bm$time[bm$expr != "fsemicolon(7)"] and bm$time[bm$expr == "fsemicolon(7)"]
 # t = 79.106, df = 19999, p-value < 2.2e-16
 # alternative hypothesis: true difference in means is not equal to 0
 # 95 percent confidence interval:
 #  148764.8 156324.3
 # sample estimates:
 #  mean of x  mean of y 
 # 154129.864   1585.334 

highcut <- quantile(bm$time,probs=0.95)
table(bm$expr[bm$time>=highcut])

 #    fmagrittr(7) fmagrittrdot(7)   fsemicolon(7) 
 #            1007             493               0 

ggplot(data=as.data.frame(bm),aes(x=time,color=expr)) +
  geom_density(adjust=0.3) +
  facet_wrap(~expr,ncol=1,scales = 'free_y') +
  scale_x_continuous(limits = c(min(bm$time),highcut))

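[Figure: density plots of run time (x axis: time, truncated at the 95th percentile), one panel each for fmagrittr(7), fmagrittrdot(7), and fsemicolon(7)]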

Conclusion

I am most emphatically not suggesting use of “->.;” as a poor man’s “%>%”! But there is a relation: both “%>%” and the semicolon are about sequencing statements.

Again, everything above was a joke (though nothing was fake, everything does run as I claimed it did).
