By Aimee Gott
At EARL 2014 I saw Hadley Wickham using the pipe operator (%>%) from Stefan Milton Bache’s magrittr whilst also presenting the functionality of dplyr. I remember thinking at the time that this was going to be the new way to write R code, and I knew I would end up teaching this operator in many a training course. A year on, not a training course goes by in which I don’t teach the pipe operator. The main thing I have found myself telling people whilst teaching is that this operator is the best thing to have happened to R…ever!
Now you might be thinking I am about to write paragraphs about how much easier it makes reading and writing code, or you might be one of the R users who are not so keen on it and would quite like to disagree with me. But forget the usage of the operator itself for a moment and think about what it has done for the standardisation of R functions.
This year I have been fortunate enough to attend a large number of user groups and conferences, including useR!, EARL London and EARL Boston, where developers have presented their newly written R packages. And what have I heard most commonly in these talks? “Of course it is designed to work with the pipe operator.” Even when they don’t say the words, it is clear from the code examples they present that they have thought about how their functions will work with the operator we are using with such enthusiasm.
So what is the change? Invariably, data is now the first argument, something that hasn’t always been the case: ‘qplot’ in ggplot2 is a great (and now incredibly frustrating!) example of the data argument not coming first. I am sure we have all felt the pain of having to use the “.” placeholder to pipe into the data argument of ‘qplot’ and wished it could be updated! But given that we are using a language with no universal standards, where everyone has their own opinion on whether we should use R6, Reference Classes, S3 or S4, I am impressed at how quickly we are adopting this unspoken rule.
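To make the “.” complaint concrete, here is a minimal sketch using magrittr’s %>%. It uses base R’s lm() rather than qplot, an illustrative choice so the example needs no ggplot2, because lm() likewise takes its data as a non-first argument (formula first, data second). When “.” appears as an argument in the call, magrittr places the piped value there instead of inserting it as the first argument.

```r
library(magrittr)

# lm() takes the formula first and the data second, so the piped data
# frame has to be routed into the data argument with the "." placeholder.
# (qplot() in ggplot2 needs the same treatment, e.g.
#  mtcars %>% qplot(mpg, wt, data = .))
fit_dot <- mtcars %>% lm(mpg ~ wt, data = .)

# The equivalent call without the pipe:
fit_plain <- lm(mpg ~ wt, data = mtcars)

# Both routes fit the same model, so the coefficients agree.
print(all.equal(coef(fit_dot), coef(fit_plain)))
```

It works, but every such call needs the reader to spot where the “.” has gone, which is exactly the friction a data-first signature removes.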
For those of you who want hard numbers to back up the claim, those are a little more difficult to get at. At the moment around 60 packages on CRAN depend on or import magrittr, but this tells us very little about the packages that have been designed to work with magrittr without insisting upon it.
As far as I am aware, this is the first time that a single R package has had such an impact on the way we code in R and the way we write our own functions. I’m sure many of you will disagree, and others will find an example I am not aware of, but for now I am going to stick by my claim and suggest a new addition to coding standards everywhere:
Data should always be the first argument to a function.
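As a rough illustration of what the rule buys us, the sketch below defines a hypothetical helper, add_ratio() (written for this example, not taken from any package), that takes its data first and returns the modified data frame. That lets it chain with magrittr’s %>% alongside base R’s data-first subset() and head() without any placeholder.

```r
library(magrittr)

# Hypothetical data-first helper: the data frame comes first and is
# returned (with one extra column), so it drops straight into a pipeline.
add_ratio <- function(data, numerator, denominator, name = "ratio") {
  data[[name]] <- data[[numerator]] / data[[denominator]]
  data
}

# Each step receives the previous step's output as its first argument,
# so no "." is needed anywhere in the chain.
result <- mtcars %>%
  add_ratio("mpg", "wt", name = "mpg_per_wt") %>%
  subset(cyl == 4) %>%
  head(3)

print(result[, c("mpg", "wt", "mpg_per_wt")])
```

Returning the whole data frame, rather than just the new column, is the design choice that lets the next step in the pipe carry on from where this one stopped.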