The Case For Using -> In R

December 12, 2016

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

R has a number of assignment operators (at least “<-”, “=”, and “->”; plus “<<-” and “->>”, which have different semantics).

R style guides routinely insist on “<-” as the only preferred form. In this note we are going to try to make the case for “->” when using magrittr pipelines.


Don Quijote and Sancho Panza, by Honoré Daumier

Assignment in R

R’s preferred assignment operator is “<-”. This is the form recommended in the popular style guides. If you write using this style you can organize your code so that:

  • “<-” always means assignment
  • “=” always means function argument binding
  • “==” always means comparison.

This has some advantages, and is the public style. Also “=” is much harder to use inside R’s base::quote function than “<-”, so there are still cases where the semantics of “=” and “<-” differ (though I think they all involve trying to distinguish argument binding from assignment while inside a function call’s argument list).
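
As a small illustration of that distinction, here is a sketch in base R. The names double_it, bind_demo and assign_demo are hypothetical and introduced only for this example. Inside a call's argument list “=” only binds an argument, while “<-” performs a real assignment in the calling environment:

```r
# Hypothetical helper, introduced only for this illustration.
double_it <- function(v) v * 2

bind_demo <- function() {
  r <- double_it(v = 10)    # "=" binds the argument v; no variable v is created here
  list(value = r, created_v = exists("v", inherits = FALSE))
}

assign_demo <- function() {
  r <- double_it(v <- 10)   # "<-" assigns v in this function's frame, then passes 10
  list(value = r, created_v = exists("v", inherits = FALSE))
}

bind_res <- bind_demo()
assign_res <- assign_demo()

# quote() captures a "<-" assignment directly. A bare "=" in the same position
# would be read as argument binding, so capturing an "=" assignment needs
# extra braces (or parentheses) around it.
arrow_call <- quote(x <- 3)
equals_call <- quote({x = 3})[[2]]
```

Both functions return the same value, but only assign_demo leaves a variable v behind in its own frame.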

I have previously written that given the choice I prefer “=” for assignment. It has the advantages that:

  • “<-” has a different meaning to many readers. In R “x<-3” assigns the value 3 to a variable named x; in other popular programming languages (where new R users may be coming from) “x<-3” denotes comparing x to -3.
  • “=” is a single character, so it cannot be ruined by the insertion of a space. “x< -3” does not assign the value 3 to a variable named x, it compares x to -3. I would not mind so much if “x< -3” was a syntax error (as “x< =3” is), but it is valid code that quietly does something very different than “x<-3”. If you have taught R enough you have experience helping students undo this bug.
  • “=” is on the keyboard (as “←” was when arrow-like assignments were themselves introduced).
  • “=” is easier to paste into HTML as it does not require escape coding such as “&lt;”.
  • It is the symbol used for assignment in almost every other popular current programming language.
  • There is an asymmetric cost of mistakes. Typing “=” when you meant “<-” is usually harmless. Typing “<-” in a context where “=” was needed is not caught by R and is fairly bad (please see here for details). So if you get out of the habit of using “<-”, one type of bug becomes less likely.
  • There is a cognitive benefit in reducing the number of low-value distinctions you need to maintain, especially for beginners. If we think of the mind as having “seven plus or minus two” slots for current information do we really want to waste 11 to 20 percent of our students’ attention on something like this when teaching? The beginner does not need to worry over the differences between value assignment and argument binding at all times. In fact it is a useful generalization to think of argument binding as a safe transient value assignment.
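
The stray-space bug and the wrong-operator bug from the list above are easy to reproduce. The following is a minimal sketch in base R; the variable names are arbitrary and chosen only for the illustration.

```r
x <- 5
x<-3                 # no space: assignment, x is now 3
is_less <- (x< -3)   # one stray space: compares x to -3, x is left unchanged

# Typing "<-" where "=" was needed runs without complaint: the column is not
# named "a", and a variable a is created (or overwritten) as a side effect.
d_intended <- data.frame(a = 1:3)
d_mistyped <- data.frame(a <- 1:3)

names(d_intended)
names(d_mistyped)
```

Neither mistake raises an error or a warning, which is what makes them hard for beginners to spot.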

Now I said “given the choice”, which means that to work with others you have to use “<-” or at least admit that you are being stubborn. I teach “<- for assignment” as I do not wish to set up students for ridicule (and they, being less informed on the history of R, are less equipped to defend themselves on this issue).

That being said I still don’t actually like “<-”. And in fact I am not sure why the R community has so fetishized its use. “<-” comes from an era when it was actually a symbol on the keyboard, and two other S assignment operators from that era (“_” and “:=”) have not survived in the R language (please see here). I think the style is largely enforced as a kind of argot or “inside language” to express loyalty to R.

A deliberately provocative proposal

That being said I have really come to like using R’s “->” operator. I know I can’t always get away with it, but consider the advantage using “->” brings to western readers (meaning users of Greek-derived alphabets): you can then simply read code from left to right. If I am not allowed to use “=” I want something back in exchange, and “->” actually has some interesting advantages. Let us set up a proposal that is admittedly incompatible with my previous argument.

Consider the following statement:

  x = 3 + 4

This is read in R, and most common programming languages, as “assign the value of 3 + 4 to the variable x.” We know to read it this way because “assignment has lower operator precedence than plus.” Roughly this means there are implicit parenthesization rules that make “x=3+4” shorthand for “x=(3+4)” (roughly, because in R explicit use of parentheses also controls the auto-printing behavior of values). But consider the same statement written with “->”:

  3 + 4 -> x

The semantics still come from operator precedence rules, but now the syntax is emphasizing the same thing: the calculation happens before (to the left of) the assignment. This may not seem like much to experienced programmers, but that is because so many programming languages use the frankly unnatural “x=3+4” notation (so we are used to it).
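
One way to check this reading is to ask R's parser. The sketch below uses only base R's quote(), and the variable names are illustrative. It suggests that the “->” form is parsed into the very same call as the “<-” form, and that in both the whole sum is the value being assigned.

```r
left_form  <- quote(x <- 3 + 4)
right_form <- quote(3 + 4 -> x)

# The parser rewrites "->" into "<-" with the operands swapped,
# so both spellings produce the same call object.
same_parse <- identical(left_form, right_form)

# The third element of the call is the value being assigned: the whole sum,
# which is the implicit parenthesization x <- (3 + 4).
rhs <- left_form[[3]]

# Run the right-arrow form for real.
3 + 4 -> x
```

So the choice between the two spellings is purely about how the code reads; the evaluated expression is the same.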

A substantial advantage comes when using magrittr pipes in R.

Suppose I write the following magrittr pipeline:

# Count number of NA in columns x,y, 
# and z using pure dplyr notation
# or back-end agnostic dplyr code.  
# This involves avoiding use of $
# or things like multiple intermediate 
# values in dplyr::summarize.
# This is a useful example as 
# complete.cases isn't available on
# all dplyr data services.
# ifelse() is to ensure type 
# conversions on remote SQL.

library("dplyr")
my_db <- dplyr::src_sqlite(":memory:", create = TRUE)

data.frame(
           x = c(1, 2, 2),
           y = c(3, 5, NA),
           z = c(NA, 'a', 'b'),
           rowNum = 1:3,
           stringsAsFactors = FALSE
          ) %>%
  copy_to(my_db, ., 'd') %>%
  mutate(nna = ifelse(is.na(x),1,0) +
               ifelse(is.na(y),1,0) + 
               ifelse(is.na(z),1,0)) %>%
  arrange(rowNum) -> dres

In this notation we see that “->” is now itself a pipe-compatible operator that moves values to variables. The pipeline itself is already moving left to right and top to bottom. Placing the assignment first would give us an ugly two-directional flow.

Non-semantic changes in the pipeline are now syntactically cheap and localized (as they should be). For example: want to land intermediate results for reasons of efficiency or necessary side-effects? Solution: insert “-> varName LINEBREAK varName %>%” at will, as you already do with dplyr::collapse() and dplyr::compute().

The syntax is now working for us instead of against us. I feel that once you start using magrittr pipelines (which are written left to right, as we did here) the next logical step is to use “->” for consistency.
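
The “-> varName LINEBREAK varName” edit can be sketched as follows. To keep the example free of package dependencies it makes two assumptions that are not in the original code: base R's native pipe |> (R 4.1 or later) stands in for magrittr's %>%, and the base functions transform() and subset() stand in for the dplyr verbs. The data and the names staged, dres_direct and dres_staged are made up for the illustration.

```r
# Assumption: |> (R >= 4.1) is used as a stand-in for magrittr's %>%.
d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA))

# One uninterrupted pipeline, assigned at the end with "->".
d |>
  transform(nna = ifelse(is.na(x), 1, 0) + ifelse(is.na(y), 1, 0)) |>
  subset(nna == 0) -> dres_direct

# The same pipeline with the intermediate result landed in `staged`:
# the only change is "-> staged" plus a new line starting with "staged".
d |>
  transform(nna = ifelse(is.na(x), 1, 0) + ifelse(is.na(y), 1, 0)) -> staged
staged |>
  subset(nna == 0) -> dres_staged
```

The edit touches only the point where the pipeline is split; the first line and the final assignment stay where they were.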

Syntax Matters

The following code has essentially the same semantics as the previous magrittr pipes, without needing a piping operator.

data.frame(
           x = c(1, 2, 2),
           y = c(3, 5, NA),
           z = c(NA, 'a', 'b'),
           rowNum = 1:3,
           stringsAsFactors = FALSE
          ) -> .
  copy_to(my_db, ., 'd2') -> .
  mutate(., nna = ifelse(is.na(x),1,0) +
                  ifelse(is.na(y),1,0) + 
                  ifelse(is.na(z),1,0)) -> .
  arrange(., rowNum) -> dres

The above code has the advantage that it is easier to debug in that you can stop at any stage and the intermediate results are convenient to inspect. However, there was no great call for code in this style (or the matching beginning of line “. <-” version) prior to the introduction of magrittr. It just isn’t as enjoyable to use a mere coding convention as it is to use magrittr pipe syntax.
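
For comparison, the beginning-of-line “. <-” version mentioned above might look like the following sketch. It uses base R's transform() and subset() on a small made-up data frame rather than the dplyr verbs and database from the example, so that it runs without any packages; that substitution, and the name after_transform, are assumptions of this illustration.

```r
# "." is an ordinary (if unusual) variable name in R.
. <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA))
. <- transform(., nna = ifelse(is.na(x), 1, 0) + ifelse(is.na(y), 1, 0))

# Every stage is a complete statement, so you can stop here and inspect
# (or save) the intermediate value while debugging.
after_transform <- .

dres <- subset(., nna > 0)
```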

Conclusion

  • I honestly think in a magrittr world “->” is a natural assignment operator and could make teaching R easier. It reads more fluidly once you get used to it and come to expect assignment to be written late (i.e. once you know where to look).
  • I cannot currently recommend actually using “->” in other people’s projects as it is not currently allowed under the most popular R style guides. Both Advanced R by Hadley Wickham and Google’s R Style Guide say: “Use <-, not =, for assignment.”
  • I would like to propose that “->” be considered an allowed assignment operator, with the stricture that code should not reverse directions too often (as that is, in fact, confusing). If you control one of the named style guides, please do consider my suggestion.
