Assignment in R: slings and arrows

[This article was first published on R – Doodling in Data, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Having recently shared my post about defensive programming in R on the r/rstats subreddit, I was blown away by the sheer number of comments as much as I was blown away by the insight many displayed. One particular comment by u/guepier struck my attention. In my previous post, I came out quite vehemently against using the = operator to effect assignment in R. u/guepier‘s made a great point, however:

But point 9 is where you’re quite simply wrong, sorry:

never, ever, ever use = to assign. Every time you do it, a kitten dies of sadness.

This is FUD, please don’t spread it. There’s nothing wrong with =. It’s purely a question of personal preference. In fact, if anything <- is more error-prone (granted, this is a very slight chance but it’s still higher than the chance of making an error when using =).

Now, assignment is no doubt a hot topic – a related issue, assignment expressions, has recently led to Python’s BDFL to be forced to resign –, so I’ll have to tread carefully. A surprising number of people have surprisingly strong feelings about assignment and assignment expressions. In R, this is complicated by its unusual assignment structure, involving two assignment operators that are just different enough to be trouble.

A brief history of <-

IBM Model M SSK keyboard with APL keys
This is the IBM Model M SSK keyboard. The APL symbols are printed on it in somewhat faint yellow.

There are many ways in which <- in R is anomalous. For starters, it is rare to find a binary operator that consists of two characters – which is an interesting window on the R <- operator’s history.

The <- operator, apparently, stems from a day long gone by, when keyboards existed for the programming language eldritch horror that is APL. When R’s precursor, S, was conceived, APL keyboards and printing heads existed, and these could print a single ← character. It was only after most standard keyboard assignments ended up eschewing this all-important symbol that R and S accepted the digraphic <- as a substitute.

OK, but what does it do?

In the Brown Book, the underscore was actually an alias for the arrow assignment operator.
In the Brown Book (Richard A. Becker and John M. Chambers (1984). S: An Interactive Environment for Data Analysis and Graphics), the underscore was actually an alias for the arrow assignment operator! Thankfully, this did not make it into R.
<- is one of the first operators anyone encounters when familiarising themselves with the R language. The general idea is quite simple: it is a directionally unambiguous assignment, i.e. it indicates quite clearly that the right-hand side value (rhs, in the following) will replace the left-hand side variable (lhs), or be assigned to the newly created lhs if it has not yet been initialised. Or that, at the very least, is the short story.

Because quite peculiarly, there is another way to accomplish a simple assignment in R: the equality sign (=). And because on the top level, a <- b and a = b are equivalent, people have sometimes treated the two as being quintessentially identical. Which is not the case. Or maybe it is. It’s all very confusing. Let’s see if we can unconfuse it.

The Holy Writ

The Holy Writ, known to uninitiated as the R Manual, has this to say about assignment operators and their differences:

The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.

If this sounds like absolute gibberish, or you cannot think of what would qualify as not being on the top level or a subexpression in a braced list of expressions, welcome to the squad – I’ve had R experts scratch their head about this for an embarrassingly long time until they realised what the R documentation, in its neutron starlike denseness, actually meant.

If it’s in (parentheses) rather than {braces}, = and <- are going to behave weird

To translate the scriptural words above quoted to human speak, this means = cannot be used in the conditional part (the part enclosed by (parentheses) as opposed to {curly braces}) of control structures, among others. This is less an issue between <- and =, and rather an issue between = and ==. Consider the following example:

x = 3

if(x = 3) 1 else 0
# Error: unexpected '=' in "if(x ="

So far so good: you should not use a single equality sign as an equality test operator. The right way to do it is:

> if(x == 3) 1 else 0
[1] 1

But what about arrow assignment?

if(x <- 3) 1 else 0
# [1] 1

Oh, look, it works! Or does it?

if(x <- 4) 1 else 0
# [1] 1

The problem is that an assignment will always yield true if successful. So instead of comparing x to 4, it assigned 4 to x, then happily informed us that it is indeed true.

The bottom line is not to use = as comparison operator, and <- as anything at all in a control flow expression’s conditional part. Or as John Chambers notes,

Disallowing the new assignment form in control expressions avoids programming errors (such as the example above) that are more likely with the equal operator than with other S assignments.

Chain assignments

One example of where  <- and = behave differently (or rather, one behaves and the other throws an error) is a chain assignment. In a chain assignment, we exploit the fact that R assigns from right to left. The sole criterion is that all except the rightmost members of the chain must be capable of being assigned to.

# Chain assignment using <-
a <- b <- c <- 3

# Chain assignment using =
a = b = c = 3

# Chain assignment that will, unsurprisingly, fail
a = b = 3 = 4
# Error in 3 = 4 : invalid (do_set) left-hand side to assignment

So we’ve seen that as long as the chain assignment is logically valid, it’ll work fine, whether it’s using <- or =. But what if we mix them up?

a = b = c <- 1
# Works fine...

a = b <- c <- 1
# We're still great...

a <- b = c = 1
# Error in a <- b = c = 1 : could not find function "<-<-"
# Oh.

The bottom line from the example above is that where <- and = are mixed, the leftmost assignment has to be carried out using =, and cannot be by <-. In that one particular context, = and <- are not interchangeable.

A small note on chain assignments: many people dislike chain assignments because they’re ‘invisible’ – they literally return nothing at all. If that is an issue, you can surround your chain assignment with parentheses – regardless of whether it uses <-, = or a (valid) mixture thereof:

a = b = c <- 3
# ...
# ... still nothing...
# ... ... more silence...

(a = b = c <- 3)
# [1] 3

Assignment and initialisation in functions

This is the big whammy – one of the most important differences between <- and =, and a great way to break your code. If you have paid attention until now, you’ll be rewarded by, hopefully, some interesting knowledge.

= is a pure assignment operator. It does not necessary initialise a variable in the global namespace. <-, on the other hand, always creates a variable, with the lhs value as its name, in the global namespace. This becomes quite prominent when using it in functions.

Traditionally, when invoking a function, we are supposed to bind its arguments in the format parameter = argument.1 And as we know from what we know about functions, the keyword’s scope is restricted to the function block. To demonstrate this:

add_up_numbers <- function(a, b) {
    return(a + b)
}

add_up_numbers(a = 3, b = 5)
# [1] 8

a + b
# Error: object 'a' not found

This is expected: a (as well as b, but that didn’t even make it far enough to get checked!) doesn’t exist in the global scope, it exists only in the local scope of the function add_up_numbers. But what happens if we use <- assignment?

add_up_numbers(a <- 3, b <- 5)
# [1] 8

a + b
# [1] 8

Now, a and b still only exist in the local scope of the function add_up_numbers. However, using the assignment operator, we have also created new variables called a and b in the global scope. It’s important not to confuse it with accessing the local scope, as the following example demonstrates:

add_up_numbers(c <- 5, d <- 6)
# [1] 11

a + b
# [1] 8

c + d
# [1] 11

In other words, a + b gave us the sum of the values a and b had in the global scope. When we invoked add_up_numbers(c <- 5, d <- 6), the following happened, in order:

  1. A variable called c was initialised in the global scope. The value 5 was assigned to it.
  2. A variable called d was initialised in the global scope. The value 6 was assigned to it.
  3. The function add_up_numbers() was called on positional arguments c and d.
  4. c was assigned to the variable a in the function’s local scope.
  5. d was assigned to the variable b in the function’s local scope.
  6. The function returned the sum of the variables a and b in the local scope.

It may sound more than a little tedious to think about this function in this way, but it highlights three important things about <- assignment:

  1. In a function call, <- assignment to a keyword name is not the same as using =, which simply binds a value to the keyword.
  2. <- assignment in a function call affects the global scope, using = to provide an argument does not.
  3. Outside this context, <- and = have the same effect, i.e. they assign, or initialise and assign, in the current scope.

Phew. If that sounds like absolute confusing gibberish, give it another read and try playing around with it a little. I promise, it makes some sense. Eventually.

So… should you or shouldn’t you?

Which raises the question that launched this whole post: should you use = for assignment at all? Quite a few style guides, such as Google’s R style guide, have outright banned the use of = as assignment operator, while others have encouraged the use of ->. Personally, I’m inclined to agree with them, for three reasons.

  1. Because of the existence of ->, assignment by definition is best when it’s structured in a way that shows what is assigned to which side. a -> b and b <- a have a formal clarity that a = b does not have.
  2. Good code is unambiguous even if the language isn’t. This way, -> and <- always mean assignment, = always means argument binding and == always means comparison.
  3. Many argue that <- is ambiguous, as x<-3 may be mistyped as x<3 or x-3, or alternatively may be (visually) parsed as x < -3, i.e. compare x to -3. In reality, this is a non-issue. RStudio has a built-in shortcut (Alt/⎇ + ) for <-, and automatically inserts a space before and after it. And if one adheres to sound coding principles and surrounds operators with white spaces, this is not an issue that arises.

Like with all coding standards, consistency is key. Consistently used suboptimal solutions are superior, from a coding perspective, to an inconsistent mixture of right and wrong solutions.

References   [ + ]

1. A parameter is an abstract ‘slot’ where you can put in values that configure a function’s execution. Arguments are the actual values you put in. So add_up_numbers(a,b) has the parameters a and b, and add_up_numbers(a = 3, b = 5) has the arguments 3 and 5.

The post Assignment in R: slings and arrows appeared first on Doodling in Data.

To leave a comment for the author, please follow the link and comment on their blog: R – Doodling in Data.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)