A small logical change with big impact

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In R, the logical || (OR) and && (AND) operators are unique in that they are designed only to work with scalar arguments. Typically used in statements like

while(iter < 1000 && eps < 0.0001) continue_optimization()

the assumption is that the objects on either side (in the example above, iter and eps) are single values (that is, vectors of length 1) — nothing else makes sense for control flow branching like this. If either iter or eps above happened to be vectors with more than one value, R would silently consider only the first elements when making the AND comparison.

In an ideal world that should never happen, of course. But R programmers, like all programmers, do make errors, and using non-scalar values with || or && is a good sign that something, somewhere, has gone wrong. And with an experimental feature in the next version of R, you will be able to set an environment variable, _R_CHECK_LENGTH_1_LOGIC2_, to signal a warning or error in this case. But why make it experimental, and not the default behavior? Surely making such a change is a no-brainer, right?

It turns out things are more difficult than they seem. The issue is the large ecosystem of R packages, which may rely (wittingly or unwittingly) on the existing behavior. Suddenly throwing an error would throw those packages out of CRAN, and all of their dependencies as well.

We can see the scale of the impact in a recent R Foundation blog post by Tomas Kalibara where he details the effort it took to introduce an even simpler change: making if(cond) { ... } throw an error if cond is a logical vector of length greater than one. Again, it seems like a no-brainer, but when this change was introduced experimentally in March 2017, 154 packages on CRAN started failing because of it ... and that rose to 179 packages by November 2017. The next step was to implement a new CRAN check to alert those package authors that, in the future, their package would fail. It wasn't until October 2018 that the number of affected packages had dropped to a suitable level that the error could actually be implemented in R as well.

So the lesson is this: with an ecosystem as large as CRAN, even "trivial" changes take a tremendous amount of effort and time. They involve testing the impact on CRAN packages, coding up tests and warnings for maintainers, and allowing enough time for packages to be udpated. For an in-depth perspective, read Tomas Kalibera's essay at the link below.

R Developer blog: Tomas Kalibera: Conditions of Length Greater Than One

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)