# Design Flaws in R #3 — Zero Subscripts

September 21, 2008
(This article was first published on Radford Neal's blog » R Programming, and kindly contributed to R-bloggers)

Unlike the two design flaws I posted about before (here, here, and also here), where one could at least see a reason for the design decision, even if it was unwise, this design flaw is just  incomprehensible.  For no reason at all that I can see, R allows one to use zero as a subscript without triggering an error.  (Remember that in R, indexes for vectors and matrices start at one, not zero.)

This is of course a terrible decision, because it makes debugging harder, and makes it more likely that bugs will exist that have never been noticed.

So what does R do with a zero subscript, seeing as it’s meaningless?  It just ignores it, which is possible because it treats every numeric subscript as a vector that extracts or replaces a set of elements, not necessarily just one.  So R simply removes all zeros from a vector used as a subscript, producing a shorter vector.

Here’s what happens (with the current version of R, 2.7.2):

    > a
    [1] 10 20 30 40 50
    > a[0]
    numeric(0)
    > a[c(4,2)]
    [1] 40 20
    > a[c(4,0,2,0)]
    [1] 40 20
    > a[0] <- 7
    > a
    [1] 10 20 30 40 50
    > a[c(4,0,2,0)] <- 7
    > a
    [1] 10  7 30  7 50

Contrast this with what happens when you use a subscript that is too large:

    > a
    [1] 10 20 30 40 50
    > a[7]
    [1] NA
    > a[c(4,7,2)]
    [1] 40 NA 20
    > a[7] <- 7
    > a
    [1] 10 20 30 40 50 NA  7

Extending vectors automatically when an assignment is made beyond the end can obviously be useful (though it might be wiser not to).  Returning NA when extracting an element beyond the end is also a sensible action (though signalling an error immediately might be more useful for debugging). And negative subscripts are usefully defined as referring to their complement. But what possible use is there for ignoring zero subscripts rather than signalling an error?

It’s perhaps belabouring the obvious, but let me explain that signalling an error when a zero subscript is used is desirable because this is a very common sort of program bug. It can easily arise when a program is scanning backwards through the vector elements, and goes one step too far. It can also easily arise when data is initialized to zeros, with the intent to replace the zeros with something sensible later, but actually some zeros are never replaced. The way R behaves when zero is used as a subscript when replacing elements is particularly bad, since doing nothing at all can easily lead to an apparently working program that produces wrong answers.  (The behaviour of returning an empty vector when zero is used as a subscript when extracting an element is more likely to produce an error later on, so that at least the problem will be evident.)
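
Both kinds of bug are easy to reproduce. The following is only an illustrative sketch: the variable names `total`, `idx`, and `counts` are made up for this example and do not come from any real program.

```r
# Hypothetical example 1: a backwards scan that goes one step too far.
a <- c(10, 20, 30, 40, 50)
total <- 0
for (i in length(a):0) {       # bug: the loop should stop at 1, not 0
  total <- total + sum(a[i])   # a[0] is numeric(0), so sum() quietly adds 0
}
total                          # 150: the off-by-one leaves no trace

# Hypothetical example 2: an index vector initialised to zeros,
# where one entry is never filled in.
idx <- integer(3)              # 0 0 0, placeholders to be replaced later
idx[1] <- 2
idx[3] <- 5                    # idx[2] is forgotten and stays 0
counts <- c(1, 1, 1, 1, 1)
counts[idx] <- 99              # the zero is dropped without complaint
counts                         # 1 99 1 1 99: only two elements were updated
```

In neither case does R give any hint that a zero subscript was involved.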

So what should be done?  That’s easy — change R so that use of zero as a subscript produces an immediate error.  That’s trivial to do (mixing positive and negative subscripts produces an immediate error now, so the apparatus for it must be there).  Might that break some existing programs?  Yes, it will.  But 99.9% of those programs are already broken.  The users just don’t know it, thinking that the answers they get are correct when they’re not.  The remaining 0.1% of these broken programs were written by really stupid programmers who thought that exploiting an obscure and unwise feature in order to produce a really hard-to-understand program was a good idea.  It wasn’t.
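
Until R itself changes, something like the check can be approximated in user code. What follows is a minimal sketch, not a proposal for R's internals: `strict_sub` is a hypothetical helper name, and it only handles extraction with numeric subscripts on plain vectors. For comparison, the last lines show the existing error for mixed positive and negative subscripts.

```r
# strict_sub() is a hypothetical helper, not part of base R.
# It extracts x[i] but stops if any numeric subscript is zero.
strict_sub <- function(x, i) {
  if (is.numeric(i) && any(i == 0, na.rm = TRUE)) {
    stop("zero subscript")
  }
  x[i]
}

a <- c(10, 20, 30, 40, 50)
strict_sub(a, c(4, 2))                        # 40 20, as usual

# tryCatch() turns the error into its message so the script keeps running.
msg <- tryCatch(strict_sub(a, c(4, 0, 2, 0)),
                error = function(e) conditionMessage(e))
msg                                           # "zero subscript"

# Base R already rejects mixed positive and negative subscripts:
mixed <- tryCatch(a[c(1, -2)], error = function(e) "error")
mixed                                         # "error"
```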

Along with this, R should be changed so that using NA as a subscript when replacing elements in a vector also produces an error.  What to do with NA subscripts used to extract elements is a little bit harder to decide, but it seems to me that something about the following is a bit funny:

    > a
    [1] 10 20 30 40 50
    > a[NA]
    [1] NA NA NA NA NA
    > a[NA+0]
    [1] NA
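
The difference comes from the type of the subscript, which the short sketch below makes visible using only base R. A bare `NA` is a logical value, so it is recycled to the length of `a` like any other logical subscript, whereas `NA+0` is numeric and is treated as a single index of unknown position.

```r
a <- c(10, 20, 30, 40, 50)

typeof(NA)              # "logical"
typeof(NA + 0)          # "double"

length(a[NA])           # 5: a logical subscript is recycled over all of a
length(a[NA + 0])       # 1: a numeric subscript picks one (unknown) element
length(a[NA_integer_])  # 1: the integer NA behaves like the numeric one
```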