Just recently, I found out that R is limited to 32-bit integers, even on 64-bit hardware. Bummer, huh? As a consequence, the maximum length of a vector is 2^31 - 1 elements. To be fair, dealing with numeric types across machine architectures is hard, and a fixed representation has real advantages. Java made the same choice and now faces a similar problem in the 64-bit era. Neither the R Language Definition nor R Internals says much about numeric representation.
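You can see the limit for yourself in any R session. The snippet below is a quick sanity check: `.Machine$integer.max` reports the largest representable integer, and pushing past it doesn't promote to a wider type; it yields `NA` with an overflow warning.

```r
# R's integer type is a 32-bit signed int, regardless of platform:
.Machine$integer.max        # 2147483647, i.e. 2^31 - 1

# Integer arithmetic that exceeds the range does not promote to a
# wider type; it returns NA with an "NAs produced by integer
# overflow" warning:
.Machine$integer.max + 1L   # NA
```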
Of course, there’s pressure from big-data people with big machines to increase these limits. The result, as I heard at the BioC2012 meeting, seems to be a horrific implementation of long vectors. Apparently, the R-devel branch contains code that indexes long vectors with doubles, taking advantage of the 52 bits of significand in an IEEE 754 double. This makes my teeth hurt. R already has a well-deserved reputation for quirks, and introducing a kludge like this into the language won’t help.
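The trick works because a double's 52 explicit significand bits (53 counting the implicit leading bit) can represent every integer up to 2^53 exactly. Beyond that, the gap between adjacent doubles grows to 2 and the "index" silently loses precision, which is exactly why it feels like a kludge. A quick demonstration:

```r
# Every integer up to 2^53 has an exact double representation:
big <- 2^53
big - 1 == big   # FALSE: 2^53 - 1 is still a distinct value

# Past 2^53, adjacent doubles are 2 apart, so adding 1 is a no-op:
big + 1 == big   # TRUE: the increment is lost to rounding
```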
In moving beyond the 32-bit limitation, it would be great if the R community took the opportunity to re-address numeric representation altogether. Some languages (Ruby, Mathematica) seamlessly promote to arbitrary precision. One can argue that high precision isn’t relevant to statistical use cases, but it would be a fun addition to the language.
In spite of its quirks, I love R. It’s a powerful tool for data manipulation, with a depth of statistical libraries unmatched anywhere. And once you get used to the funky syntax, the functional goodness of the language shines through. But there should be a place for 64-bit integers, at least. Why not do it right?