Long-vector kludge in R

[This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers.]


Just recently, I found out that R is limited to 32-bit integers, even on 64-bit hardware. Bummer, huh? As a consequence, the maximum length of a vector is 2^31-1 elements. To be fair, dealing with numeric types across machine architectures is hard, and a fixed representation has a lot of advantages. Java did the same thing, indexing arrays with 32-bit ints, and now has a similar problem on 64-bit machines. Neither the R language specification nor R Internals says much about numeric representation.
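Here's a quick sketch of what the 32-bit limit looks like at the R prompt. The variable names are mine, purely for illustration; the behavior shown is base R's integer overflow rule.

```r
# Largest value an R integer can hold: 2^31 - 1
int_max <- .Machine$integer.max
print(int_max)

# Integer arithmetic past the limit yields NA (R also emits a warning)
overflowed <- suppressWarnings(int_max + 1L)
print(is.na(overflowed))

# Adding a double instead of an integer promotes the result to double
promoted <- int_max + 1
print(typeof(promoted))
```

Note that the overflow doesn't wrap around as it would in C or Java; R hands back NA instead.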

Of course, there’s pressure from big data people with big machines to increase these limits. The result, from what I heard at the BioC2012 meeting, seems to be a horrific implementation of long vectors. Apparently, the R-devel branch contains code that uses doubles to index long vectors, taking advantage of the 52 bits of significand in an IEEE double. This makes my teeth hurt. R already has a well-deserved reputation for quirks. Introducing a kludge like this into the language cannot be helpful in this regard.
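To see why a double can stand in for an integer index at all, and where the trick runs out, here is a minimal sketch. It assumes standard IEEE 754 doubles (52 stored significand bits plus an implicit leading bit, so 53 bits of precision); it says nothing about how R-devel actually implements long vectors.

```r
# Number of significand bits in a double, counting the implicit leading bit
sig_bits <- .Machine$double.digits
print(sig_bits)

# Whole numbers are exact in a double up to 2^53 ...
limit <- 2^53
exact_below <- ((limit - 1) + 1) == limit
print(exact_below)

# ... but 2^53 + 1 is not representable and rounds back to 2^53,
# so two different "indices" would compare equal
collides <- (limit + 1) == limit
print(collides)
```

That leaves far more room than 2^31-1, but past 2^53 adjacent indices silently collide.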

In moving beyond the 32-bit limitation, it would be great if the R community took this opportunity to re-address numeric representation in R. Some languages (Ruby, Mathematica) seamlessly promote up to arbitrary precision. The argument can be made that high precision isn’t relevant to statistical use-cases, but it would be a fun addition to the language.

There are at least a couple of R packages for arbitrary precision arithmetic: Ryacas and Rmpfr. Maybe packages are the right place for this stuff, given that it’s somewhat niche functionality.
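For a taste of what the package route looks like, here is a small sketch using Rmpfr. It assumes the package is installed (the code skips that part otherwise), and the choice of 128 bits of precision is arbitrary.

```r
# In plain doubles, 2^64 + 1 is indistinguishable from 2^64
double_collides <- (2^64 + 1) == 2^64
print(double_collides)

# With Rmpfr, 128 bits of precision are enough to keep the two apart
if (requireNamespace("Rmpfr", quietly = TRUE)) {
  big <- Rmpfr::mpfr(2, precBits = 128)^64
  mpfr_distinct <- (big + 1) != big
  print(mpfr_distinct)
}
```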

In spite of its quirks, I love R. It’s a powerful tool for data manipulation, with a depth of statistical libraries unmatched anywhere. And once you get used to the funky syntax, the functional goodness of the language shines through. But there should be a place for 64-bit integers, at least. Why not do it right?

