Long-vector kludge in R

July 25, 2012

(This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers)

Just recently, I found out that R is limited to 32-bit integers, even on 64-bit hardware. Bummer, huh? As a consequence, the maximum length of a vector is 2^31-1. To be fair, dealing with numeric types across machine architectures is hard, and a fixed representation has real advantages, such as behaving the same way on every platform. Java made the same choice (its arrays are indexed by 32-bit ints) and now has a similar problem on 64-bit machines. Neither the R language specification nor R Internals says much about numeric representation.
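
A quick way to see the limit from an R prompt, assuming any standard R build (`max_int` and `overflowed` are just illustrative names). It inspects `.Machine$integer.max` and shows what happens when integer arithmetic crosses it:

```r
# R's integer type is a signed 32-bit int, even on 64-bit hardware,
# so its ceiling is 2^31 - 1.
max_int <- .Machine$integer.max
print(max_int)                  # [1] 2147483647
print(max_int == 2^31 - 1)      # [1] TRUE
print(typeof(max_int))          # [1] "integer"

# Integer arithmetic does not widen on overflow: R returns NA
# (with a warning, suppressed here) instead of switching to a larger type.
overflowed <- suppressWarnings(max_int + 1L)
print(overflowed)               # [1] NA
```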

Of course, there's pressure from big data people with big machines to increase these limits. The result, I heard at the BioC2012 meeting, seems to be a horrific implementation of long vectors. Apparently, the R-devel branch contains code that uses doubles to index long vectors, taking advantage of the 52-bit significand of an IEEE double, which can represent every whole number up to 2^53 exactly. This makes my teeth hurt. R already has a well-deserved reputation for quirks, and introducing a kludge like this into the language cannot help in that regard.
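
The arithmetic the double-index scheme relies on can be checked at the prompt. This is a sketch assuming IEEE 754 doubles (the norm on platforms R runs on); it doesn't allocate a long vector, and `exact_limit` and `x` are illustrative names:

```r
# IEEE 754 doubles have 53 bits of precision (52 stored significand
# bits plus an implicit leading 1), so whole numbers are exact up to 2^53.
print(.Machine$double.digits)           # [1] 53

exact_limit <- 2^53
print(exact_limit - 1 == exact_limit)   # [1] FALSE: still distinct below 2^53
print(exact_limit + 1 == exact_limit)   # [1] TRUE: 2^53 + 1 rounds back to 2^53

# R already accepts doubles as subscripts; fractional parts are
# truncated toward zero, so x[2.9] selects the same element as x[2].
x <- c(10, 20, 30)
print(x[2.9])                           # [1] 20
```

One side effect of that truncation: a computed double index that lands just below a whole number (say `3 - 1e-12`) silently selects the previous element.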

In moving beyond the 32-bit limitation, it would be great if the R community took this opportunity to revisit numeric representation in R. Some languages (Ruby, Mathematica) seamlessly promote integers to arbitrary precision when they outgrow a machine word. One could argue that high precision isn't relevant to statistical use cases, but it would be a fun addition to the language.

There are at least a couple of R packages for arbitrary-precision arithmetic: Ryacas and Rmpfr. Maybe packages are the right place for this stuff, given that it's somewhat niche functionality.

In spite of its quirks, I love R. It's a powerful tool for data manipulation, with a depth of statistical libraries unmatched anywhere. And once you get used to the funky syntax, the functional goodness of the language shines through. But there should be a place for 64-bit integers, at least. Why not do it right?
