Big vectors coming to R

July 26, 2012
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

R has been available as a 64-bit application since its earliest days. But the internal representation of R's fundamental data type, the vector, has long been subject to a 32-bit limitation: the maximum number of elements is capped at 2^31 - 1 (just over 2.1 billion). Now, at 8 bytes per element that's 16GB of data, so that wasn't a limitation until machines with massive amounts of RAM came along. And even then, compound objects like data frames and lists can contain multiple vectors (and so exceed the 16GB limit), so not many people noticed the issue.
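As a quick check of that arithmetic, here is a minimal sketch in R. The variable names are only illustrative, and it assumes a numeric (double) vector at 8 bytes per element, ignoring the small per-object header.

```r
# Largest value of R's 32-bit integer type, which has bounded vector lengths
max_len <- .Machine$integer.max        # 2^31 - 1
max_len == 2^31 - 1                    # TRUE

# Memory needed for a double vector of that length (8 bytes per element).
# Convert to double first so the multiplication does not overflow integers.
bytes <- as.numeric(max_len) * 8
gib <- bytes / 2^30                    # size in units of 2^30 bytes
gib                                    # just under 16
```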

But now that R-compatible servers with hundreds of gigabytes of RAM are available, the vector-length limitation does rear its head occasionally. The main culprit is actually computations with large-scale matrix algebra: in R, a matrix is internally represented as a single vector, so the largest possible square matrix has 46,340 rows and columns. This can be a problem in financial analysis, for example, where calculating a covariance matrix of 50,000 or more time series is no longer unusual.
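The 46,340 figure follows directly from the vector-length cap, as the rough sketch below shows. The names are illustrative, and the memory estimate assumes a dense matrix of doubles.

```r
max_len <- .Machine$integer.max        # 2^31 - 1 elements

# A square n x n matrix is stored as one vector of n^2 elements,
# so the largest allowed n is the integer part of sqrt(max_len).
n_max <- floor(sqrt(max_len))
n_max                                  # 46340
fits_max  <- n_max^2 <= max_len        # TRUE
fits_next <- (n_max + 1)^2 <= max_len  # FALSE

# A covariance matrix for 50,000 time series
n <- 50000
too_big <- n^2 > max_len               # TRUE: 2.5e9 elements exceeds the cap
cov_gib <- n^2 * 8 / 2^30              # roughly 18.6, in units of 2^30 bytes
```

So a 50,000-series covariance matrix is over the limit on element count alone, regardless of how much RAM the machine has.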

But it looks like that limitation is set to be lifted in a future version of R, as comments in the NEWS file for the development version of R suggest:

There are the beginnings of support for vectors longer than 2^31 – 1 elements on 64-bit platforms. This applies to raw, logical, integer, double, complex and character vectors, as well as lists. (Elements of character vectors remain limited to 2^31 – 1 bytes.)

All aspects are currently experimental.

This is a very exciting development: with full big-vector support, linear algebra in R will extend to a whole new order of magnitude, provided you have enough RAM available to hold the data. For those of us with more modest platforms, the bigmemory and RevoScaleR packages continue to provide tools for analyzing "Big Data" in the R environment.

The Simply Statistics blog has more on the forthcoming big vectors in R.
