Big vectors coming to R

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

R has been available as a 64-bit application since its earliest days. But the internal representation of R's fundamental data type, the vector, has long been subject to a 32-bit limitation: the maximum number of elements is capped at 2^31 - 1 (just over 2.1 billion). For a numeric (double-precision) vector at 8 bytes per element, that's about 16 GB of data, so the cap wasn't a practical limitation until machines with massive amounts of RAM came along. And even then, compound objects like data frames and lists can contain multiple vectors (and so exceed 16 GB in total), so not many people noticed the issue.
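The arithmetic behind this limit is easy to check in R itself. The sketch below assumes only that the classic per-vector length cap equals `.Machine$integer.max` (2^31 - 1, the largest value of R's 32-bit integer type); the variable names are illustrative.

```r
# Classic maximum vector length: the largest 32-bit signed integer, 2^31 - 1
max_len <- .Machine$integer.max
print(max_len)                            # [1] 2147483647

# Memory needed for a double-precision vector of that length (8 bytes each).
# Multiplying by the double 8 (not the integer 8L) avoids integer overflow.
bytes <- max_len * 8
cat(sprintf("%.2f GiB\n", bytes / 2^30))  # prints 16.00 GiB (rounded)
```

Using `8L` instead of `8` would multiply two integers and overflow R's integer type, yielding `NA` with a warning.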

But now that R-compatible servers with hundreds of gigabytes of RAM are available, the vector-length limitation does occasionally rear its head. The main culprit is large-scale matrix algebra: in R, a matrix is stored internally as a single vector, so the largest possible square matrix is 46,340 × 46,340 (a 46,341 × 46,341 matrix would already exceed 2^31 - 1 elements). This can be a problem in financial analysis, for example, where calculating a covariance matrix of 50,000 or more time series is no longer unusual.
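The same arithmetic explains the 46,340 figure and why a covariance matrix for 50,000 series doesn't fit. This is a minimal R sketch, again assuming the cap is `.Machine$integer.max`; the memory estimate assumes a dense double-precision matrix (8 bytes per cell) and decimal gigabytes.

```r
max_len <- .Machine$integer.max

# Largest n such that an n x n matrix has at most 2^31 - 1 elements
n_max <- floor(sqrt(max_len))
print(n_max)                      # [1] 46340
print(n_max^2 <= max_len)         # [1] TRUE
print((n_max + 1)^2 <= max_len)   # [1] FALSE

# A covariance matrix for 50,000 series has 50,000^2 = 2.5 billion cells
n_series <- 50000
cells <- n_series^2
print(cells > max_len)            # [1] TRUE
cat(sprintf("%.1f GB as doubles\n", cells * 8 / 1e9))  # 20.0 GB as doubles
```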

But it looks like that limitation is set to be lifted in a future version of R, as comments in the NEWS file for the development version of R suggest:

There are the beginnings of support for vectors longer than 2^31 – 1 elements on 64-bit platforms. This applies to raw, logical, integer, double, complex and character vectors, as well as lists. (Elements of character vectors remain limited to 2^31 – 1 bytes.)

All aspects are currently experimental.

This is a very exciting development: with full big-vector support, linear algebra in R will extend to a whole new order of magnitude, provided you have enough RAM available to hold the data. For those of us with more modest platforms, the bigmemory and RevoScaleR packages continue to provide tools for analyzing “Big Data” in the R environment.

The Simply Statistics blog has more on the forthcoming big vectors in R.
