# Using bigmemory with Rcpp

**Rcpp Gallery**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The bigmemory package allows users to create matrices that are stored on disk, rather than in RAM. When an element is needed, it is read from the disk and cached in RAM. These objects can be much larger than native R matrices. Objects stored as such larger-than-RAM matrices are defined in the `big.matrix`

class and they are designed to behave similar to R matrices. However, they are actually implemented in C++ and can be easily accessed and manipulated directly from Rcpp as this example shows.

#include// The next line is all it takes to find the bigmemory // headers -- thanks to the magic of Rcpp attributes // [[Rcpp::depends(bigmemory)]] #include #include // [[Rcpp::export]] Rcpp::NumericVector BigColSums(Rcpp::XPtr pBigMat) { // Create the matrix accessor so we can get at the elements of the matrix. MatrixAccessor ma(*pBigMat); // Create the vector we'll store the column sums in. Rcpp::NumericVector colSums(pBigMat->ncol()); for (size_t i=0; i < pBigMat->ncol(); ++i) colSums[i] = std::accumulate(ma[i], ma[i]+pBigMat->nrow(), 0.0); return colSums; }

A `BigMatrix`

object stores elements in a *column major* format, meaning that values are accessed and filled in by column, rather than by row. The `MatrixAccessor`

implements the bracket operator, returning a pointer to the first element of a column. As a result, for a MatrixAccessor `ma`

, the i-th row and j-th column is accessed with `ma[j][i]`

rather than `m[i, j]`

, which R users are more familiar with.

The code above defines a function `BigColSums`

that takes as an argument the address of the external pointer associated with a `big.matrix`

object. The function starts by creating a `MatrixAccessor`

to provide direct access to the matrix elements. The `MatrixAccessor`

constructor takes the type of elements as a template parameter and a `BigMatrix`

object as function parameter. Along with the MatrixAccessor, a NumericVector is created to hold the return value. Next, the function loops through the columns. For each iteration of the loop the values in a single column are accumulated and stored at the appropriate location in the `colSum`

vector. Finally, the columns sums are returned to R.

The code below shows how to use the new Rcpp function. A `big.matrix`

object is created, named bigmat, with 10000 rows and 3 columns. Matrix elements are stored on disk in a “backingfile” named bigmat.bk. After creating the big.matrix object, the column values are filled in and then the `big.matrix`

’s external pointer, which is references with the `address`

slot, is passed to the Rcpp `BigColSums`

function. The corresponding R function is shown below so that you can verify that our new function returns the correct value.

suppressMessages(require(bigmemory)) # set up big.matrix nrows <- 10000 setwd("/tmp") bkFile <- "bigmat.bk" descFile <- "bigmatk.desc" bigmat <- filebacked.big.matrix(nrow=nrows, ncol=3, type="double", backingfile=bkFile, backingpath=".", descriptorfile=descFile, dimnames=c(NULL,NULL)) # Each column value with be the column number multiplied by # samples from a standard normal distribution. set.seed(123) for (i in 1:3) bigmat[,i] = rnorm(nrows)*i # Call the Rcpp function. res <- BigColSums([email protected]) print(res) [1] -23.72 -182.13 -212.98 # Verify that it is that same as running colSums on a matrix with equal values. print(all.equal(res, colSums(bigmat[,]))) [1] TRUE

**leave a comment**for the author, please follow the link and comment on their blog:

**Rcpp Gallery**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.