Coercion of matrix to sparse matrix (dgCMatrix) and maintaining dimnames.

January 20, 2013
By

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

Consider the following matrix

nr <- nc <- 6
set.seed <- 123
m  <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
sum(m)/length(m)
[1] 0.1667
dimnames(m) <- list(letters[1:nr], letters[1:nc])
m
  a b c d e f
a 0 0 0 0 0 1
b 0 0 0 1 0 1
c 0 0 0 0 0 0
d 0 0 0 0 0 0
e 1 1 0 0 0 0
f 0 0 0 1 0 0

This matrix can be coerced to a sparse matrix with

library("Matrix")
Loading required package: methods
Loading required package: lattice
M1 <- as(m, "dgCMatrix")
M1 
6 x 6 sparse Matrix of class "dgCMatrix"
  a b c d e f
a . . . . . 1
b . . . 1 . 1
c . . . . . .
d . . . . . .
e 1 1 . . . .
f . . . 1 . .
str(M1)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:6] 4 4 1 5 0 1
  ..@ p       : int [1:7] 0 1 2 2 4 4 6
  ..@ Dim     : int [1:2] 6 6
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:6] "a" "b" "c" "d" ...
  .. ..$ : chr [1:6] "a" "b" "c" "d" ...
  ..@ x       : num [1:6] 1 1 1 1 1 1
  ..@ factors : list()

Using Eigen via RcppEigen we can obtain the coercion as:

// [[Rcpp::depends(RcppEigen)]]

#include <RcppEigen.h>
#include <Rcpp.h>

using namespace Rcpp;
// [[Rcpp::export]]
SEXP asdgCMatrix_( SEXP XX_ ){
  typedef Eigen::SparseMatrix<double> SpMat;   
  typedef Eigen::Map<Eigen::MatrixXd> MapMatd; // Input: must be double
  MapMatd X(Rcpp::as<MapMatd>(XX_));
  SpMat Xsparse = X.sparseView();              // Output: sparse matrix
  S4 Xout(wrap(Xsparse));                      // Output: as S4 object
  NumericMatrix Xin(XX_);                      // Copy dimnames
  Xout.slot("Dimnames") = clone(List(Xin.attr("dimnames")));
  return(Xout);
}
(M2 <- asdgCMatrix_(m * 1.0))
6 x 6 sparse Matrix of class "dgCMatrix"
  a b c d e f
a . . . . . 1
b . . . 1 . 1
c . . . . . .
d . . . . . .
e 1 1 . . . .
f . . . 1 . .
str(M2)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:6] 4 4 1 5 0 1
  ..@ p       : int [1:7] 0 1 2 2 4 4 6
  ..@ Dim     : int [1:2] 6 6
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:6] "a" "b" "c" "d" ...
  .. ..$ : chr [1:6] "a" "b" "c" "d" ...
  ..@ x       : num [1:6] 1 1 1 1 1 1
  ..@ factors : list()
identical(M1, M2)
[1] TRUE

Compare the performance:

cols <- c("test", "replications", "elapsed", "relative", "user.self", "sys.self")	
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),	
                      columns=cols, order="relative", replications=1000)
                 test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1)         1000   0.025     1.00     0.024    0.000
2  as(m, "dgCMatrix")         1000   0.246     9.84     0.240    0.004

For larger matrices the difference in performance gain is smaller:

## 100 x 100 matrix
nr <- nc <- 100
set.seed <- 123
m  <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),	
                      columns=cols, order="relative", replications=1000)
                 test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1)         1000   0.137    1.000     0.136    0.000
2  as(m, "dgCMatrix")         1000   0.443    3.234     0.432    0.008
## 1000 x 1000 matrix
nr <- nc <- 1000
set.seed <- 123
m  <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),	
                      columns=cols, order="relative", replications=100)
                 test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1)          100   1.193    1.000     1.180    0.008
2  as(m, "dgCMatrix")          100   2.201    1.845     2.064    0.128
## 3000 x 3000 matrix
nr <- nc <- 3000
set.seed <- 123
m  <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),	
                      columns=cols, order="relative", replications=100)
                 test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1)          100   8.911    1.000     6.024    2.828
2  as(m, "dgCMatrix")          100  21.557    2.419    16.930    4.500

Thanks to Doug Bates for illustrating to me how set the dimnames attribute.

To leave a comment for the author, please follow the link and comment on his blog: Rcpp Gallery.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.