(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)
Consider the following matrix
nr <- nc <- 6
set.seed <- 123
m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
sum(m)/length(m)
[1] 0.1667
dimnames(m) <- list(letters[1:nr], letters[1:nc])
m
a b c d e f a 0 0 0 0 0 1 b 0 0 0 1 0 1 c 0 0 0 0 0 0 d 0 0 0 0 0 0 e 1 1 0 0 0 0 f 0 0 0 1 0 0
This matrix can be coerced to a sparse matrix with
library("Matrix")
Loading required package: methods
Loading required package: lattice
M1 <- as(m, "dgCMatrix")
M1
6 x 6 sparse Matrix of class "dgCMatrix" a b c d e f a . . . . . 1 b . . . 1 . 1 c . . . . . . d . . . . . . e 1 1 . . . . f . . . 1 . .
str(M1)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots ..@ i : int [1:6] 4 4 1 5 0 1 ..@ p : int [1:7] 0 1 2 2 4 4 6 ..@ Dim : int [1:2] 6 6 ..@ Dimnames:List of 2 .. ..$ : chr [1:6] "a" "b" "c" "d" ... .. ..$ : chr [1:6] "a" "b" "c" "d" ... ..@ x : num [1:6] 1 1 1 1 1 1 ..@ factors : list()
Using Eigen via RcppEigen we can obtain the coercion as:
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP asdgCMatrix_( SEXP XX_ ){
typedef Eigen::SparseMatrix<double> SpMat;
typedef Eigen::Map<Eigen::MatrixXd> MapMatd; // Input: must be double
MapMatd X(Rcpp::as<MapMatd>(XX_));
SpMat Xsparse = X.sparseView(); // Output: sparse matrix
S4 Xout(wrap(Xsparse)); // Output: as S4 object
NumericMatrix Xin(XX_); // Copy dimnames
Xout.slot("Dimnames") = clone(List(Xin.attr("dimnames")));
return(Xout);
}
(M2 <- asdgCMatrix_(m * 1.0))
6 x 6 sparse Matrix of class "dgCMatrix" a b c d e f a . . . . . 1 b . . . 1 . 1 c . . . . . . d . . . . . . e 1 1 . . . . f . . . 1 . .
str(M2)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots ..@ i : int [1:6] 4 4 1 5 0 1 ..@ p : int [1:7] 0 1 2 2 4 4 6 ..@ Dim : int [1:2] 6 6 ..@ Dimnames:List of 2 .. ..$ : chr [1:6] "a" "b" "c" "d" ... .. ..$ : chr [1:6] "a" "b" "c" "d" ... ..@ x : num [1:6] 1 1 1 1 1 1 ..@ factors : list()
identical(M1, M2)
[1] TRUE
Compare the performance:
cols <- c("test", "replications", "elapsed", "relative", "user.self", "sys.self")
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),
columns=cols, order="relative", replications=1000)
test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1) 1000 0.025 1.00 0.024 0.000
2 as(m, "dgCMatrix") 1000 0.246 9.84 0.240 0.004
For larger matrices the difference in performance gain is smaller:
## 100 x 100 matrix
nr <- nc <- 100
set.seed <- 123
m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),
columns=cols, order="relative", replications=1000)
test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1) 1000 0.137 1.000 0.136 0.000
2 as(m, "dgCMatrix") 1000 0.443 3.234 0.432 0.008
## 1000 x 1000 matrix
nr <- nc <- 1000
set.seed <- 123
m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),
columns=cols, order="relative", replications=100)
test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1) 100 1.193 1.000 1.180 0.008
2 as(m, "dgCMatrix") 100 2.201 1.845 2.064 0.128
## 3000 x 3000 matrix
nr <- nc <- 3000
set.seed <- 123
m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),
columns=cols, order="relative", replications=100)
test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1) 100 8.911 1.000 6.024 2.828
2 as(m, "dgCMatrix") 100 21.557 2.419 16.930 4.500
Thanks to Doug Bates for illustrating to me how set the dimnames attribute.
To leave a comment for the author, please follow the link and comment on his blog: Rcpp Gallery.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...

Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).