Coercion of matrix to sparse matrix (dgCMatrix) and maintaining dimnames.
[This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Consider the following matrix
nr <- nc <- 6 set.seed <- 123 m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc) sum(m)/length(m) [1] 0.1667 dimnames(m) <- list(letters[1:nr], letters[1:nc]) m a b c d e f a 0 0 0 0 0 1 b 0 0 0 1 0 1 c 0 0 0 0 0 0 d 0 0 0 0 0 0 e 1 1 0 0 0 0 f 0 0 0 1 0 0
This matrix can be coerced to a sparse matrix with
library("Matrix")
Loading required package: methods
Loading required package: lattice
M1 <- as(m, "dgCMatrix")
M1
6 x 6 sparse Matrix of class "dgCMatrix"
a b c d e f
a . . . . . 1
b . . . 1 . 1
c . . . . . .
d . . . . . .
e 1 1 . . . .
f . . . 1 . .
str(M1)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:6] 4 4 1 5 0 1
..@ p : int [1:7] 0 1 2 2 4 4 6
..@ Dim : int [1:2] 6 6
..@ Dimnames:List of 2
.. ..$ : chr [1:6] "a" "b" "c" "d" ...
.. ..$ : chr [1:6] "a" "b" "c" "d" ...
..@ x : num [1:6] 1 1 1 1 1 1
..@ factors : list()
Using Eigen via RcppEigen we can obtain the coercion as:
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP asdgCMatrix_( SEXP XX_ ){
typedef Eigen::SparseMatrix<double> SpMat;
typedef Eigen::Map<Eigen::MatrixXd> MapMatd; // Input: must be double
MapMatd X(Rcpp::as<MapMatd>(XX_));
SpMat Xsparse = X.sparseView(); // Output: sparse matrix
S4 Xout(wrap(Xsparse)); // Output: as S4 object
NumericMatrix Xin(XX_); // Copy dimnames
Xout.slot("Dimnames") = clone(List(Xin.attr("dimnames")));
return(Xout);
}
(M2 <- asdgCMatrix_(m * 1.0))
6 x 6 sparse Matrix of class "dgCMatrix"
a b c d e f
a . . . . . 1
b . . . 1 . 1
c . . . . . .
d . . . . . .
e 1 1 . . . .
f . . . 1 . .
str(M2)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:6] 4 4 1 5 0 1
..@ p : int [1:7] 0 1 2 2 4 4 6
..@ Dim : int [1:2] 6 6
..@ Dimnames:List of 2
.. ..$ : chr [1:6] "a" "b" "c" "d" ...
.. ..$ : chr [1:6] "a" "b" "c" "d" ...
..@ x : num [1:6] 1 1 1 1 1 1
..@ factors : list()
identical(M1, M2)
[1] TRUE
Compare the performance:
cols <- c("test", "replications", "elapsed", "relative", "user.self", "sys.self")
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),
columns=cols, order="relative", replications=1000)
test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1) 1000 0.025 1.00 0.024 0.000
2 as(m, "dgCMatrix") 1000 0.246 9.84 0.240 0.004
For larger matrices the difference in performance gain is smaller:
## 100 x 100 matrix
nr <- nc <- 100
set.seed <- 123
m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),
columns=cols, order="relative", replications=1000)
test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1) 1000 0.137 1.000 0.136 0.000
2 as(m, "dgCMatrix") 1000 0.443 3.234 0.432 0.008
## 1000 x 1000 matrix
nr <- nc <- 1000
set.seed <- 123
m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),
columns=cols, order="relative", replications=100)
test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1) 100 1.193 1.000 1.180 0.008
2 as(m, "dgCMatrix") 100 2.201 1.845 2.064 0.128
## 3000 x 3000 matrix
nr <- nc <- 3000
set.seed <- 123
m <- matrix(sample(c(rep(0,9), 1),nr*nc, replace=T), nrow=nr, ncol=nc)
rbenchmark::benchmark(asdgCMatrix_(m * 1.0), as(m, "dgCMatrix"),
columns=cols, order="relative", replications=100)
test replications elapsed relative user.self sys.self
1 asdgCMatrix_(m * 1) 100 8.911 1.000 6.024 2.828
2 as(m, "dgCMatrix") 100 21.557 2.419 16.930 4.500
Thanks to Doug Bates for illustrating to me how set the dimnames attribute.
To leave a comment for the author, please follow the link and comment on their blog: Rcpp Gallery.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.