[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve been working with sparse matrices in R recently (those created using Matrix::Matrix with the option sparse=TRUE) and found it difficult to track down documentation about what the slots in the matrix object are. This post describes the slots in a class dgCMatrix object.

(Click here for full documentation of the Matrix package (and it is a lot–like, 215 pages a lot).)

Background

It turns out that there is some documentation on dgCMatrix objects within the Matrix package. One can access it using the following code:

library(Matrix)
?dgCMatrix-class


According to the documentation, the dgCMatrix class

…is a class of sparse numeric matrices in the compressed, sparse, column-oriented format. In this implementation the non-zero elements in the columns are sorted into increasing row order. dgCMatrix is the “standard” class for sparse numeric matrices in the Matrix package.

An example

We’ll use a small matrix as a running example in this post:

library(Matrix)
M <- Matrix(c(0, 0,  0, 2,
6, 0, -1, 5,
0, 4,  3, 0,
0, 0,  5, 0),
byrow = TRUE, nrow = 4, sparse = TRUE)
rownames(M) <- paste0("r", 1:4)
colnames(M) <- paste0("c", 1:4)
M
# 4 x 4 sparse Matrix of class "dgCMatrix"
#    c1 c2 c3 c4
# r1  .  .  .  2
# r2  6  . -1  5
# r3  .  4  3  .
# r4  .  .  5  .


Running str on x tells us that the dgCMatrix object has 6 slots. (To learn more about slots and S4 objects, see this section from Hadley Wickham’s Advanced R.)

str(M)
# Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
# [email protected] i       : int [1:7] 1 2 1 2 3 0 1
# [email protected] p       : int [1:5] 0 1 2 5 7
# [email protected] Dim     : int [1:2] 4 4
# [email protected] Dimnames:List of 2
# .. ..$: chr [1:4] "r1" "r2" "r3" "r4" # .. ..$ : chr [1:4] "c1" "c2" "c3" "c4"
# [email protected] x       : num [1:7] 6 4 -1 3 5 2 5
# [email protected] factors : list()


x, i and p

If a matrix M has nn non-zero entries, then its x slot is a vector of length nn containing all the non-zero values in the matrix. The non-zero elements in column 1 are listed first (starting from the top and ending at the bottom), followed by column 2, 3 and so on.

M
# 4 x 4 sparse Matrix of class "dgCMatrix"
#    c1 c2 c3 c4
# r1  .  .  .  2
# r2  6  . -1  5
# r3  .  4  3  .
# r4  .  .  5  .

[email protected]
# [1]  6  4 -1  3  5  2  5

as.numeric(M)[as.numeric(M) != 0]
# [1]  6  4 -1  3  5  2  5


The i slot is a vector of length nn. The kth element of [email protected] is the row index of the kth non-zero element (as listed in [email protected]). One big thing to note here is that the first row has index ZERO, unlike R’s indexing convention. In our example, the first non-zero entry, 6, is in the second row, i.e. row index 1, so the first entry of [email protected] is 1.

M
# 4 x 4 sparse Matrix of class "dgCMatrix"
#    c1 c2 c3 c4
# r1  .  .  .  2
# r2  6  . -1  5
# r3  .  4  3  .
# r4  .  .  5  .

[email protected]
# [1] 1 2 1 2 3 0 1


If the matrix has nvars columns, then the p slot is a vector of length nvars + 1. If we index the columns such that the first column has index ZERO, then [email protected][1] = 0, and [email protected][j+2] - [email protected][j+1] gives us the number of non-zero elements in column j.

In our example, when j = 2, [email protected][2+2] - [email protected][2+1] = 5 - 2 = 3, so there are 3 non-zero elements in column index 2 (i.e. the third column).

M
# 4 x 4 sparse Matrix of class "dgCMatrix"
#    c1 c2 c3 c4
# r1  .  .  .  2
# r2  6  . -1  5
# r3  .  4  3  .
# r4  .  .  5  .

[email protected]
# [1] 0 1 2 5 7


With the x, i and p slots, one can reconstruct the entries of the matrix.

Dim and Dimnames

These two slots are fairly obvious. Dim is a vector of length 2, with the first and second entries denoting the number of rows and columns the matrix has respectively. Dimnames is a list of length 2: the first element being a vector of row names (if present) and the second being a vector of column names (if present).

factors

This slot is probably the most unusual of the lot, and its documentation was a bit difficult to track down. From the CRAN documentation, it looks like factors is

… [an] Object of class “list” – a list of factorizations of the matrix. Note that this is typically empty, i.e., list(), initially and is updated automagically whenever a matrix factorization is
computed.

My understanding is if we perform any matrix factorizations or decompositions on a dgCMatrix object, it stores the factorization under factors so that if asked for the factorization again, it can return the cached value instead of recomputing the factorization. Here is an example:

[email protected]
# list()

Mlu <- lu(M)  # perform triangular decomposition
str([email protected])
# List of 1
# $LU:Formal class 'sparseLU' [package "Matrix"] with 5 slots # .. [email protected] L :Formal class 'dtCMatrix' [package "Matrix"] with 7 slots # .. .. .. [email protected] i : int [1:4] 0 1 2 3 # .. .. .. [email protected] p : int [1:5] 0 1 2 3 4 # .. .. .. [email protected] Dim : int [1:2] 4 4 # .. .. .. [email protected] Dimnames:List of 2 # .. .. .. .. ..$ : chr [1:4] "r2" "r3" "r4" "r1"
# .. .. .. .. ..$: NULL # .. .. .. [email protected] x : num [1:4] 1 1 1 1 # .. .. .. [email protected] uplo : chr "U" # .. .. .. [email protected] diag : chr "N" # .. [email protected] U :Formal class 'dtCMatrix' [package "Matrix"] with 7 slots # .. .. .. [email protected] i : int [1:7] 0 1 0 1 2 0 3 # .. .. .. [email protected] p : int [1:5] 0 1 2 5 7 # .. .. .. [email protected] Dim : int [1:2] 4 4 # .. .. .. [email protected] Dimnames:List of 2 # .. .. .. .. ..$ : NULL
# .. .. .. .. ..\$ : chr [1:4] "c1" "c2" "c3" "c4"
# .. .. .. [email protected] x       : num [1:7] 6 4 -1 3 5 5 2
# .. .. .. [email protected] uplo    : chr "U"
# .. .. .. [email protected] diag    : chr "N"
# .. [email protected] p  : int [1:4] 1 2 3 0
# .. [email protected] q  : int [1:4] 0 1 2 3
# .. [email protected] Dim: int [1:2] 4 4


Here is an example which shows that the decomposition is only performed once:

set.seed(1)
M <- runif(9e6)
M[sample.int(9e6, size = 8e6)] <- 0
M <- Matrix(M, nrow = 3e3, sparse = TRUE)

system.time(lu(M))
#   user  system elapsed
# 13.527   0.161  13.701

system.time(lu(M))
#   user  system elapsed
#      0       0       0