# What is a dgCMatrix object made of? (sparse matrix format in R)

**R – Statistical Odds & Ends**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve been working with sparse matrices in R recently (those created using `Matrix::Matrix`

with the option `sparse=TRUE`

) and found it difficult to track down documentation about what the slots in the matrix object are. This post describes the slots in a class `dgCMatrix`

object.

(Click here for full documentation of the `Matrix`

package (and it is a lot–like, 215 pages a lot).)

**Background**

It turns out that there is some documentation on `dgCMatrix`

objects within the `Matrix`

package. One can access it using the following code:

library(Matrix) ?`dgCMatrix-class`

According to the documentation, the `dgCMatrix`

class

…is a class of sparse numeric matrices in the compressed, sparse, column-oriented format. In this implementation the non-zero elements in the columns are sorted into increasing row order.

`dgCMatrix`

is the “standard” class for sparse numeric matrices in the`Matrix`

package.

**An example**

We’ll use a small matrix as a running example in this post:

library(Matrix) M <- Matrix(c(0, 0, 0, 2, 6, 0, -1, 5, 0, 4, 3, 0, 0, 0, 5, 0), byrow = TRUE, nrow = 4, sparse = TRUE) rownames(M) <- paste0("r", 1:4) colnames(M) <- paste0("c", 1:4) M # 4 x 4 sparse Matrix of class "dgCMatrix" # c1 c2 c3 c4 # r1 . . . 2 # r2 6 . -1 5 # r3 . 4 3 . # r4 . . 5 .

Running `str`

on `x`

tells us that the `dgCMatrix`

object has 6 slots. (To learn more about slots and S4 objects, see this section from Hadley Wickham’s *Advanced R*.)

str(M) # Formal class 'dgCMatrix' [package "Matrix"] with 6 slots # [email protected] i : int [1:7] 1 2 1 2 3 0 1 # [email protected] p : int [1:5] 0 1 2 5 7 # [email protected] Dim : int [1:2] 4 4 # [email protected] Dimnames:List of 2 # .. ..$ : chr [1:4] "r1" "r2" "r3" "r4" # .. ..$ : chr [1:4] "c1" "c2" "c3" "c4" # [email protected] x : num [1:7] 6 4 -1 3 5 2 5 # [email protected] factors : list()

`x`

, `i`

and `p`

If a matrix `M`

has `nn`

non-zero entries, then its `x`

slot is a vector of length `nn`

containing all the non-zero values in the matrix. The non-zero elements in column 1 are listed first (starting from the top and ending at the bottom), followed by column 2, 3 and so on.

M # 4 x 4 sparse Matrix of class "dgCMatrix" # c1 c2 c3 c4 # r1 . . . 2 # r2 6 . -1 5 # r3 . 4 3 . # r4 . . 5 . [email protected] # [1] 6 4 -1 3 5 2 5 as.numeric(M)[as.numeric(M) != 0] # [1] 6 4 -1 3 5 2 5

The `i`

slot is a vector of length `nn`

. The `k`

th element of `[email protected]`

is the row index of the `k`

th non-zero element (as listed in `[email protected]`

). * One big thing to note here is that the first row has index ZERO, unlike R’s indexing convention.* In our example, the first non-zero entry, 6, is in the second row, i.e. row index 1, so the first entry of

`[email protected]`

is 1.M # 4 x 4 sparse Matrix of class "dgCMatrix" # c1 c2 c3 c4 # r1 . . . 2 # r2 6 . -1 5 # r3 . 4 3 . # r4 . . 5 . [email protected] # [1] 1 2 1 2 3 0 1

If the matrix has `nvars`

columns, then the `p`

slot is a vector of length `nvars + 1`

. * If we index the columns such that the first column has index ZERO,* then

`[email protected][1] = 0`

, and `[email protected][j+2] - [email protected][j+1]`

gives us the number of non-zero elements in column `j`

.In our example, when `j = 2`

, `[email protected][2+2] - [email protected][2+1] = 5 - 2 = 3`

, so there are 3 non-zero elements in column index 2 (i.e. the third column).

M # 4 x 4 sparse Matrix of class "dgCMatrix" # c1 c2 c3 c4 # r1 . . . 2 # r2 6 . -1 5 # r3 . 4 3 . # r4 . . 5 . [email protected] # [1] 0 1 2 5 7

With the `x`

, `i`

and `p`

slots, one can reconstruct the entries of the matrix.

`Dim`

and `Dimnames`

These two slots are fairly obvious. `Dim`

is a vector of length 2, with the first and second entries denoting the number of rows and columns the matrix has respectively. `Dimnames`

is a list of length 2: the first element being a vector of row names (if present) and the second being a vector of column names (if present).

`factors`

This slot is probably the most unusual of the lot, and its documentation was a bit difficult to track down. From the CRAN documentation, it looks like `factors`

is

… [an] Object of class “list” – a list of factorizations of the matrix. Note that this is typically empty, i.e.,

`list()`

, initially and is updatedwhenever a matrix factorization isautomagically

computed.

My understanding is if we perform any matrix factorizations or decompositions on a `dgCMatrix`

object, it stores the factorization under `factors`

so that if asked for the factorization again, it can return the cached value instead of recomputing the factorization. Here is an example:

[email protected] # list() Mlu <- lu(M) # perform triangular decomposition str([email protected]) # List of 1 # $ LU:Formal class 'sparseLU' [package "Matrix"] with 5 slots # .. [email protected] L :Formal class 'dtCMatrix' [package "Matrix"] with 7 slots # .. .. .. [email protected] i : int [1:4] 0 1 2 3 # .. .. .. [email protected] p : int [1:5] 0 1 2 3 4 # .. .. .. [email protected] Dim : int [1:2] 4 4 # .. .. .. [email protected] Dimnames:List of 2 # .. .. .. .. ..$ : chr [1:4] "r2" "r3" "r4" "r1" # .. .. .. .. ..$ : NULL # .. .. .. [email protected] x : num [1:4] 1 1 1 1 # .. .. .. [email protected] uplo : chr "U" # .. .. .. [email protected] diag : chr "N" # .. [email protected] U :Formal class 'dtCMatrix' [package "Matrix"] with 7 slots # .. .. .. [email protected] i : int [1:7] 0 1 0 1 2 0 3 # .. .. .. [email protected] p : int [1:5] 0 1 2 5 7 # .. .. .. [email protected] Dim : int [1:2] 4 4 # .. .. .. [email protected] Dimnames:List of 2 # .. .. .. .. ..$ : NULL # .. .. .. .. ..$ : chr [1:4] "c1" "c2" "c3" "c4" # .. .. .. [email protected] x : num [1:7] 6 4 -1 3 5 5 2 # .. .. .. [email protected] uplo : chr "U" # .. .. .. [email protected] diag : chr "N" # .. [email protected] p : int [1:4] 1 2 3 0 # .. [email protected] q : int [1:4] 0 1 2 3 # .. [email protected] Dim: int [1:2] 4 4

Here is an example which shows that the decomposition is only performed once:

set.seed(1) M <- runif(9e6) M[sample.int(9e6, size = 8e6)] <- 0 M <- Matrix(M, nrow = 3e3, sparse = TRUE) system.time(lu(M)) # user system elapsed # 13.527 0.161 13.701 system.time(lu(M)) # user system elapsed # 0 0 0

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Statistical Odds & Ends**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.