Matrix Package Doodling

November 24, 2011
By

(This article was first published on Adventures in Statistical Computing, and kindly contributed to R-bloggers)

Trying not to fall into Thanksgiving Day, football, coma.  So I started looking at the Matrix package.

Started out by changing my code from before to create a matrix using the Matrix() function from the Matrix package.
n = 4000
c = Matrix(.9,n,n)
for(i in 1:n){
   c[i,i] = 1;
}
That took... a LONG time. What the hell?  Reading the intro document, it looks like there are multiple internal classes that hold matrices.  I'm guessing that every time I update the matrix it tries to decide what class should be used.  That could make doing matrix operations, painful.

This however works MUCH faster -- probably because the matrix is only checked once on creation.
n = 4000
c = matrix(.9,n,n)
for(i in 1:n){
   c[i,i] = 1
}
c = Matrix(c)
What does the internal structure of this look like?
str(c)

Formal class 'dsyMatrix' [package "Matrix"] with 5 slots
  ..@ x       : num [1:16000000] 1 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
  ..@ Dim     : int [1:2] 4000 4000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ uplo    : chr "U"
  ..@ factors : list()
The document referenced above lists a dsyMatrix as "Symmetric real matrices in non-packed storage." There is a type "dpoMatrix Positive semi-de nite symmetric real matrices in non-packed storage." Technically, this matrix is PSD. Why is it not a dpoMatrix?

Let's recalculate the Cholesky, look at the performance, and re-check the class:
gflops = function(ops,time){
   fps = ops/time
   return(fps/1000000000)
}

start = Sys.time()
root = chol(c)
end = Sys.time()
gflops(n^3/3,as.numeric(end - start))
[1] 1.584472
str(c)
Formal class 'dsyMatrix' [package "Matrix"] with 5 slots
  ..@ x       : num [1:16000000] 1 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 ...
  ..@ Dim     : int [1:2] 4000 4000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ uplo    : chr "U"
  ..@ factors :List of 1
  .. ..$ Cholesky:Formal class 'Cholesky' [package "Matrix"] with 5 slots
  .. .. .. ..@ x       : num [1:16000000] 1 0 0 0 0 0 0 0 0 0 ...
  .. .. .. ..@ Dim     : int [1:2] 4000 4000
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : NULL
  .. .. .. ..@ uplo    : chr "U"
  .. .. .. ..@ diag    : chr "N"
str(root)
Formal class 'Cholesky' [package "Matrix"] with 5 slots
  ..@ x       : num [1:16000000] 1 0 0 0 0 0 0 0 0 0 ...
  ..@ Dim     : int [1:2] 4000 4000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ uplo    : chr "U"
  ..@ diag    : chr "N"

Matrix class didn't change and the performance is worse than the base class.

Notice how the Cholesky root is stored on the c matrix? This means we can get back to it if needed. That's pretty cool.

Unfortunately, the chol() function in Matrix and base requires a positive definite (PD) matrix.  Often in finance, I only have positive semi-definite (PSD) matrices.  We'll talk about that next time.

To leave a comment for the author, please follow the link and comment on his blog: Adventures in Statistical Computing.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags:

Comments are closed.