**Adventures in Statistical Computing**, and kindly contributed to R-bloggers)

Trying not to fall into Thanksgiving Day, football, coma. So I started looking at the Matrix package.

Started out by changing my code from before to create a matrix using the Matrix() function from the Matrix package.

n = 4000

c = Matrix(.9,n,n)

for(i in 1:n){

c[i,i] = 1;

}

That took… a LONG time. What the hell? Reading the intro document, it looks like there are multiple internal classes that hold matrices. I’m guessing that every time I update the matrix it tries to decide what class should be used. That could make doing matrix operations, painful.

This however works MUCH faster — probably because the matrix is only checked once on creation.

n = 4000

c = matrix(.9,n,n)

for(i in 1:n){

c[i,i] = 1

}

c = Matrix(c)

What does the internal structure of this look like?

str(c)

Formal class ‘dsyMatrix’ [package “Matrix”] with 5 slots

[email protected] x : num [1:16000000] 1 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 …

[email protected] Dim : int [1:2] 4000 4000

[email protected] Dimnames:List of 2

.. ..$ : NULL

.. ..$ : NULL

[email protected] uplo : chr “U”

[email protected] factors : list()

The document referenced above lists a dsyMatrix as “Symmetric real matrices in non-packed storage.” There is a type “**dpoMatrix** Positive semi-de nite symmetric real matrices in non-packed storage.” Technically, this matrix is PSD. Why is it not a **dpoMatrix**?

Let’s recalculate the Cholesky, look at the performance, and re-check the class:

gflops = function(ops,time){

fps = ops/time

return(fps/1000000000)

}start = Sys.time()

root = chol(c)

end = Sys.time()

gflops(n^3/3,as.numeric(end – start))

[1] 1.584472

str(c)

Formal class ‘dsyMatrix’ [package “Matrix”] with 5 slots

[email protected] x : num [1:16000000] 1 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 …

[email protected] Dim : int [1:2] 4000 4000

[email protected] Dimnames:List of 2

.. ..$ : NULL

.. ..$ : NULL

[email protected] uplo : chr “U”

[email protected] factors :List of 1

.. ..$ Cholesky:Formal class ‘Cholesky’ [package “Matrix”] with 5 slots

.. .. .. [email protected] x : num [1:16000000] 1 0 0 0 0 0 0 0 0 0 …

.. .. .. [email protected] Dim : int [1:2] 4000 4000

.. .. .. [email protected] Dimnames:List of 2

.. .. .. .. ..$ : NULL

.. .. .. .. ..$ : NULL

.. .. .. [email protected] uplo : chr “U”

.. .. .. [email protected] diag : chr “N”

str(root)

Formal class ‘Cholesky’ [package “Matrix”] with 5 slots

[email protected] x : num [1:16000000] 1 0 0 0 0 0 0 0 0 0 …

[email protected] Dim : int [1:2] 4000 4000

[email protected] Dimnames:List of 2

.. ..$ : NULL

.. ..$ : NULL

[email protected] uplo : chr “U”

[email protected] diag : chr “N”

Matrix class didn’t change and the performance is worse than the base class.

Notice how the Cholesky root is stored on the **c** matrix? This means we can get back to it if needed. That’s pretty cool.

Unfortunately, the chol() function in Matrix and base requires a positive definite (PD) matrix. Often in finance, I only have positive semi-definite (PSD) matrices. We’ll talk about that next time.

**leave a comment**for the author, please follow the link and comment on their blog:

**Adventures in Statistical Computing**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...