**Adventures in Statistical Computing**, and kindly contributed to R-bloggers)

I’ve been working on an example of the new Graph Template Language from SAS. As I don’t have direct access to SAS 9.2, I’ve been developing via email with a friend that does.

In the meantime, I thought I would start to investigate some of the performance properties of R. I work in the financial risk industry and I often find myself worrying about performance of linear algebra systems. Often I find myself setting up a simulation environment and 2 things dominate the performance — the Cholesky root of the covariance matrix and simple matrix multiplication.

The Cholesky root (L) of a matrix (∑) is a decomposition scheme where

n = 4000

corr = matrix(.9,n,n)

for(i in 1:n){corr[i,i] = 1; };

start = Sys.time()

root = chol(corr)

end = Sys.time()

end – start

root[1:5,1:5]

That gives us the return values

Time difference of 12.383 secs

[,1] [,2] [,3] [,4] [,5]

[1,] 1 0.9000000 0.9000000 0.9000000 0.90000000

[2,] 0 0.4358899 0.2064742 0.2064742 0.20647416

[3,] 0 0.0000000 0.3838859 0.1233919 0.12339191

[4,] 0 0.0000000 0.0000000 0.3635146 0.08842247

[5,] 0 0.0000000 0.0000000 0.0000000 0.35259655

Unfortunately, this is the L` matrix, and we need the L matrix. So we can transpose the result with the t() function.

start = Sys.time()

root = t(chol(corr))

end = Sys.time()

end – start

Time difference of 12.42 secs

So the transpose function is quick. How many gigaflops is that? Let’s create a function for that.

gflops = function(ops,time){

fps = ops/time

return(fps/1000000000)}

It takes approximately N^3 / 3 operations to calculate a Cholesky. So…

gflops(n^3/3,as.numeric(end-start))

[1] 1.71766

I’m running on a fairly new Intel CORE i5 laptop. That’s pretty good. Note that we have to convert the time difference to a numeric value.

So for matrix multiplication, we will deal with square matrices. For that reason, if A and B are square (n x n) matrices, then A*B has 2*n^3 – n^2 operations. Let’s create 2 random 4000 x 4000 matrices, multiply them together and see what the time and gflops are.

a = matrix(rnorm(n^2),n,n)

b = matrix(rnorm(n^2),n,n)

start = Sys.time()

xx = a %*% b

end = Sys.time()

end – start

gflops(2*4000^3 – 4000^2,as.numeric(end-start))

Time difference of 1.034383 mins

[1] 123.7298

That seems a tad high. What does as.numeric() return if the time difference is in minutes?

as.numeric(end – start)

[1] 1.034383

That’s annoying. If the time difference is > 1 minute, then you get the time in minutes, not seconds. Really annoying. Dividing 123.7298 / 60 = 2.0622.

Overall not too bad. There is a package called Matrix that extends the base matrix functionality. I haven’t looked into that yet. Next post I will explore that package and see if we can find ways to speed up these benchmarks.

**leave a comment**for the author, please follow the link and comment on their blog:

**Adventures in Statistical Computing**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...