# Drowning in a glass of water: variance-covariance and correlation matrices

This article was first published on **R on The broken bridge between biologists and statisticians**, and kindly contributed to R-bloggers.

One of the easiest tasks in R is to get the correlations between each pair of variables in a dataset. As an example, let’s take the first four columns of the ‘mtcars’ dataset, which is available within R. Getting the variances-covariances and the correlations is straightforward.

    data(mtcars)
    matr <- mtcars[, 1:4]

    # Covariances
    cov(matr)

    ##              mpg        cyl       disp        hp
    ## mpg    36.324103  -9.172379  -633.0972 -320.7321
    ## cyl    -9.172379   3.189516   199.6603  101.9315
    ## disp -633.097208 199.660282 15360.7998 6721.1587
    ## hp   -320.732056 101.931452  6721.1587 4700.8669

    # Correlations
    cor(matr)

    ##             mpg        cyl       disp         hp
    ## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
    ## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
    ## disp -0.8475514  0.9020329  1.0000000  0.7909486
    ## hp   -0.7761684  0.8324475  0.7909486  1.0000000

It’s really a piece of cake! Unfortunately, a few days ago I had a covariance matrix without the original dataset and I wanted the corresponding correlation matrix. Although this is an easy task as well, at first I was stuck, because I could not find an immediate solution… So I started wondering how I could do it.

Given two variables X and Y, each observed n times, their sample covariance is:

\[cov(X, Y) = \frac{1}{n - 1} \sum\limits_{i=1}^{n} {(X_i - \bar{X})(Y_i - \bar{Y})}\]

where \(\bar{X}\) and \(\bar{Y}\) are the means of the two variables and the divisor \(n - 1\) is the one used by ‘cov()’ in R. The correlation is:

\[cor(X, Y) = \frac{cov(X, Y)}{\sigma_x \sigma_y} \]

where \(\sigma_x\) and \(\sigma_y\) are the standard deviations for X and Y.

The inverse relationship follows directly:

\[ cov(X, Y) = cor(X, Y) \sigma_x \sigma_y\]
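As a quick sanity check, these definitions can be reproduced by hand for one pair of columns. The sketch below assumes the usual sample estimators with an n - 1 divisor, which is what `cov()` and `sd()` use in R; the names `x`, `y`, `cov_xy` and `cor_xy` are only illustrative.

```r
# Illustrative check on two columns of 'mtcars' (variable names are arbitrary)
x <- mtcars$mpg
y <- mtcars$disp
n <- length(x)

# Covariance from the definition, with the n - 1 divisor used by cov()
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Correlation: the covariance scaled by the two standard deviations
cor_xy <- cov_xy / (sd(x) * sd(y))

# Both comparisons should agree with the built-in functions
all.equal(cov_xy, cov(x, y))
all.equal(cor_xy, cor(x, y))
```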

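Therefore, converting from covariance to correlation is pretty easy. For example, take the covariance between ‘disp’ and ‘mpg’ above (-633.097208); dividing it by the square roots of the variances of ‘mpg’ (36.324103) and ‘disp’ (15360.7998) gives the correlation: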

    -633.097208 / (sqrt(36.324103) * sqrt(15360.7998))

    ## [1] -0.8475514

Conversely, if we have the correlation (-0.8475514), the covariance is:

    -0.8475514 * sqrt(36.324103) * sqrt(15360.7998)

    ## [1] -633.0972

My covariance matrix was pretty large, so I started wondering how I could convert the whole matrix at once. What I had to do was take each element of the covariance matrix and divide it by the square roots of the two diagonal elements lying in the same row and in the same column, which are the standard deviations of the two variables involved.

This is easily done with element-wise operations on matrices. I need a square matrix where the standard deviation of each variable is repeated along its row:

    V <- cov(matr)
    # Standard deviations are the square roots of the diagonal of V
    SM1 <- matrix(rep(sqrt(diag(V)), 4), 4, 4)
    SM1

    ##            [,1]       [,2]       [,3]       [,4]
    ## [1,]   6.026948   6.026948   6.026948   6.026948
    ## [2,]   1.785922   1.785922   1.785922   1.785922
    ## [3,] 123.938694 123.938694 123.938694 123.938694
    ## [4,]  68.562868  68.562868  68.562868  68.562868

and another one where they are repeated along the columns:

    SM2 <- matrix(rep(sqrt(diag(V)), each = 4), 4, 4)

Now I can take my covariance matrix (V) and simply multiply it, element by element, by the reciprocals of these two matrices:

    V * 1/SM1 * 1/SM2

    ##             mpg        cyl       disp         hp
    ## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
    ## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
    ## disp -0.8475514  0.9020329  1.0000000  0.7909486
    ## hp   -0.7761684  0.8324475  0.7909486  1.0000000

In fact, ‘rep’ is not even needed when creating SM1, as ‘matrix()’ recycles the elements to fill the matrix.
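A short sketch of that simplification, rebuilding `V` from ‘mtcars’ so that it runs on its own. The dimension is taken from `V` rather than hard-coded, and `SM2` is obtained as the transpose of `SM1`; the names `p` and `s` are introduced here only for illustration.

```r
V <- cov(mtcars[, 1:4])
p <- ncol(V)
s <- sqrt(diag(V))

# The p values in s are recycled to fill the p x p matrix column by column,
# so every column of SM1 is a copy of s
SM1 <- matrix(s, p, p)

# Transposing gives the companion matrix, where every row is a copy of s
SM2 <- t(SM1)

# Element-wise division should reproduce the correlation matrix
all.equal(V / (SM1 * SM2), cor(mtcars[, 1:4]))
```

Taking `p` from `ncol(V)` means the same lines should work for a covariance matrix of any size.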

Going from correlation to covariance can be done similarly:

    R <- cor(matr)
    # Dividing by the reciprocals multiplies each element by both standard deviations
    R / (1/SM1 * 1/SM2)

    ##              mpg        cyl       disp        hp
    ## mpg    36.324103  -9.172379  -633.0972 -320.7321
    ## cyl    -9.172379   3.189516   199.6603  101.9315
    ## disp -633.097208 199.660282 15360.7998 6721.1587
    ## hp   -320.732056 101.931452  6721.1587 4700.8669
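The same back-transformation can also be written with matrix products (`%*%`) instead of element-wise operations, as D R D, where D is a diagonal matrix holding the standard deviations. This is only a sketch of an equivalent formulation, not the approach used above; `D` and `V_back` are names made up for the example.

```r
matr <- mtcars[, 1:4]
V <- cov(matr)
R <- cor(matr)

# Diagonal matrix with the standard deviations on the diagonal
D <- diag(sqrt(diag(V)))

# Pre-multiplying by D scales row i by s[i]; post-multiplying scales column j by s[j]
V_back <- D %*% R %*% D

# Make sure the variable names match those of V before comparing
dimnames(V_back) <- dimnames(V)

all.equal(V_back, V)
```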

This is an easy task, but it got me stuck for a few minutes…

Later, I finally discovered that there is (at least) one function in R that takes care of the above task: ‘cov2cor()’, which lives in the ‘stats’ package and is therefore available without loading anything extra.

    cov2cor(V)

    ##             mpg        cyl       disp         hp
    ## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
    ## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
    ## disp -0.8475514  0.9020329  1.0000000  0.7909486
    ## hp   -0.7761684  0.8324475  0.7909486  1.0000000
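Going back the other way needs the standard deviations as a separate input, because the correlation matrix no longer contains them. A small helper for that direction could look like the sketch below; the name `cor_to_cov()` is hypothetical and is not a function in base R.

```r
# Hypothetical helper (not part of base R): rebuild a covariance matrix
# from a correlation matrix R and a vector of standard deviations s
cor_to_cov <- function(R, s) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R), length(s) == nrow(R))
  # outer(s, s)[i, j] is s[i] * s[j]
  R * outer(s, s)
}

V <- cov(mtcars[, 1:4])
R <- cov2cor(V)                          # covariance -> correlation
V_back <- cor_to_cov(R, sqrt(diag(V)))   # correlation -> covariance

all.equal(V_back, V)
```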

It is really easy to drown in a glass of water!
