Fixing non positive definite correlation matrices using R

(This article was first published on a modeler's tribulations, gopi goteti's web log, and kindly contributed to R-bloggers)

Problem

When a correlation or covariance matrix is not positive definite (i.e., in instances when some or all eigenvalues are negative), a cholesky decomposition cannot be performed. Sometimes, these eigenvalues are very small negative numbers and occur due to rounding or due to noise in the data. In simulation studies a known/given correlation has to be imposed on an input dataset. In such cases one has to deal with the issue of making a correlation matrix positive definite. Following are papers in the field of stochastic precipitation where such matrices are used.

  • FP Brissette, M Khalili, R Leconte, Journal of Hydrology, 2007, “Efficient stochastic generation of multi-site synthetic precipitation data”
  • GA Baigorria, JW Jones, Journal of Climate, 2010, “GiST: A stochastic model for generating spatially and temporally correlated daily rainfall data”
  • M Mhanna and W Bauwens, International Journal of Climatology, 2012, “A stochastic space-time model for the generation of daily rainfall in the Gaza Strip”

A Solution

The paper by Rebonato and Jackel, “The most general methodology for creating a valid correlation matrix for risk management and option pricing purposes”, Journal of Risk, Vol 2, No 2, 2000, presents a methodology to create a positive definite matrix out of a non-positive definite matrix.

Below is my attempt to reproduce the example from Rebonato and Jackel (2000). The correlation matrix below is from the example.


origMat <- array(c(1, 0.9, 0.7, 0.9, 1, 0.3, 0.7, 0.3, 1), dim = c(3, 
    3))

origEig <- eigen(origMat)
origEig$values
## [1]  2.296728  0.710625 -0.007352

As you can see, the third eigenvalue is negative. Trying a cholesky decomposition on this matrix fails, as expected.

cholStatus <- try(u <- chol(origMat), silent = FALSE)
cholError <- ifelse(class(cholStatus) == "try-error", TRUE, FALSE)

Here, I use the method of Rebonato and Jackel (2000), as elaborated by Brissette et al. (2007), to fix the correlation matrix. As per the method, replace the negative eigenvalues with 0 (or a small positive number as Brissette et al. 2007 suggest), then normalize the new vector.

# fix the correl matrix
newMat <- origMat

iter <- 0
while (cholError) {

    iter <- iter + 1
    cat("iteration ", iter, "\n")

    # replace -ve eigen values with small +ve number
    newEig <- eigen(newMat)
    newEig2 <- ifelse(newEig$values < 0, 0, newEig$values)

    # create modified matrix eqn 5 from Brissette et al 2007, inv = transp for
    # eig vectors
    newMat <- newEig$vectors %*% diag(newEig2) %*% t(newEig$vectors)

    # normalize modified matrix eqn 6 from Brissette et al 2007
    newMat <- newMat/sqrt(diag(newMat) %*% t(diag(newMat)))

    # try chol again
    cholStatus <- try(u <- chol(newMat), silent = TRUE)
    cholError <- ifelse(class(cholStatus) == "try-error", TRUE, FALSE)
}
## iteration  1 
## iteration  2 

# final check
eigen(newMat)$values
## [1]  2.290e+00  7.096e-01 -1.332e-15

Unresolved Issue(?)

However, as you can see, the third eigenvalue is still negative (but very close to zero). The “chol” function in R is not giving an error probably because this negative eigenvalue is within the “tolerance limits”. I would like to know what these “tolerance limits” are.

To leave a comment for the author, please follow the link and comment on his blog: a modeler's tribulations, gopi goteti's web log.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.