# Correlation with constraints on pairs

March 31, 2014

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

An interesting question was posted on http://math.stackexchange.com/726205/…: if one knows the covariances $\text{cov}(X,Y)$ and $\text{cov}(X,Z)$, is it possible to infer $\text{cov}(Y,Z)$? I asked myself a similar question a few weeks ago (related to one I asked a long time ago, about the possible correlations between three exchange rates on financial markets). More precisely, if one knows the correlations $\text{corr}(X,Y)$ and $\text{corr}(X,Z)$, is it possible to say something about $\text{corr}(Y,Z)$?

I could not find many details (but maybe I did not look hard enough in the existing literature). My strategy was to consider the correlation matrix, and to use the fact that a correlation matrix is a symmetric positive semidefinite matrix (also called a Gramian matrix), i.e. a symmetric matrix with no negative eigenvalues. Given the two correlations, we consider the function of the third correlation that indicates whether the smallest eigenvalue of the matrix is non-negative or not. Then I look at the set of values of the third correlation for which it is, to get the minimum and the maximum possible values (I guess we can prove that the possible values form an interval). The code to get that is simply

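```r
# Range of admissible corr(Y,Z), given r1=corr(X,Y) and r2=corr(X,Z)
corrminmax=function(r1,r2){
  # TRUE if the correlation matrix is positive semidefinite
  # (the small tolerance absorbs rounding errors in eigen())
  h=function(r3){
    R=matrix(c(1,r1,r2,r1,1,r3,r2,r3,1),3,3)
    return(min(eigen(R,symmetric=TRUE,only.values=TRUE)$values)>= -1e-10)}
  # scan a grid of 10,001 candidate values for r3 in [-1,1]
  vc=seq(-1,+1,length=1e4+1)
  vr=vapply(vc,h,logical(1))
  indx=which(vr)
  if(length(indx)==0) return(c(NA,NA))  # no admissible value found
  return(vc[range(indx)])
}
```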
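
The guess above can be confirmed with a standard argument. Write $r_1=\text{corr}(X,Y)$, $r_2=\text{corr}(X,Z)$ and $r_3=\text{corr}(Y,Z)$, as in the code. When all three lie in $[-1,1]$, the $1\times 1$ and $2\times 2$ principal minors of the matrix are non-negative, so the matrix is positive semidefinite if and only if its determinant is non-negative:

$$
\det\begin{pmatrix}1&r_1&r_2\\ r_1&1&r_3\\ r_2&r_3&1\end{pmatrix}=1-r_1^2-r_2^2-r_3^2+2r_1r_2r_3\geq 0 .
$$

Seen as a quadratic inequality in $r_3$, with discriminant term $r_1^2r_2^2+1-r_1^2-r_2^2=(1-r_1^2)(1-r_2^2)$, this gives

$$
r_3^2-2r_1r_2\,r_3-(1-r_1^2-r_2^2)\leq 0
\iff
r_1r_2-\sqrt{(1-r_1^2)(1-r_2^2)}\;\leq\; r_3\;\leq\; r_1r_2+\sqrt{(1-r_1^2)(1-r_2^2)} .
$$

Writing $r_1=\cos\theta_1$ and $r_2=\cos\theta_2$ with $\theta_1,\theta_2\in[0,\pi]$, the two bounds are $\cos(\theta_1+\theta_2)$ and $\cos(\theta_1-\theta_2)$, so they always lie in $[-1,1]$, and the length of the interval is $2\sqrt{(1-r_1^2)(1-r_2^2)}$.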

Using the function `corrminmax`, we can compute the smallest and the largest admissible correlation for the third pair, over a grid of values for the first two correlations:

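```r
# grid of values for r1=corr(X,Y) and r2=corr(X,Z)
x1=seq(-1,1,by=.1)
x2=seq(-1,1,by=.1)
# W (resp. M) stores the smallest (resp. largest) admissible corr(Y,Z)
W=M=matrix(NA,length(x1),length(x2))
for(i in seq_along(x1)){
  for(j in seq_along(x2)){
    C=corrminmax(x1[i],x2[j])
    W[i,j]=C[1]
    M[i,j]=C[2]
  }}
```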
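
As a cross-check, the same matrices can be obtained without any grid search, assuming the standard determinant condition for a $3\times 3$ correlation matrix, under which the admissible values of $\text{corr}(Y,Z)$ form the interval $r_1r_2\pm\sqrt{(1-r_1^2)(1-r_2^2)}$. The helpers `corrlow`, `corrupp` and `mineig` below are hypothetical names used only for this sketch. At both endpoints the matrix should be singular, which the smallest eigenvalue confirms.

```r
# Closed-form bounds for r3 = corr(Y,Z), given r1 = corr(X,Y) and r2 = corr(X,Z)
# (hypothetical helpers; they assume |r1| <= 1 and |r2| <= 1)
corrlow <- function(r1, r2) r1 * r2 - sqrt(pmax(0, (1 - r1^2) * (1 - r2^2)))
corrupp <- function(r1, r2) r1 * r2 + sqrt(pmax(0, (1 - r1^2) * (1 - r2^2)))

# Smallest eigenvalue of the 3x3 correlation matrix (same layout as corrminmax)
mineig <- function(r1, r2, r3) {
  R <- matrix(c(1, r1, r2, r1, 1, r3, r2, r3, 1), 3, 3)
  min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
}

# Same grid as above; outer() evaluates the vectorized formulas on all pairs
x1 <- seq(-1, 1, by = .1)
x2 <- seq(-1, 1, by = .1)
W2 <- outer(x1, x2, corrlow)
M2 <- outer(x1, x2, corrupp)

# At both endpoints the matrix is singular: smallest eigenvalue close to 0
lo <- corrlow(0.7, 0.5)
up <- corrupp(0.7, 0.5)
print(c(lower = lo, upper = up))
print(c(mineig(0.7, 0.5, lo), mineig(0.7, 0.5, up)))
```

Up to the resolution of the grid used in `corrminmax`, `W2` and `M2` should agree with `W` and `M`.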

If we plot those matrices, we get

```r
par(mfrow=c(1,2))
# left: smallest admissible corr(Y,Z); right: largest one
persp(x1,x2,W,zlim=c(-1,1),col="green")
persp(x1,x2,M,zlim=c(-1,1),col="green")
```

If we plot the difference `M-W`, i.e. the length of the admissible interval, we clearly see that the widest interval is obtained when the two correlations are null (in that case, any value in $[-1,1]$ is valid for the third correlation).
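
The length of the interval can also be written directly. A minimal sketch, assuming the standard determinant condition for a $3\times 3$ correlation matrix, which gives bounds $r_1r_2\pm\sqrt{(1-r_1^2)(1-r_2^2)}$ and hence a length of $2\sqrt{(1-r_1^2)(1-r_2^2)}$. The names `width` and `best` are illustrative only. The code locates the widest interval on the same grid and draws the surface in the same way as the `persp()` plots above.

```r
# Length of the admissible interval for corr(Y,Z):
# upper bound minus lower bound = 2*sqrt((1-r1^2)*(1-r2^2))
x1 <- seq(-1, 1, by = .1)
x2 <- seq(-1, 1, by = .1)
width <- outer(x1, x2, function(a, b) 2 * sqrt(pmax(0, (1 - a^2) * (1 - b^2))))

# Locate the widest interval on the grid
best <- which(width == max(width), arr.ind = TRUE)
print(c(r1 = x1[best[1, 1]], r2 = x2[best[1, 2]], width = max(width)))

# Surface of the length of the interval
persp(x1, x2, width, zlim = c(0, 2), col = "green",
      xlab = "corr(X,Y)", ylab = "corr(X,Z)", zlab = "width")
```

The maximum, 2, is reached only at $r_1=r_2=0$, where the interval is the whole of $[-1,1]$.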
