# Correlation with constraints on pairs

March 31, 2014

An interesting question was posted on http://math.stackexchange.com/726205/…: if one knows the covariances $\text{cov}(X,Y)$ and $\text{cov}(X,Z)$, is it possible to infer $\text{cov}(Y,Z)$? I asked myself a question close to this one a few weeks ago (that I might also relate to a question I asked a long time ago, about possible correlations between three exchange rates, on financial markets). More precisely, if one knows the correlations $\text{corr}(X,Y)$ and $\text{corr}(X,Z)$, is it possible to say something about $\text{corr}(Y,Z)$?

I could not find many details (but maybe I did not look hard enough in the existing literature). My strategy was to consider the correlation matrix, and to use the fact that a correlation matrix is a symmetric, positive semidefinite matrix (also called a Gramian matrix, i.e. a matrix with no negative eigenvalues). Given the two correlations, we consider a function of the third correlation that indicates whether the smallest eigenvalue is non-negative or not. Then I look at the range of the third correlation, to get the minimum and the maximum possible values (I guess we can prove that the possible values belong to some interval). The code to get that is simply

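```
corrminmax=function(r1,r2){
  # h(r3) is TRUE when the 3x3 correlation matrix is positive semidefinite
  h=function(r3){
    R=matrix(c(1,r1,r2,r1,1,r3,r2,r3,1),3,3)
    # small tolerance so that boundary cases (zero eigenvalue) are kept
    return(min(eigen(R,symmetric=TRUE)$values)>=-1e-10)}
  # scan candidate values of the third correlation on a fine grid
  vc=seq(-1,+1,length=1e4+1)
  vr=Vectorize(h)(vc)
  indx=which(vr==TRUE)
  return(vc[range(indx)])
}
```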
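The interval guessed above can be made explicit, at least as a sketch. Write $r_1=\text{corr}(X,Y)$, $r_2=\text{corr}(X,Z)$ and $r_3=\text{corr}(Y,Z)$ (notation chosen here to match the arguments of the code), with $|r_1|\leq 1$ and $|r_2|\leq 1$. The $2\times 2$ principal minors are then non-negative, so positive semidefiniteness reduces to a non-negative determinant, which is a quadratic condition in $r_3$:

$$\det\begin{pmatrix}1 & r_1 & r_2\\ r_1 & 1 & r_3\\ r_2 & r_3 & 1\end{pmatrix} = 1+2r_1r_2r_3-r_1^2-r_2^2-r_3^2\geq 0$$

$$\Longleftrightarrow\quad r_1r_2-\sqrt{(1-r_1^2)(1-r_2^2)}\;\leq\; r_3\;\leq\; r_1r_2+\sqrt{(1-r_1^2)(1-r_2^2)}$$

Both endpoints lie in $[-1,1]$, so the grid search in `corrminmax` should return approximately these two values.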

Using this code, it is possible to compute the smallest possible correlation for the third pair, as well as the largest one, over a grid of values for the first two correlations:

```
x1=seq(-1,1,by=.1)
x2=seq(-1,1,by=.1)
# W holds the lower bound, M the upper bound of the third correlation
W=M=matrix(NA,length(x1),length(x2))
for(i in 1:length(x1)){
  for(j in 1:length(x2)){
    C=corrminmax(x1[i],x2[j])
    W[i,j]=C[1]
    M[i,j]=C[2]
  }}
```

If we plot those matrices, we get

```
par(mfrow=c(1,2))
persp(x1,x2,W,zlim=c(-1,1),col="green")
persp(x1,x2,M,zlim=c(-1,1),col="green")
```

If we plot the difference between the two surfaces, which gives the width of the interval, we clearly see that the widest range is obtained when the two correlations are null (in that case, any correlation is valid).
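As a rough cross-check of this observation, here is a minimal sketch that skips the grid search and relies on the determinant condition of a $3\times 3$ correlation matrix, under which the admissible third correlation lies within $r_1r_2\pm\sqrt{(1-r_1^2)(1-r_2^2)}$. The helpers `corr_bounds` and `min_eig` are hypothetical names introduced for this illustration; they are not part of the original code.

```r
# Closed-form bounds for corr(Y,Z) given r1 = corr(X,Y) and r2 = corr(X,Z).
# corr_bounds() is a hypothetical helper, not part of the original post.
corr_bounds <- function(r1, r2) {
  half_width <- sqrt((1 - r1^2) * (1 - r2^2))
  c(lower = r1 * r2 - half_width, upper = r1 * r2 + half_width)
}

# Smallest eigenvalue of the 3x3 correlation matrix, used as a sanity check.
min_eig <- function(r1, r2, r3) {
  R <- matrix(c(1, r1, r2, r1, 1, r3, r2, r3, 1), 3, 3)
  min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
}

# At both endpoints the matrix is singular, so the smallest eigenvalue
# should be zero up to rounding error.
b <- corr_bounds(0.5, 0.5)
eig_lower <- min_eig(0.5, 0.5, unname(b["lower"]))
eig_upper <- min_eig(0.5, 0.5, unname(b["upper"]))

# Width of the admissible interval on the same grid as in the post.
x1 <- seq(-1, 1, by = 0.1)
x2 <- seq(-1, 1, by = 0.1)
width <- outer(x1, x2, function(u, v) 2 * sqrt((1 - u^2) * (1 - v^2)))

# Grid point where the interval is widest, and the corresponding width.
idx <- which(width == max(width), arr.ind = TRUE)
widest <- c(r1 = x1[idx[1, 1]], r2 = x2[idx[1, 2]], width = max(width))
print(widest)
```

Under these assumptions the widest interval sits at the grid point where both correlations are zero, with a width of 2, which is the whole range $[-1,1]$. The width shrinks to 0 as soon as one of the two given correlations reaches $\pm 1$.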
