**R-bloggers on Mikkel Meyer Andersen**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

**Update Aug 10, 2019: I wrote a new blog post about the same as below but using a simulation approach.**

**Update Aug 27, 2019: Minor change in how equations are solved (from version 0.9.0.9122).**

Let \(\rho_{XY}\) be the correlation between the stochastic variables \(X\) and \(Y\) and similarly for \(\rho_{XZ}\) and \(\rho_{YZ}\).

If we know two of these, can we say anything about the third?

Yes, sometimes, but not always.

Say we have \(\rho_{XZ}\) and \(\rho_{YZ}\) and they are both positive.

Intuition would then make us believe that \(\rho_{XY}\) is probably also positive then.

But that is not always the case.

To see why, we need the partial correlation of \(X\) and \(Y\) given \(Z\):

\[

\rho_{XY \mid Z}={\frac {\rho_{XY}-\rho _{XZ}\rho_{YZ}}{{\sqrt {1-\rho_{XZ}^{2}}}{\sqrt {1-\rho_{YZ}^{2}}}}}

\]

The partial correlation between \(X\) and \(Y\) given \(Z\) means the correlation between \(X\) and \(Y\) with the effect of \(Z\) removed. Let us make a small example using the `mtcars`

data:

```
x <- mtcars$mpg
y <- mtcars$drat
z <- mtcars$wt
cor(x, y)
## [1] 0.6811719
x_without_z <- residuals(lm(x ~ z))
y_without_z <- residuals(lm(y ~ z))
cor(x_without_z, y_without_z)
## [1] 0.1806278
(cor(x, y) - cor(x, z)*cor(y, z))/(sqrt(1 - cor(x, z)^2)*sqrt(1 - cor(y, z)^2))
## [1] 0.1806278
```

So the correlation between `mpg`

(miles per gallon) and `drat`

(rear axle ratio) is 0.6811719.

And the partial correlation between `mpg`

and `draw`

given `wt`

(weight in 1000 lbs) is 0.1806278.

We can rearrange the formula using `Ryacas`

(see e.g. this blog post, note that it I’m using the development version that is currently only available via GitHub):

```
library(Ryacas)
pcor <- "rXYZ == (rXY - rXZ*rYZ)/(Sqrt(1 - rXZ^2)*Sqrt(1 - rYZ^2))"
rXY <- pcor %>% y_fn("Solve", "rXY") %>% yac_str()
rXY
## [1] "{rXY==(rXYZ-(-rXZ*rYZ)/(Sqrt(1-rXZ^2)*Sqrt(1-rYZ^2)))*Sqrt(1-rXZ^2)*Sqrt(1-rYZ^2)}"
rXY %>% y_rmvars() %>% yac_str()
## [1] "{(rXYZ-(-rXZ*rYZ)/(Sqrt(1-rXZ^2)*Sqrt(1-rYZ^2)))*Sqrt(1-rXZ^2)*Sqrt(1-rYZ^2)}"
rXY %>% y_fn("TeXForm") %>% yac_str()
## [1] "\\left( \\mathrm{ rXY } = \\left( \\mathrm{ rXYZ } - \\frac{ - \\mathrm{ rXZ } \\mathrm{ rYZ }}{\\sqrt{1 - \\mathrm{ rXZ } ^{2}} \\sqrt{1 - \\mathrm{ rYZ } ^{2}}} \\right) \\sqrt{1 - \\mathrm{ rXZ } ^{2}} \\sqrt{1 - \\mathrm{ rYZ } ^{2}}\\right) "
```

\[\begin{align}

\rho_{XY}

&= \left( \rho_{XY \mid Z} – \frac{ – \rho_{XZ} \rho_{YZ}}{\sqrt{1 – \rho_{XZ}^{2}} \sqrt{1 – \rho_{YZ}^{2}}} \right) \sqrt{1 – \rho_{XZ}^{2}} \sqrt{1 – \rho_{YZ}^{2}} \\

&= \rho_{XY \mid Z} \sqrt{1 – \rho_{XZ}^{2}} \sqrt{1 – \rho_{YZ}^{2}} + \rho_{XZ} \rho_{YZ}

\end{align}\]

The partial correlation takes values between \(-1\) and \(1\) so \(\rho_{XY}\) must lie in the interval

\[

\rho_{XZ} \rho_{YZ} \pm \sqrt{1 – \rho_{XZ}^{2}} \sqrt{1 – \rho_{YZ}^{2}} .

\]

Going back to the `mtcars`

example:

```
rXZ <- cor(x, z)
rYZ <- cor(y, z)
rXZ*rYZ + c(-1, 1)*sqrt(1 - rXZ^2)*sqrt(1 - rYZ^2)
## [1] 0.2692831 0.9670285
cor(x, y)
## [1] 0.6811719
```

We can illustrate this

```
library(tidyverse)
grid <- expand.grid(
rXZ = seq(-1, 1, len = 100),
rYZ = seq(-1, 1, len = 100))
d <- grid %>%
as_tibble() %>%
mutate(
rXY_l = rXZ*rYZ - sqrt(1 - rXZ^2)*sqrt(1 - rYZ^2),
rXY_u = rXZ*rYZ + sqrt(1 - rXZ^2)*sqrt(1 - rYZ^2),
) %>%
mutate(rXY_sign = case_when(
rXY_l < 0 & rXY_u < 0 ~ "-1",
rXY_l > 0 & rXY_u > 0 ~ "1",
TRUE ~ "?"
))
p <- ggplot(d, aes(rXZ, rYZ, fill = rXY_sign)) +
geom_tile() +
scale_fill_manual(expression(paste("Sign of ", rho[XY])),
values = c("-1" = "red", "1" = "blue", "?" = "grey")) +
labs(x = expression(rho[XZ]),
y = expression(rho[YZ])) +
theme_bw() +
coord_equal()
p
```

Interesting! We actually just found the circle with radius 1!

```
p + annotate("path",
x = cos(seq(0, 2*pi, length.out = 100)),
y = sin(seq(0, 2*pi, length.out = 100)),
color = "green", size = 2)
```

We can also argue mathematically: Before we found that \(\rho_{XY}\) must lie in the interval

\[

\rho_{XZ} \rho_{YZ} \pm \sqrt{1 – \rho_{XZ}^{2}} \sqrt{1 – \rho_{YZ}^{2}} .

\]

If \(\rho_{XY}\) must be positive, then we must have that both limits are greater than zero, namely that

\[\begin{align}

\rho_{XZ} \rho_{YZ} – \sqrt{1 – \rho_{XZ}^{2}} \sqrt{1 – \rho_{YZ}^{2}} &> 0 \\

\rho_{XZ} \rho_{YZ} + \sqrt{1 – \rho_{XZ}^{2}} \sqrt{1 – \rho_{YZ}^{2}} &> 0 .

\end{align}\]

We move the \(\sqrt{1 – \rho_{XZ}^{2}} \sqrt{1 – \rho_{YZ}^{2}}\) part on the other side of the equality sign and square both sides to obtain the same two equations, namely:

\[\begin{align}

\rho_{XZ}^2 \rho_{YZ}^2

&> \left ( 1 – \rho_{XZ}^{2}\right ) \left (1 – \rho_{YZ}^{2}\right ) \\

&= 1 – \rho_{XZ}^{2} – \rho_{YZ}^{2} + \rho_{XZ}^2 \rho_{YZ}^2 .

\end{align}\]

To summarise: If \(\rho_{XY}\) must be positive, then we must have that

\[

\rho_{XZ}^{2} + \rho_{YZ}^{2} > 1 .

\]

Similar arguments can be used to show the same if \(\rho_{XY}\) must be negative.

In other words, if we have two known correlations, \(\rho_{XZ}\) and \(\rho_{YZ}\), then

\[

\text{sign}(\rho_{XY})

=

\begin{cases}

\text{sign}(\rho_{XZ}) \text{sign}(\rho_{YZ}) & \rho_{XZ}^2 + \rho_{YZ}^2 > 1 \\

\text{not known} & \rho_{XZ}^2 + \rho_{YZ}^2 \leq 1.

\end{cases}

\]

In other words, we only know the sign of the correlation when the two known correlations are sufficiently large (either positive or negative).

Again, going back to the `mtcars`

example:

```
rXZ <- cor(x, z)
rYZ <- cor(y, z)
rXZ^2 + rYZ^2
## [1] 1.260404
```

So the sign of \(\rho_{XY}\) can be known and is positive because

```
sign(rXZ) * sign(rYZ)
## [1] 1
```

This was also known from calculating the limits earlier.

Terence Tao has a mathematical treatment of the topic in his blog post “When is correlation transitive?”.

**leave a comment**for the author, please follow the link and comment on their blog:

**R-bloggers on Mikkel Meyer Andersen**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.