**Brian Callander**, and kindly contributed to R-bloggers)

# CIS Primer Question 1.5.2

Here are my solutions to question 1.5.2 of Causal Inference in Statistics: a Primer (CISP).

I’ll use different indexing to make the notation clearer. In particular, the indices will match the values of the conditioning variables.

## Part a

The full joint probability is

\[

\mathbb P(x, y, z)

=

\mathbb P (z) \cdot \mathbb P (x \mid z) \cdot \mathbb P (y \mid x, z)

\]

using the decomposition formula. Each factor is given by

\[

\begin{align}

\mathbb P (z)

&=

z r + (1 – z) (1 – r)

\\

\mathbb P (x \mid z)

&=

xq_z + (1 – x)(1 – q_z)

\\

\mathbb P (y \mid x, z)

&=

yp_{x, z} + (1 – y)(1 – p_{x, z})

\end{align}

\]

where each parameter is assumed to have support on \(\{0, 1\}\).

The marginal distributions are given by

\[

\begin{align}

\mathbb P(x, z)

&=

\mathbb P(x \mid z) \cdot \mathbb P (z)

\\

\mathbb P(y, z)

&=

\mathbb P(0, y, z) + \mathbb P(1, y, z)

\\

\mathbb P(x, y)

&=

\mathbb P(x, y, 0) + \mathbb P(x, y, 1)

\\

&=

yp_{x, 0} + (1 – y)(1 – p_{x, 0})

+

yp_{x, 1} + (1 – y)(1 – p_{x, 1})

\\

&=

y (p_{x, 0} + p_{x, 1}) + (1 – y)(2 – p_{x, 0} – p_{x, 1})

.

\end{align}

\]

Furthermore,

\[

\begin{align}

\mathbb P (x)

&=

\sum_z \mathbb P(x \mid z) \mathbb P (z)

\\

&=

\sum_z (xq_z + (1 – x)(1 – q_z))(zr + (1 – z)(1 – r))

\end{align}

\]

so that

\[

\begin{align}

\mathbb P(X = 0)

&=

(1 – q_0)(1 – r) + (1 – q_1)r

\\

\mathbb P(X = 1)

&=

q_0(1 – r) + q_1r

\end{align}

\]

## Part b

The increase in probability from taking the drug in each sub-population is:

- \(\mathbb P(y = 1 \mid x = 1, z = 0) – \mathbb P(y = 1 \mid x = 0, z = 0) = p_{1, 0} – p_{0, 0}\); and
- \(\mathbb P(y = 1 \mid x = 1, z = 1) – \mathbb P(y = 1 \mid x = 0, z = 1) = p_{1, 1} – p_{0, 1}\).

In the whole population, the increase is \(\mathbb P(Y = 1 \mid X = 1) – \mathbb P(Y = 1 \mid X = 0)\), calcualted via

\[

\begin{align}

&

\sum_{z = 0}^1

\mathbb P(Y = 1, Z = z \mid X = 1) – \mathbb P(Y = 1, Z = z \mid X = 0)

\\

&=

\sum_{z = 0}^1

\frac{\mathbb P(X = 1, Y = 1, Z = z)}{\mathbb P(X = 1)} – \frac{\mathbb P(X = 0, Y = 1, Z = z)}{\mathbb P(X = 0)}

\\

&=

\frac{(1 – r)q_0p_{1, 0} + rq_1p_{1, 1}}{q_0(1 – r) + q_1r}

–

\frac{(1 – r)(1 – q_0)p_{0, 0} + r(1 – q_1)p_{0, 1}}{(1 – q_0)(1 – r) + (1 – q_1)r}

\end{align}

\]

## Part c

There’s no need to be smart about this. Let’s just simulate lots of values and find some combination with a Simpson’s reversal. We’ll generate a dataset with a positive probability difference in each sub-population, then filter out anything that also has a non-negative population difference.

```
set.seed(8168)
N <- 10000
part_c <- tibble(
id = 1:N %>% as.integer(),
r = rbeta(N, 2, 2), # P(Z = 1)
q0 = rbeta(N, 2, 2), # P(X = 1 | Z = 0)
q1 = rbeta(N, 2, 2), # P(X = 1 | Z = 1)
p00 = rbeta(N, 2, 2), # P(Y = 1 | X = 0, Z = 0)
p10 = rbeta(N, 2, 2) * (p00 - 1) + 1, # P(Y = 1 | X = 1, Z = 0)
p01 = rbeta(N, 2, 2), # P(Y = 1 | X = 0, Z = 1)
p11 = rbeta(N, 2, 2) * (p01 - 1) + 1, # P(Y = 1 | X = 1, Z = 1)
diff_pop = (p10 * q0 * (1 - r) + p11 * q1 * r) / (q0 * (1 - r) + q1 * r) - (p00 * (1 - q0) * (1 - r) + p01 * (1 - q1) * r) / ((1 - q0) * (1 - r) + (1 - q1) * r),
diff_z0 = p10 - p00,
diff_z1 = p11 - p01
)
```

As a check, there should be no rows with a non-positive difference.

```
check <- part_c %>%
filter(diff_z0 <= 0 | diff_z1 <= 0) %>%
nrow()
# throw error if there are rows
stopifnot(check == 0)
check
```

`[1] 0`

Now we simply throw away any rows with a non-negative population difference. Here is one combination of parameters exhibiting Simpson’s reversal.

```
simpsons_reversal <- part_c %>%
filter(diff_pop < -0.05) %>%
head(1) %>%
gather(term, value)
```

term | value |
---|---|

id | 109.0000000 |

r | 0.2837123 |

q0 | 0.0664811 |

q1 | 0.8468126 |

p00 | 0.8441892 |

p10 | 0.8827558 |

p01 | 0.5273831 |

p11 | 0.5816885 |

diff_pop | -0.1933634 |

diff_z0 | 0.0385666 |

diff_z1 | 0.0543054 |

As a final check, let’s generate a dataset for this set of parameters.

```
df <- simpsons_reversal %>%
spread(term, value) %>%
crossing(unit = 1:N) %>%
mutate(
z = rbinom(N, 1, r),
x = rbinom(N, 1, if_else(z == 0, q0, q1)),
p_y_given_x_z = case_when(
x == 0 & z == 0 ~ p00,
x == 0 & z == 1 ~ p01,
x == 1 & z == 0 ~ p10,
x == 1 & z == 1 ~ p11
),
y = rbinom(N, 1, p_y_given_x_z)
) %>%
select(unit, x, y, z)
```

The empirical joint probability distribution is as follows.

```
p_x_y_z <- df %>%
group_by(x, y, z) %>%
count() %>%
ungroup() %>%
mutate(p = n / sum(n))
```

x | y | z | n | p |
---|---|---|---|---|

0 | 0 | 0 | 1068 | 0.1068 |

0 | 0 | 1 | 197 | 0.0197 |

0 | 1 | 0 | 5609 | 0.5609 |

0 | 1 | 1 | 224 | 0.0224 |

1 | 0 | 0 | 52 | 0.0052 |

1 | 0 | 1 | 1016 | 0.1016 |

1 | 1 | 0 | 400 | 0.0400 |

1 | 1 | 1 | 1434 | 0.1434 |

The population-level probability difference is given by:

```
diff_pop <- p_x_y_z %>%
group_by(x) %>%
summarise(p = sum(n * y) / sum(n)) %>%
spread(x, p) %>%
mutate(diff = `1` - `0`)
```

0 | 1 | diff |
---|---|---|

0.8217808 | 0.6319779 | -0.1898028 |

which is close to the theoretical value.

Similarly, the sub-population differences are

```
diff_z <- p_x_y_z %>%
group_by(x, z) %>%
summarise(p = sum(n * y) / sum(n)) %>%
spread(x, p) %>%
mutate(diff = `1` - `0`)
```

z | 0 | 1 | diff |
---|---|---|---|

0 | 0.8400479 | 0.8849558 | 0.0449078 |

1 | 0.5320665 | 0.5853061 | 0.0532396 |

which are also close to the theoretical values we calculated. More importantly, they have a different sign to the population difference, confiming that we have case of Simpson’s reversal.

**leave a comment**for the author, please follow the link and comment on their blog:

**Brian Callander**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...