CIS Primer Question 1.5.2

February 9, 2019
By

(This article was first published on Brian Callander, and kindly contributed to R-bloggers)

CIS Primer Question 1.5.2

Here are my solutions to question 1.5.2 of Causal Inference in Statistics: a Primer (CISP).

I’ll use different indexing to make the notation clearer. In particular, the indices will match the values of the conditioning variables.

Part a

The full joint probability is

\[
\mathbb P(x, y, z)
=
\mathbb P (z) \cdot \mathbb P (x \mid z) \cdot \mathbb P (y \mid x, z)
\]

using the decomposition formula. Each factor is given by

\[
\begin{align}
\mathbb P (z)
&=
z r + (1 – z) (1 – r)
\\
\mathbb P (x \mid z)
&=
xq_z + (1 – x)(1 – q_z)
\\
\mathbb P (y \mid x, z)
&=
yp_{x, z} + (1 – y)(1 – p_{x, z})
\end{align}
\]

where each parameter is assumed to have support on \(\{0, 1\}\).

The marginal distributions are given by

\[
\begin{align}
\mathbb P(x, z)
&=
\mathbb P(x \mid z) \cdot \mathbb P (z)
\\
\mathbb P(y, z)
&=
\mathbb P(0, y, z) + \mathbb P(1, y, z)
\\
\mathbb P(x, y)
&=
\mathbb P(x, y, 0) + \mathbb P(x, y, 1)
\\
&=
yp_{x, 0} + (1 – y)(1 – p_{x, 0})
+
yp_{x, 1} + (1 – y)(1 – p_{x, 1})
\\
&=
y (p_{x, 0} + p_{x, 1}) + (1 – y)(2 – p_{x, 0} – p_{x, 1})
.
\end{align}
\]

Furthermore,

\[
\begin{align}
\mathbb P (x)
&=
\sum_z \mathbb P(x \mid z) \mathbb P (z)
\\
&=
\sum_z (xq_z + (1 – x)(1 – q_z))(zr + (1 – z)(1 – r))
\end{align}
\]

so that

\[
\begin{align}
\mathbb P(X = 0)
&=
(1 – q_0)(1 – r) + (1 – q_1)r
\\
\mathbb P(X = 1)
&=
q_0(1 – r) + q_1r
\end{align}
\]

Part b

The increase in probability from taking the drug in each sub-population is:

  • \(\mathbb P(y = 1 \mid x = 1, z = 0) – \mathbb P(y = 1 \mid x = 0, z = 0) = p_{1, 0} – p_{0, 0}\); and
  • \(\mathbb P(y = 1 \mid x = 1, z = 1) – \mathbb P(y = 1 \mid x = 0, z = 1) = p_{1, 1} – p_{0, 1}\).

In the whole population, the increase is \(\mathbb P(Y = 1 \mid X = 1) – \mathbb P(Y = 1 \mid X = 0)\), calcualted via

\[
\begin{align}
&
\sum_{z = 0}^1
\mathbb P(Y = 1, Z = z \mid X = 1) – \mathbb P(Y = 1, Z = z \mid X = 0)
\\
&=
\sum_{z = 0}^1
\frac{\mathbb P(X = 1, Y = 1, Z = z)}{\mathbb P(X = 1)} – \frac{\mathbb P(X = 0, Y = 1, Z = z)}{\mathbb P(X = 0)}
\\
&=
\frac{(1 – r)q_0p_{1, 0} + rq_1p_{1, 1}}{q_0(1 – r) + q_1r}

\frac{(1 – r)(1 – q_0)p_{0, 0} + r(1 – q_1)p_{0, 1}}{(1 – q_0)(1 – r) + (1 – q_1)r}
\end{align}
\]

Part c

There’s no need to be smart about this. Let’s just simulate lots of values and find some combination with a Simpson’s reversal. We’ll generate a dataset with a positive probability difference in each sub-population, then filter out anything that also has a non-negative population difference.

set.seed(8168)

N <- 10000

part_c <- tibble(
  id = 1:N %>% as.integer(),
  
  r = rbeta(N, 2, 2),   # P(Z = 1)
  
  q0 = rbeta(N, 2, 2),  # P(X = 1 | Z = 0)
  q1 = rbeta(N, 2, 2),  # P(X = 1 | Z = 1)
  
  p00 = rbeta(N, 2, 2), # P(Y = 1 | X = 0, Z = 0)
  p10 = rbeta(N, 2, 2) * (p00 - 1) + 1, # P(Y = 1 | X = 1, Z = 0)
  p01 = rbeta(N, 2, 2), # P(Y = 1 | X = 0, Z = 1)
  p11 = rbeta(N, 2, 2) * (p01 - 1) + 1, # P(Y = 1 | X = 1, Z = 1)
  
  diff_pop = (p10 * q0 * (1 - r) + p11 * q1 * r) / (q0 * (1 - r) + q1 * r) - (p00 * (1 - q0) * (1 - r) + p01 * (1 - q1) * r) / ((1 - q0) * (1 - r) + (1 - q1) * r),
  diff_z0 = p10 - p00,
  diff_z1 = p11 - p01
) 

As a check, there should be no rows with a non-positive difference.

check <- part_c %>% 
  filter(diff_z0 <= 0 | diff_z1 <= 0) %>% 
  nrow()

# throw error if there are rows
stopifnot(check == 0)

check
[1] 0

Now we simply throw away any rows with a non-negative population difference. Here is one combination of parameters exhibiting Simpson’s reversal.

simpsons_reversal <- part_c %>% 
  filter(diff_pop < -0.05) %>% 
  head(1) %>% 
  gather(term, value)
Parameters leading to Simpson’s reversal
term value
id 109.0000000
r 0.2837123
q0 0.0664811
q1 0.8468126
p00 0.8441892
p10 0.8827558
p01 0.5273831
p11 0.5816885
diff_pop -0.1933634
diff_z0 0.0385666
diff_z1 0.0543054

As a final check, let’s generate a dataset for this set of parameters.

df <- simpsons_reversal %>% 
  spread(term, value) %>% 
  crossing(unit = 1:N) %>% 
  mutate(
    z = rbinom(N, 1, r),
    x = rbinom(N, 1, if_else(z == 0, q0, q1)),
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    y = rbinom(N, 1, p_y_given_x_z)
  ) %>% 
  select(unit, x, y, z)

The empirical joint probability distribution is as follows.

p_x_y_z <- df %>% 
  group_by(x, y, z) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p = n / sum(n))
x y z n p
0 0 0 1068 0.1068
0 0 1 197 0.0197
0 1 0 5609 0.5609
0 1 1 224 0.0224
1 0 0 52 0.0052
1 0 1 1016 0.1016
1 1 0 400 0.0400
1 1 1 1434 0.1434

The population-level probability difference is given by:

diff_pop <- p_x_y_z %>% 
  group_by(x) %>% 
  summarise(p = sum(n * y) / sum(n)) %>% 
  spread(x, p) %>%
  mutate(diff = `1` - `0`)
0 1 diff
0.8217808 0.6319779 -0.1898028

which is close to the theoretical value.

Similarly, the sub-population differences are

diff_z <- p_x_y_z %>% 
  group_by(x, z) %>% 
  summarise(p = sum(n * y) / sum(n)) %>% 
  spread(x, p) %>% 
  mutate(diff = `1` - `0`)
z 0 1 diff
0 0.8400479 0.8849558 0.0449078
1 0.5320665 0.5853061 0.0532396

which are also close to the theoretical values we calculated. More importantly, they have a different sign to the population difference, confiming that we have case of Simpson’s reversal.

To leave a comment for the author, please follow the link and comment on their blog: Brian Callander.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)