CIS Primer Question 3.4.1

Here are my solutions to question 3.4.1 of Causal Inference in Statistics: a Primer (CISP). \(\DeclareMathOperator{\do}{do}\)

If we can measure only one additional variable to estimate the causal effect of \(X\) on \(Y\) in figure 3.8, then we should measure \(W\). From question 3.3.1 we know that no single variable satisfies the backdoor criterion (double-checked with dagitty after the list below). Moreover, visual inspection of the graph verifies that \(W\) satisfies the frontdoor criterion:

  1. it intercepts all directed paths from \(X\) to \(Y\) (there is only one);
  2. there is no unblocked path from \(X\) to \(W\); and
  3. all backdoor paths from \(W\) to \(Y\) are blocked by \(X\).
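
As a quick check, here is a sketch using the dagitty package (assumed installed), with the edges of figure 3.8 transcribed by hand from the simulation below (variable names capitalised to match the text):

library(dagitty)

# figure 3.8: B and A sit behind X; C and D sit behind Y; Z confounds both
g <- dagitty('dag {
  B -> A
  B -> Z
  C -> D
  C -> Z
  A -> X
  Z -> X
  X -> W
  W -> Y
  D -> Y
  Z -> Y
}')

# every minimal backdoor adjustment set has two elements,
# so no single variable satisfies the backdoor criterion
adjustmentSets(g, exposure = 'X', outcome = 'Y')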

To illustrate this, let's simulate the causal effect in three separate ways:

  1. by intervention,
  2. via the backdoor, and
  3. via the frontdoor.

Here are the data. Note that we have wrapped the structural assignments of \(W\) and \(Y\) in functions so that we can reuse them later when intervening.

library(tidyverse)  # tibble, dplyr, tidyr
library(broom)      # augment

# inverse logit
inv_logit <- function(x) 1 / (1 + exp(-x))

N <- 100000

# structural assignment for w, a function of its only parent x
W <- function(x) {
  N <- length(x)
  rbinom(N, 1, inv_logit(-x))
}

# structural assignment for y, a function of its parents d, w, z
Y <- function(d, w, z) {
  N <- length(d)
  rbinom(N, 1, inv_logit(-d - w + 3*z))
}

df <- tibble(id = 1:N) %>% 
  mutate(
    b = rnorm(N, 0, 1),
    a = b + rnorm(N, 0, 0.1),
    c = rnorm(N, 0, 1),
    d = rbinom(N, 1, inv_logit(-1 + c)),
    z = rbinom(N, 1, inv_logit(-2 + 2*b + c)),
    x = rbinom(N, 1, inv_logit(a + z)),
    w = W(x),
    y = Y(d, w, z)
  )
Simulated data for figure 3.8 (first five rows)

id          b          a          c d z x w y
 1  0.3641297  0.3917626  1.0369530 1 1 0 0 1
 2  0.0287563  0.0397299  0.5736271 0 0 1 1 0
 3 -0.7727052 -0.5993870 -0.5179657 0 1 0 1 1
 4  0.4107888  0.5737898  1.2586840 0 1 1 0 1
 5  2.3512417  2.1631719  0.6746523 1 1 1 0 1

Intervention

In order to simulate an intervention, we assign values to \(X\) at random, then assign new values to all of its descendants using the structural functions above. In the intervened data, the causal effect of \(X\) on \(Y\) is simply \(\mathbb P(Y \mid X)\).

intervention <- df %>% 
  # intervene on x
  mutate(
    x = rbinom(n(), 1, 0.5),
    w = W(x),
    y = Y(d, w, z)
  ) %>% 
  # model P(y | do(x))
  glm(
    formula = y ~ x, 
    family = binomial(), 
    data = .
  ) %>% 
  # predict
  augment(
    newdata = tibble(x = 0:1), 
    type.predict = 'response'
  ) 
P(Y | do(X))

x   .fitted   .se.fit
0 0.4637566 0.0022239
1 0.5072714 0.0022422

We can compare this causal effect with the purely observational estimate \(\mathbb P(Y \mid X)\) to see the difference.

noncausal <- df %>% 
  # model P(y | x)
  glm(
    formula = y ~ x, 
    family = binomial(), 
    data = .
  ) %>% 
  # predict
  augment(
    newdata = tibble(x = 0:1), 
    type.predict = 'response'
  ) 
P(Y | X) ≠ P(Y | do(X))

x   .fitted   .se.fit
0 0.3797282 0.0022595
1 0.5759927 0.0021293

Backdoor

Since \(\{D, Z\}\) satisfies the backdoor criterion, we can use it to apply the backdoor adjustment. First we'll need \(\mathbb P(D, Z)\).

# P(d, z)
p_d_z <- df %>% 
  group_by(d, z) %>% 
  count() %>% 
  ungroup() %>%  
  mutate(p_d_z = n / sum(n)) 

Now we model \(\mathbb P(Y \mid X, D, Z)\), multiply it by \(\mathbb P(D, Z)\), and take the sum over \(d\) and \(z\) for each value of \(X\); in symbols,
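
\[ \mathbb P(Y \mid \do(X)) = \sum_{d, z} \mathbb P(Y \mid X, d, z) \cdot \mathbb P(d, z) . \]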

backdoor <- formula(y ~ 1 + x + z + d) %>% 
  # model P(y | x, d, z)
  glm(
    family = binomial(),
    data = df
  ) %>%  
  # predict
  augment(
    type.predict = 'response',
    newdata = 
      crossing(
        d = c(0, 1),
        x = c(0, 1),
        z = c(0, 1)
      )
  ) %>% 
  # rename the prediction and join in P(d, z)
  mutate(p_y_given_d_x_z = .fitted) %>% 
  inner_join(p_d_z, by = c('d', 'z')) %>% 
  # backdoor adjustment over d, z
  group_by(x) %>% 
  summarise(p_y_given_do_x = sum(p_y_given_d_x_z * p_d_z))
Backdoor estimates for P(Y | do(X))

x p_y_given_do_x
0      0.4681530
1      0.5033398

Note that the backdoor-adjusted estimates are similar to the estimates from intervention.

Frontdoor

To apply the frontdoor adjustment with \(W\), we'll need \(\mathbb P(W \mid X)\), \(\mathbb P(X^\prime)\), and \(\mathbb P(Y \mid X^\prime, W)\).

# P(w | x)
p_w_given_x <- df %>% 
  group_by(x, w) %>% 
  count() %>% 
  group_by(x) %>% 
  mutate(p_w_given_x = n / sum(n)) %>% 
  ungroup()

# P(x'): the marginal distribution of x, relabelled x' for the adjustment sum
p_xprime <- df %>% 
  group_by(xprime = x) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p_xprime = n / sum(n))

# P(y | x', w), estimated by logistic regression
p_y_given_xprime_w <- formula(y ~ 1 + x + w) %>% 
  glm(
    family = binomial(),
    data = df
  ) %>% 
  augment(
    newdata = crossing(x = 0:1, w = 0:1),
    type.predict = 'response'
  ) %>% 
  transmute(
    xprime = x,
    w,
    p_y_given_xprime_w = .fitted
  )

Now we apply the frontdoor adjustment:

\[ \mathbb P (Y \mid \do(X)) = \sum_{x^\prime, w} \mathbb P(x^\prime) \cdot \mathbb P(w \mid x) \cdot \mathbb P (y \mid x^\prime, w) . \]

frontdoor <- p_w_given_x %>% 
  inner_join(p_y_given_xprime_w, by = 'w') %>% 
  inner_join(p_xprime, by = 'xprime') %>% 
  # frontdoor adjustment over x', w
  group_by(x) %>%
  summarise(p_y_given_do_x = sum(p_w_given_x * p_y_given_xprime_w * p_xprime))
Frontdoor estimates of P(Y | do(X))

x p_y_given_do_x
0      0.4623710
1      0.5041105

Our frontdoor estimates of \(\mathbb P(Y \mid \do(X))\) are very similar to the intervention and backdoor estimates.
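
As a final check, here is a quick sketch (using the objects and column names defined above) that collects all three estimates of \(\mathbb P(Y \mid \do(X))\) into a single table:

# side-by-side comparison of the three estimates
intervention %>% 
  transmute(x, intervention = .fitted) %>% 
  inner_join(rename(backdoor, backdoor = p_y_given_do_x), by = 'x') %>% 
  inner_join(rename(frontdoor, frontdoor = p_y_given_do_x), by = 'x')

All three columns should agree to within simulation error.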
