SR2 Chapter 3 Medium

[This article was first published on Brian Callander, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

SR2 Chapter 3 Medium

Here’s my solution to the medium exercises in chapter 3 of McElreath’s Statistical Rethinking, 2nd edition.

\(\DeclareMathOperator{\dbinomial}{Binomial} \DeclareMathOperator{\dbernoulli}{Bernoulli} \DeclareMathOperator{\dpoisson}{Poisson} \DeclareMathOperator{\dnormal}{Normal} \DeclareMathOperator{\dt}{t} \DeclareMathOperator{\dcauchy}{Cauchy} \DeclareMathOperator{\dexponential}{Exp} \DeclareMathOperator{\duniform}{Uniform} \DeclareMathOperator{\dgamma}{Gamma} \DeclareMathOperator{\dinvpamma}{Invpamma} \DeclareMathOperator{\invlogit}{InvLogit} \DeclareMathOperator{\logit}{Logit} \DeclareMathOperator{\ddirichlet}{Dirichlet} \DeclareMathOperator{\dbeta}{Beta}\)

Assuming Earth has 70% water cover, and we observe water 8 times out of 15 globe tosses, let’s calculate some posterior quantities with two choices of prior: uniform and step.

p_true <- 0.7

W <- 8
N <- 15

granularity <- 1000 # points on the grid

Uniform Prior

We calculate the grid approximation of the posterior as shown in the book.

m1_grid <- tibble(p = seq(0, 1, length.out = granularity)) %>% 
  mutate(prior = 1)

m1_posterior <- m1_grid %>% 
  mutate(
    likelihood = dbinom(W, N, p),
    posterior = prior * likelihood
  )
Solution to exercise 3M1
Solution to exercise 3M1

We can get draws from our posterior by sampling the water cover values many times with replacement, each value being drawn in proportion to the posterior probability. We can then just summarise these draws to get the desired interval.

m2_samples <- m1_posterior %>% 
  sample_n(10000, replace = T, weight = posterior)

m2_hpdi <- HPDI(m2_samples$p, prob = 0.9)
m2_hpdi
     |0.9      0.9| 
0.3223223 0.7097097 

The histogram looks as follows. This is much the same as the previous graph, but calculated from the samples.

Solution to exercise 3M2
Solution to exercise 3M2

To get the posterior predictive sample, we take our posterior draws of \(p\), then use them to draw a random number of observed water tosses out of 15. The fraction of posterior predictive samples with a given value is then the posterior predictive probability of that value.

m3_prob <- m2_samples %>% 
  mutate(W = rbinom(n(), 15, p)) %>% 
  group_by(W) %>% 
  tally() %>% 
  mutate(probability = n / sum(n))
Solution to exercise 3M3
Solution to exercise 3M3

We can also calculate the posterior predictive probabilities with a different number of tosses. Here with 9 tosses.

m4_prob <- m2_samples %>% 
  mutate(W = rbinom(n(), 9, p)) %>% 
  group_by(W) %>% 
  tally() %>% 
  mutate(probability = n / sum(n))
Solution to exercise 3M4
Solution to exercise 3M4

Step Prior

Now we repeat the same steps but with the step prior instead of the uniform prior. We’ll just repeat it without comment.

m5_grid <- m1_grid %>% 
  mutate(prior = if_else(p < 0.5, 0, 1))

m5_posterior <- m5_grid %>% 
  mutate(
    likelihood = dbinom(W, N, p),
    posterior = prior * likelihood
  )
Solution to exercise 3M5 part 1
Solution to exercise 3M5 part 1
m5_samples <- m5_posterior %>% 
  sample_n(10000, replace = T, weight = posterior)

m5_hpdi <- HPDI(m5_samples$p, prob = 0.9)
m5_hpdi
     |0.9      0.9| 
0.5005005 0.7107107 
Solution to exercise 3M5 part 2
Solution to exercise 3M5 part 2
m5_prob <- m5_samples %>% 
  mutate(W = rbinom(n(), 15, p)) %>% 
  group_by(W) %>% 
  tally() %>% 
  mutate(probability = n / sum(n))
Solution to exercise 3M5 part 3
Solution to exercise 3M5 part 3
m5_prob <- m5_samples %>% 
  mutate(W = rbinom(n(), 9, p)) %>% 
  group_by(W) %>% 
  tally() %>% 
  mutate(probability = n / sum(n))
Solution to exercise 3M5 part 4
Solution to exercise 3M5 part 4

Let’s compare the proportion of samples within 0.05 of the true value for each prior.

p_close_uniform <- m2_samples %>% 
  group_by(close = p %>% between(p_true - 0.05, p_true + 0.05)) %>% 
  tally() %>% 
  mutate(probability = n / sum(n)) %>% 
  filter(close) %>% 
  pull(probability)

p_close_step <- m5_samples %>% 
  group_by(close = p %>% between(p_true - 0.05, p_true + 0.05)) %>% 
  tally() %>% 
  mutate(probability = n / sum(n)) %>% 
  filter(close) %>% 
  pull(probability)

The probability of being close to the true value under the uniform and step priors is 0.1316 and 0.2157, respectively. The step prior thus has more mass around the true value.

Exercise 3M6

Bayesian models are generative, meaning we can simulate new datasets according to our prior probabilities. We’ll simulate 10 datasets for each value of N of interest. We simulate a dataset by randomly choosing a p_true from our uniform prior, then randomly choosing a W from the corresponding binomial distribution.

m6_prior_predictive <- crossing(
    N = 200 * (1:16), 
    iter = 1:10
  ) %>% 
  mutate(
    p_true = runif(n(), min=0, max=1), 
    W = rbinom(n(), N, p_true)
  )

For each of these simulated datasets, we grid approximate the posterior, take posterior samples, then calculate the HPDI.

m6_grid <- tibble(p = seq(0, 1, length.out = granularity)) %>% 
  mutate(prior = 1)

m6_posteriors <- m6_prior_predictive %>% 
  crossing(m6_grid) %>% 
  group_by(N, p_true, iter) %>% 
  mutate(
    likelihood = dbinom(W, N, p),
    posterior = prior * likelihood
  )

m6_samples <- m6_posteriors %>% 
  sample_n(1000, replace = TRUE, weight = posterior) 

m6_hpdi <- m6_samples %>% 
  summarise(lo = HPDI(p, 0.99)[1], hi = HPDI(p, 0.99)[2]) %>% 
  mutate(width = abs(hi - lo))

Now for each value of N, we check how many of the intervals have the desired width.

m6_n <- m6_hpdi %>% 
  group_by(N) %>% 
  summarise(fraction = mean(width < 0.05)) 
Solution to exercise 3M6
Solution to exercise 3M6

Thus we expect a sample size around 2600-3000 to give us a sufficiently precise posterior estimation.

To leave a comment for the author, please follow the link and comment on their blog: Brian Callander.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)