Site icon R-bloggers

Clearer Understanding of 95% Confidence Interval Through The Lens of Simulation

[This article was first published on r on Everyday Is A School Day, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’m now more confident in my understanding of the 95% confidence interval, but less certain about confidence intervals in general, knowing that we can’t be sure if our current interval includes the true population parameter. On a brighter note, if we have the correct confidence interval, it could still encompass the true parameter even when it’s not statistically significant. I find that quite refreshing

I always thought I knew what a confidence interval was until I revisited the topic. There are plenty of great resources out there covering the same material. However, nothing beats learning through trial and error with your own code and simulations. This may be a repetition of materials available on the web.

Objectives: < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">

What Is Confidence Interval? < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">

Per Wikipedia:

Informally, in frequentist statistics, a confidence interval (CI) is an interval which is expected to typically contain the parameter being estimated. More specifically, given a confidence level gamma (95% and 99% are typical values), a CI is a random interval which contains the parameter being estimated gamma % of the time. The confidence level, degree of confidence or confidence coefficient represents the long-run proportion of CIs (at the given confidence level) that theoretically contain the true value of the parameter.

What Does It Actually Mean? < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">

When conducting an experiment, calculating a 95% confidence interval for the treatment effect doesn’t mean there’s a 95% chance that this specific interval contains the true effect. Instead, it means that if you were to repeat the experiment many times, approximately 95% of those confidence intervals would contain the true effect. The 95% confidence level indicates how often the method will produce intervals that capture the true parameter rather than the probability that any single interval captures it. This understanding is essential to accurately interpret a single confidence interval in your study.

It’s important to understand that there is no way to know whether your current confidence interval is part of the 95% that covers the true effect. This can be frustrating, but it’s a limitation of the method.

It is more intuitive to assume that the current confidence interval is one of those 95% that contain the true estimate and interpret it that way. Additionally, the 95% confidence interval coverage does not need to be “significant” to cover the true parameter; it inherently contains if the interval so happens to be one of those 95%.

If you’re still confused, don’t worry! Running simulations and visualizations can provide a clearer explanation. It’s worth noting that confidence intervals are estimated using different techniques, some more accurate than others, but we won’t be covering that here today.

Let The Simulation Begin < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">

What If We Know The Truth Of The Population? < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">

library(tidyverse)
library(kableExtra)
library(pwr)

# population parameters
n_pop <- 10^6
placebo_effect <- 0.2
treat_effect <- 0.5
true_y <- treat_effect - placebo_effect

# simulation 
set.seed(1)
placebo_pop <- rbinom(n_pop, 1, placebo_effect) 
treat_pop <- rbinom(n_pop, 1, treat_effect)

# population dataset
df_pop <- tibble(outcome_placebo=placebo_pop, outcome_treat=treat_pop) |>
  mutate(id = row_number())

Let’s set up a world where we know everything! Say, we know for sure whether a treatment works for certain people and won’t for others. Same for placebo. And also sometimes, both treatment and placebo work for certain people or nothing works. With this method, we constructed a world where we know the truth and simulation comes using sampling of this population.

The above code sets up such environment. Let’s run through what they mean.

Here the placebo and treatment effects are made up. You can simple change the numbers to create another world. Here you can practice large, moderate, small or no effect.

Let’s take a look what df_pop looks like

df_pop |>
  head(10) |>
  select(id, outcome_placebo, outcome_treat) |>
  kable()
id outcome_placebo outcome_treat
1 0 0
2 0 1
3 0 1
4 1 0
5 0 0
6 1 1
7 1 0
8 0 1
9 0 1
10 0 1

id is unique individual. outcome_placebo is the outcome when placebo is given. outcome_treat is outcome when treatment is given. 0 means not successful. 1 means successful. Notice how we have outcome for both placebo and treatment for each individual. Look at id 6 where outcome is successful regardless of treatment and placebo.

There you have it! Your own made up world of finite population where you know what works, what doesn’t. The beauty of this is that we can then sample from this known world where we know exactly what the treatment effect is (not an estimate), a fixed parameter. Hence, there is no reason to calculate confidence interval because it does not make sense to have one.

Let’s Simulate Multiple RCT < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">

n_cal <- pwr.2p.test(h = ES.h(treat_effect,placebo_effect), power = 0.8, sig.level = 0.05)$n |> ceiling()

Assuming we want 80% power and alpha of 5%, and effect of 0.6435011 we need 38 per group.

df_full <- tibble(iter=numeric(),sample=numeric(),mean=numeric(),lower=numeric(),upper=numeric(),pval=numeric())

for (j in 1:12) {
  df <- tibble(iter=numeric(),sample=numeric(),mean=numeric(),lower=numeric(),upper=numeric(),pval=numeric())
  
  # set.seed(1)
  n <- n_cal*2
  
  for (i in 1:100) {
    df_sample <- df_pop |>
      slice_sample(n = n) |>
      rowwise() |>
      mutate(random_treatment = sample(0:1,1),
             outcome = case_when(
               random_treatment == 1 ~ outcome_treat,
               TRUE ~ outcome_placebo
             )) 
    
    treat <- df_sample |>
      filter(random_treatment == 1) |>
      pull(outcome)
    
    placebo <- df_sample |>
      filter(random_treatment == 0) |>
      pull(outcome)
    
    ci <- prop.test(x = c(sum(treat),sum(placebo)), n = c(length(treat),length(placebo)), correct = F)
    mean <- mean(treat) - mean(placebo)
    # lower <- mean - 1.96*sqrt(mean*(1-mean)/n) #wald, let's use wilson instead
    lower <- ci$conf.int[1]
    upper <- ci$conf.int[2]
    pvalue <- ci$p.value
    # upper <-  mean + 1.96*sqrt(mean*(1-mean)/n) #wald, let's use wilson instead
    df <- df |>
      add_row(tibble(iter=j,sample=i,mean=mean,lower=lower,upper=upper,pval=pvalue))
  }
  df_full <- df_full |>
    add_row(df)
  
}

Let’s break down the code above:

df_full |>
  head(10) |>
  kable()
iter sample mean lower upper pval
1 1 0.3473389 0.1492637 0.5454142 0.0018010
1 2 0.1448864 -0.0569022 0.3466749 0.1746284
1 3 0.2464986 0.0436915 0.4493057 0.0243074
1 4 0.3492723 0.1482620 0.5502827 0.0016048
1 5 0.1842105 -0.0229481 0.3913691 0.0874454
1 6 0.1843137 -0.0384418 0.4070693 0.0913694
1 7 0.3756614 0.1632469 0.5880759 0.0004565
1 8 0.4816355 0.2976731 0.6655978 0.0000116
1 9 0.2437276 0.0518956 0.4355596 0.0104277
1 10 0.1777778 -0.0259180 0.3814736 0.0959556

Let’s Visualize! < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">

df_full |>
  mutate(true_found = case_when(
    lower < true_y & upper >  true_y ~ 1,
    TRUE ~ 0
  )) |>
  ggplot(aes(x=sample,y=mean,color=as.factor(true_found))) +
  geom_point(size=0.5) +
  geom_errorbar(aes(ymin=lower,ymax=upper), alpha=0.5) +
  geom_hline(yintercept = true_y) +
  geom_hline(yintercept = 0, color = "pink", alpha = 0.5) +
  # geom_ribbon(aes(ymin = -0.2, ymax = 0, xmin = 0, xmax = 101), fill = "pink", alpha = 0.3) +
  ylab("Average Treatment Effect") +
  xlab("Trials") +
  ggtitle(label = "Visualizing 95% Confidence Intervalssss", subtitle = "CI contains true estimate (torquoise), CI does not contain true estimate (red), \nfaceted by sets of trials") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none") +
  facet_wrap(.~iter) 

Let’s see what is going on here:

This is quite fascinating! It is approximately true that ~95% (to be exact 93.75) of the confidence intervals contain the true parameter (treatment effect).

Also note that there are quite a few trials were not able to correctly reject the null hypothesis, 19.8333333% to be exact. Does that look familiar? It’s beta, isn’t it? If we flipped it around, the proportion of trials that correctly rejected the null hypothesis were 80.1666667%, which is essentially our power!

Final Thoughts/Lessons Learnt < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">

df_full |>
  mutate(true_found = case_when(
    lower < true_y & upper >  true_y ~ 1,
    TRUE ~ 0
  )) |> 
  filter(true_found == 1, pval >= 0.05) 

Take a look at iter 1, sample 21. Even though the ATE estimate is off & failed to correctly reject the null hypotehsis, the CI still contains the true parameter (which is 0.3), which to me is quite fascinating!



If you like this article:

To leave a comment for the author, please follow the link and comment on their blog: r on Everyday Is A School Day.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exit mobile version