How to tell when error bars correspond to a significant p-value

July 24, 2012

(This article was first published on R Psychologist » R, and kindly contributed to R-bloggers)


Belia, Fidler, Williams, and Cumming (2005) found that researchers in psychology, behavioral neuroscience, and medicine are really bad at judging when error bars signify that two means are significantly different (p = .05). They emailed a bunch of researchers, invited them to take a web-based test, and got 473 usable responses. The test consisted of an interactive plot with error bars for two independent groups; participants were asked to move the error bars to the position they believed would represent a significant t-test at p = .05. They did this both for error bars showing the 95 % CI and for error bars showing each group's standard error (SE). On average, participants set the 95 % CIs too far apart, with their mean placement corresponding to p = .009. They did the opposite with the SE error bars, placing them too close together, with placements corresponding to p = .109. And if you're wondering, they found no difference between the three disciplines.
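
To make the link between a placement and its p-value concrete, here is a minimal R sketch. It assumes two independent groups with equal n and equal SDs, a pooled two-sample t-test, and 95 % CIs computed with the t distribution on n - 1 df. The function name `p_from_ci_placement` is hypothetical and not taken from Belia et al.

```r
# Hypothetical helper (not from Belia et al.): the two-sided p-value implied
# by placing two 95 % CIs at the given means. Assumes two independent groups
# with equal n and equal SDs, analysed with a pooled two-sample t-test.
p_from_ci_placement <- function(m1, m2, ci_halfwidth, n) {
  se <- ci_halfwidth / qt(0.975, df = n - 1)  # recover each group's SE from its CI arm
  se_diff <- se * sqrt(2)                     # SE of the difference for equal n and SD
  t_stat <- abs(m1 - m2) / se_diff            # t statistic
  2 * pt(-t_stat, df = 2 * n - 2)             # two-sided p-value
}

# Example: n = 20 per group, SD = 10, and two 95 % CIs that just touch
n <- 20
arm <- qt(0.975, df = n - 1) * 10 / sqrt(n)   # half-width of each 95 % CI
p_touch <- p_from_ci_placement(100, 100 + 2 * arm, arm, n)
p_touch
```

Under these assumptions, two 95 % CIs that merely touch already correspond to a p-value well below .01, so pushing the CIs apart until they stop overlapping overshoots p = .05.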


I wanted to pull my weight, so I created some plots in R that show where error bars end up when the difference between two means is significant at various p-values.

Figure 1. Error bars corresponding to a significant difference at p = .05 (equal group sizes and equal variances)

Figure 2. Error bars corresponding to a significant difference at p = .01 (equal group sizes and equal variances)

Figure 3. Error bars corresponding to a significant difference at p = .001 (equal group sizes and equal variances)

Based on the first plot, we see that 95 % CIs overlapping by about one third of their length (roughly 32 % with 20 participants per group) correspond to p = .05. For the SE error bars, the gap between the two bars is about 1 SE (roughly 0.86 SE here) when p = .05.
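
The loop in the R code below finds these positions by nudging one mean until p drops to .05. As a sketch under the same assumptions (equal n, equal SD, pooled t-test, CIs based on t with n - 1 df), the same geometry can also be computed directly by inverting the t-test. The name `errorbar_geometry` is hypothetical and not part of the original post; `ci_overlap` uses the post's measure, i.e. the overlap divided by the full length of one error bar.

```r
# Hypothetical helper: analytic error bar geometry at a target p-value.
# Assumes two independent groups with equal n and equal SD and a pooled
# two-sample t-test; CIs use the t distribution with n - 1 df.
errorbar_geometry <- function(p = 0.05, n = 20, sd = 10, ci = 0.95) {
  se <- sd / sqrt(n)                                # SE of each group mean
  se_diff <- sd * sqrt(2 / n)                       # SE of the difference
  diff <- qt(1 - p / 2, df = 2 * n - 2) * se_diff   # mean difference giving exactly p
  ci_arm <- qt((1 + ci) / 2, df = n - 1) * se       # half-width of each CI
  data.frame(
    p = p,
    diff = diff,
    # overlap as a share of one full CI (negative values mean a gap)
    ci_overlap = (2 * ci_arm - diff) / (2 * ci_arm),
    # gap between the SE bars, in SE units
    se_gap = (diff - 2 * se) / se
  )
}

geom <- do.call(rbind, lapply(c(0.05, 0.01, 0.001), errorbar_geometry))
geom
```

With 20 per group this gives a CI overlap of roughly a third of a bar at p = .05, under a tenth at p = .01, and an actual gap between the CIs at p = .001.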

R Code

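Here's the complete R code used to produce these plots (it uses the plyr and ggplot2 packages). To reproduce the p = .01 and p = .001 plots, change the 0.05 in the while() condition.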

library(plyr) # ldply()
library(ggplot2) # plotting
m2 <- 100 # initial mean for group 2, should start equal to m1
p <- 1 # starting p-value
m1 <- 100 # mean group 1
sd1 <- 10 # sd group 1
sd2 <- 10 # sd group 2
n <- 20 # n per group
s <- sqrt(0.5 * (sd1^2 + sd2^2)) # pooled sd
while(p > 0.05) { # loop until p <= 0.05
  m2 <- m2 - (m2/10000) # lower the mean of group 2 slightly
  t_stat <- (min(c(m1,m2)) - max(c(m1,m2))) / (s * sqrt(2/n)) # t statistic (negative)
  df <- (n*2)-2 # degrees of freedom
  p <- pt(t_stat, df)*2 # two-sided p-value for the current m2
}
get_CI <- function(x, sd, CI) { # calculate error bars (uses the global n)
  se <- sd/sqrt(n) # standard error
  lwr <- c(x - qt((1 + CI)/2, n - 1) * se, x - se) # CI and SE lower limits
  upr <- c(x + qt((1 + CI)/2, n - 1) * se, x + se) # CI and SE upper limits
  data.frame("lwr" = lwr, "upr" = upr, "se" = se) # result
}
plot_df <- data.frame("mu" = rep(c(m1,m2), each=2)) # means
plot_df$group <- gl(2,2, labels=c("group1", "group2")) # group factor
plot_df$type <- gl(2,1,4, labels=c("95 % CI", "se errorbars")) # type of errorbar
plot_df <- cbind(plot_df, rbind(get_CI(m1, sd1, .95), get_CI(m2, sd2, .95))) # put it all together

get_overlap <- function(arg) { # calculate overlap %
  x <- subset(plot_df, type == arg) # subset for type of errorbar
  x_range <- abs(mean(x$lwr - x$upr)) # length of error bar
  x_lwr <- max(x$lwr) # lwr limit for group with highest lwr limit
  x_upr <- min(x$upr) # upr limit for group with lowest upr limit
  overlap <- abs((x_upr - x_lwr) / x_range) # overlap (or gap) as a share of the error bar
  data.frame("type"=arg, "range" = x_range, "lwr" = x_lwr, "upr" = x_upr, "overlap" = round(overlap, 2)) # result
}
overlap <- ldply(levels(plot_df$type), get_overlap) # get overlap and put into dataframe
overlap$text <- paste(overlap$overlap * 100, "% of errorbar") # label text
overlap$text_y <- pmax(overlap$lwr, overlap$upr) # put each label at the top of its red bar

ggplot(plot_df, aes(group, mu, group=group)) + 
  geom_point(size=3) + # point for group mean
  geom_errorbar(aes(ymax=upr, ymin=lwr), width=0.2) + # error bars for means
  ggtitle(paste("Illustration of errorbars for a significant 2-sample t-test, p =", round(p,3))) + # plot title (replaces opts(), which current ggplot2 no longer supports)
  facet_grid(. ~ type) + # split plot after error bar type
  geom_errorbar(data=overlap, aes(ymax=upr, ymin=lwr, x=1.5, y=NULL, group=type), width=0.1, color="red") + # add overlap error bar
  geom_text(data=overlap, aes(label = text, group=type, y=text_y, x=1.5), vjust=-1) + # annotate overlap
  ylab(expression(bar(x))) # change y label
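
As a sanity check, one can confirm with R's own t.test() that a mean difference found this way really gives p = .05. This sketch builds two synthetic samples whose sample means and SDs match the settings above exactly (n = 20, SD = 10) and uses the closed-form difference instead of the loop; the helper `exact_sample` is hypothetical, introduced only for illustration.

```r
# Sanity check (sketch): pooled two-sample t-test on synthetic data whose
# sample means and SDs are exactly the values assumed above.
exact_sample <- function(n, mean, sd) {
  z <- as.numeric(scale(rnorm(n)))  # rescaled to sample mean 0 and sample SD 1
  z * sd + mean
}

set.seed(2012)
n <- 20
diff_05 <- qt(0.975, df = 2 * n - 2) * 10 * sqrt(2 / n)  # difference giving p = .05
g1 <- exact_sample(n, 100, 10)
g2 <- exact_sample(n, 100 - diff_05, 10)
res <- t.test(g1, g2, var.equal = TRUE)  # Student (pooled) two-sample t-test
res$p.value  # 0.05, up to floating-point error
```

Replacing 0.975 with 1 - p/2 checks the p = .01 and p = .001 cases the same way.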

Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10(4), 389-396. PMID: 16392994
