Significant P-Values and Overlapping Confidence Intervals

Posted on March 25, 2013 by Wesley in Uncategorized | 0 Comments

[This article was first published on Statistical Research » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There are all sorts of problems with p-values and confidence intervals and I have no intention (or the time) to cover all those problems right now. However, a big problem is that most people have no idea what p-values really mean. Here is one example of a common problem with p-values and how it relates to confidence intervals. The problem arises when there are two random variables (from independent populations), each with a mean and variance. A confidence interval can be constructed around each sample mean. Using these confidence intervals might be a tempting way to explain whether two values are statistically different. The issue is that a person may see that the confidence intervals overlap and therefore declare that there is no difference. Simply put this is not one of those iff (if and only if) situations. If the confidence intervals do not overlap then one can conclude that there is a statistical difference between means. However, the opposite can not be concluded. When the confidence intervals do overlap then there still may be a difference.

Here are simulated data from two independent normally distributed populations testing the confidence intervals and the p-values. It can easily be seen that this is a fairly frequent event. So don’t make the mistake and make conclusions solely on confidence intervals.


###
#Set some constants
###
alpha = .05
m = 15
n = 15
nsim = 20000

###
#Function to calculate the t statistic. Same as t.test(x,y, var.eqal=T)
# `spooled` can easiliy be modified.
###
tstatfunc=function(x,y){
m = length(x)
n = length(y)
spooled = sqrt(((m-1)*sd(x)^2+(n-1)*sd(y)^2)/(m+n-2))
tstat = (mean(x)-mean(y))/(spooled*sqrt(1/m+1/n))

return(tstat)
}

calcInterval = function(){
x = rnorm(m,0,1)
y = rnorm(n,1,.8)
se.x = sd(x)/sqrt(m)
se.y = sd(y)/sqrt(n)

t.stat = tstatfunc(x,y)

p.val = (1-pt(abs(t.stat),m+n-2))*2 #Pooled Variance, two sided hypothesis
ci.x.ll = mean(x)-abs(qt(alpha/2,m-1))*se.x
ci.x.ul = mean(x)+abs(qt(alpha/2,m-1))*se.x
ci.y.ll = mean(y)-abs(qt(alpha/2,n-1))*se.y
ci.y.ul = mean(y)+abs(qt(alpha/2,n-1))*se.y
TTest = t.test(x,y, var.equal=TRUE) #Run the t.test() function for comparison
ret.val = c(p=p.val, t.p=TTest$p.value,
ci.x.ll=ci.x.ll, ci.x.ul=ci.x.ul,
ci.y.ll=ci.y.ll, ci.y.ul=ci.y.ul)

return(ret.val)
}

###
#Replicate a few times
###
my.sims = as.data.frame( t(replicate(nsim,calcInterval())) )

###
#Do the intervals overlap and it's significant?
###
ci.vals = cbind(my.sims$ci.x.ll - my.sims$ci.y.ul, my.sims$ci.x.ul-my.sims$ci.y.ul, my.sims$ci.x.ul-my.sims$ci.y.ll)
overlapTest = (ci.vals[,1] > 0 & ci.vals[,2] > 0 & ci.vals[,3] > 0) |
(ci.vals[,1] < 0 & ci.vals[,2] < 0 & ci.vals[,3] < 0)
my.sims = cbind(my.sims,ci.vals,NotOverlap=overlapTest)

sum(my.sims$CI.p)/nsim

hist(
rbeta(100000,sum(my.sims$CI.p)+1, nsim-sum(my.sims$CI.p)+1) ##CI.p is a binomial distribution.
, nclass=100, xlab="Proportion", freq=F, main=expression("Histogram of Overlapping Confidence Intervals and p<"*alpha*" from Beta Distribution"))
###
#What percent have overlapping CI and p < alpha? -- "Significant" but yet CI indicate otherwise.
#Multiple comparisons
###
my.sims$CI.p = as.numeric( !my.sims$NotOverlap & my.sims$p < alpha )
my.sims$CI.p2 = as.numeric( my.sims$NotOverlap & my.sims$p > alpha )
my.sims$my.diff = my.sims$p-my.sims$t.p # Check calculations for consistency

sum(my.sims$CI.p)/length(my.sims$CI.p)

###
#How many have confidence intervals that do not overlap but yet are still "Significant"?
###
my.sims$CI.p2 = as.numeric( my.sims$NotOverlap & my.sims$p > alpha )
sum(my.sims$CI.p2)

###
#Histograms of the intervals
###
hist(my.sims$t.p, nclass=100)
subset(my.sims,my.sims$p > .5)

###
#Histograms of the intervals
###
hist(my.sims$ci.y.ul, nclass=100, xlim=c(-2,3), ylim=c(0,2), col=4, freq=F,
main="Histogram of Confidence Intervals", xlab="Value")
hist(my.sims$ci.y.ll, nclass=100, add=T, col=4, freq=F)
hist(my.sims$ci.x.ul, nclass=100, add=T, col=2, freq=F)
hist(my.sims$ci.x.ll, nclass=100, add=T, col=2, freq=F)

To leave a comment for the author, please follow the link and comment on their blog: Statistical Research » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Significant P-Values and Overlapping Confidence Intervals

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)