Confidence we seek…

Posted on November 18, 2009 by Manos Parzakonis in R bloggers | 0 Comments

[This article was first published on Stats raving mad » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Estimating a proportion at first looks elementary. Hail to aymptotics, right? Well, initially it might seem efficient to iuse the fact that $frac{hat{p}-p}{sqrt{frac{hat{p}(1-hat{p})}{n}}} tilde {} N(0,1)$ . In other words the classical confidence interval relies on the inversion of Wald’s test.

A function to ease the computation is the following (not really needed!).

waldci<- function(x,n,level){
phat<-sum(x)/n
results<-phat + c(-1,1)*qnorm(1-level/2)*sqrt(phat*(1-phat)/n)
print(results)
}

An exact confidence interval is the so-called Blake’s confidence interval

blakerci <- function(x,n,level,tolerance=1e-05){
  lower = 0
  upper = 1
  if (x!=0){lower = qbeta((1-level)/2, x, n-x+1)
    while (acceptbin(x, n, lower + tolerance) < (1 - level))
      lower = lower+tolerance
   }
  if (x!=n){upper = qbeta(1 - (1-level)/2, x+1, n-x)
    while (acceptbin(x, n, upper - tolerance) < (1 - level))
      upper = upper-tolerance
   }
 c(lower,upper)
}
# Computes the Blaker exact ci (Canadian J. Stat 2000)
# for a binomial success probability
# for x successes out of n trials with
# confidence coefficient = level; uses acceptbin function

acceptbin = function(x, n, p){
 #computes the Blaker acceptability of p when x is observed
 # and X is bin(n, p)
   p1 = 1 - pbinom(x - 1, n, p)
   p2 = pbinom(x, n, p)
   a1 = p1 + pbinom(qbinom(p1, n, p) - 1, n, p)
   a2 = p2 + 1 - pbinom(qbinom(1 - p2, n, p), n, p)
   return(min(a1,a2))
}

A comparison is easily made along the following lines of code

list.counts.bl=NA
list.counts.wl=NA
prob=c(0.05,0.1,.9,.97)

for (j in 1:4){
p=prob[j]
n=9
size=1
level=0.05
N=30000
counts.bl=0
counts.wl=0

for (i in 1:N)
{
tmp.sample=rbinom(n,size,p)
x=sum(tmp.sample)
if (blakerci(x,n,level,tolerance=1e-05)[1]<p && blakerci(x,n,level,tolerance=1e-05)[2]>p) counts.bl=counts.bl+1
if (waldci(x,n,level)[1]<p && waldci(x,n,level)[2]>p) counts.wl=counts.wl+1
}

list.counts.bl[j]=counts.bl
list.counts.wl[j]=counts.wl
}

list.counts.bl/N
list.counts.wl/N

You can see the difference at extremes!

tt=matrix(data=c(list.counts.bl,list.counts.wl),nrow=2,ncol=4,byrow=TRUE)/N
colnames(tt)=prob
rownames(tt)=list("Blaker","Wald")
> round(tt,3)
        0.05   0.1   0.9  0.97
Blaker 0.632 0.391 0.384 0.762
Wald   0.368 0.605 0.605 0.238

The blacerci function is adapted form A. Agresti’s site [link].

Look it up…

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta, “Interval Estimation for a Binomial Proportion”, Statistical Science 2001, Vol. 16, No. 2, pp. 101–133 [pdf]

Laura A. Thompson, An R (and S-PLUS) Manual to Accompany Agresti’s Categorical Data Analysis, 2009 [pdf | R code]