**Stats raving mad » R**, and kindly contributed to R-bloggers)

Estimating a proportion at first looks elementary. Hail to aymptotics, right? Well, initially it might seem efficient to iuse the fact that . In other words the classical confidence interval relies on the inversion of Wald’s test.

A function to ease the computation is the following (not really needed!).

waldci<- function(x,n,level){ phat<-sum(x)/n results<-phat + c(-1,1)*qnorm(1-level/2)*sqrt(phat*(1-phat)/n) print(results) }

An exact confidence interval is the so-called Blake’s confidence interval

blakerci <- function(x,n,level,tolerance=1e-05){ lower = 0 upper = 1 if (x!=0){lower = qbeta((1-level)/2, x, n-x+1) while (acceptbin(x, n, lower + tolerance) < (1 - level)) lower = lower+tolerance } if (x!=n){upper = qbeta(1 - (1-level)/2, x+1, n-x) while (acceptbin(x, n, upper - tolerance) < (1 - level)) upper = upper-tolerance } c(lower,upper) } # Computes the Blaker exact ci (Canadian J. Stat 2000) # for a binomial success probability # for x successes out of n trials with # confidence coefficient = level; uses acceptbin function acceptbin = function(x, n, p){ #computes the Blaker acceptability of p when x is observed # and X is bin(n, p) p1 = 1 - pbinom(x - 1, n, p) p2 = pbinom(x, n, p) a1 = p1 + pbinom(qbinom(p1, n, p) - 1, n, p) a2 = p2 + 1 - pbinom(qbinom(1 - p2, n, p), n, p) return(min(a1,a2)) }

A comparison is easily made along the following lines of code

list.counts.bl=NA list.counts.wl=NA prob=c(0.05,0.1,.9,.97) for (j in 1:4){ p=prob[j] n=9 size=1 level=0.05 N=30000 counts.bl=0 counts.wl=0 for (i in 1:N) { tmp.sample=rbinom(n,size,p) x=sum(tmp.sample) if (blakerci(x,n,level,tolerance=1e-05)[1]<p && blakerci(x,n,level,tolerance=1e-05)[2]>p) counts.bl=counts.bl+1 if (waldci(x,n,level)[1]<p && waldci(x,n,level)[2]>p) counts.wl=counts.wl+1 } list.counts.bl[j]=counts.bl list.counts.wl[j]=counts.wl } list.counts.bl/N list.counts.wl/N

You can see the difference at extremes!

tt=matrix(data=c(list.counts.bl,list.counts.wl),nrow=2,ncol=4,byrow=TRUE)/N colnames(tt)=prob rownames(tt)=list("Blaker","Wald") > round(tt,3) 0.05 0.1 0.9 0.97 Blaker 0.632 0.391 0.384 0.762 Wald 0.368 0.605 0.605 0.238

The blacerci function is adapted form A. Agresti’s site [link].

**Look it up…**

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta, “Interval Estimation for a Binomial Proportion”, *Statistical Science* 2001, Vol. 16, No. 2, pp. 101–133 [pdf]

Laura A. Thompson, *An R (and S-PLUS) Manual to Accompany Agresti’s Categorical Data Analysis*, 2009 [pdf | R code]

**leave a comment**for the author, please follow the link and comment on his blog:

**Stats raving mad » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...