Confidence vs. Credibility Intervals

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Tomorrow, for the final lecture of the Mathematical Statistics course, I will try to illustrate – using Monte Carlo simulations – the difference between classical statistics, and the Bayesien approach.

The (simple) way I see it is the following,

  • for frequentists, a probability is a measure of the the frequency of repeated events, so the interpretation is that parameters are fixed (but unknown), and data are random
  • for Bayesians, a probability is a measure of the degree of certainty about values, so the interpretation is that parameters are random and data are fixed

Or to quote Frequentism and Bayesianism: A Python-driven Primer,  a Bayesian statistician would say “given our observed data, there is a 95% probability that the true value of http://latex.codecogs.com/gif.latex?theta falls within the credible region” while a Frequentist statistician would say “there is a 95% probability that when I compute a confidence interval from data of this sort, the true value of http://latex.codecogs.com/gif.latex?theta will fall within it”.

To get more intuition about those quotes, consider a simple problem, with Bernoulli trials, with insurance claims. We want to derive some confidence interval for the probability to claim a loss. There were http://latex.codecogs.com/gif.latex?n = 1047 policies. And 159 claims.

Consider the standard (frequentist) confidence interval. What does that mean that

http://latex.codecogs.com/gif.latex?overline{x}pm%201.96%20sqrt{frac{overline{x}(1-overline{x})}{n}}

is the (asymptotic) 95% confidence interval? The way I see it is very simple. Let us generate some samples, of size http://latex.codecogs.com/gif.latex?n, with the same probability as the empirical one, i.e. http://latex.codecogs.com/gif.latex?widehat{theta} (which is the meaning of “from data of this sort”). For each sample, compute the confidence interval with the relationship above. It is a 95% confidence interval because in 95% of the scenarios, the empirical value lies in the confidence interval. From a computation point of view, it is the following idea,

> xbar <- 159
> n <- 1047
> ns <- 100
> M=matrix(rbinom(n*ns,size=1,prob=xbar/n),nrow=n)

I generate 100 samples of size http://latex.codecogs.com/gif.latex?n. For each sample, I compute the mean, and the confidence interval, from the previous relationship

> fIC=function(x) mean(x)+c(-1,1)*1.96*sqrt(mean(x)*(1-mean(x)))/sqrt(n)
> IC=t(apply(M,2,fIC))
> MN=apply(M,2,mean)

Then we plot all those confidence intervals. In red when they do not contain the empirical mean

> k=(xbar/nIC[,2])
> plot(MN,1:ns,xlim=range(IC),axes=FALSE,
+ xlab="",ylab="",pch=19,cex=.7,
+ col=c("blue","red")[1+k])
> axis(1)
> segments(IC[,1],1:ns,IC[,2],1:
+ ns,col=c("blue","red")[1+k])
> abline(v=xbar/n)

Now, what about the Bayesian credible interval ? Assume that the prior distribution for the probability to claim a loss has a http://latex.codecogs.com/gif.latex?mathcal{B}(alpha,beta) distribution. We’ve seen in the course that, since the Beta distribution is the conjugate of the Bernoulli one, the posterior distribution will also be Beta. More precisely

http://latex.codecogs.com/gif.latex?mathcal{B}left(alpha+sum%20x_i,beta+n-sum%20x_iright)

Based on that property, the confidence interval is based on quantiles of that (posterior) distribution

> u=seq(.1,.2,length=501)
> v=dbeta(u,1+xbar,1+n-xbar)
> plot(u,v,axes=FALSE,type="l")
> I=u polygon(c(u[I],rev(u[I])),c(v[I],
+ rep(0,sum(I))),col="red",density=30,border=NA)
> I=u>qbeta(.975,1+xbar,1+n-xbar)
> polygon(c(u[I],rev(u[I])),c(v[I],
+ rep(0,sum(I))),col="red",density=30,border=NA)
> axis(1)

What does that mean, here, that we have a 95% credible interval. Well, this time, we do not draw using the empirical mean, but some possible probability, based on that posterior distribution (given the observations)

> pk <- rbeta(ns,1+xbar,1+n-xbar)

In green, below, we can visualize the histogram of those values

> hist(pk,prob=TRUE,col="light green",
+ border="white",axes=FALSE,
+ main="",xlab="",ylab="",lwd=3,xlim=c(.12,.18))

And here again, let us generate samples, and compute the empirical probabilities,

> M=matrix(rbinom(n*ns,size=1,prob=rep(pk,
+ each=n)),nrow=n)
> MN=apply(M,2,mean)

Here, there is 95% chance that those empirical means lie in the credible interval, defined using quantiles of the posterior distribution. We can actually visualize all those means : in black the mean used to generate the sample, and then, in blue or red, the averages obtained on those simulated samples,

> abline(v=qbeta(c(.025,.975),1+xbar,1+
+ n-xbar),col="red",lty=2)
> points(pk,seq(1,40,length=ns),pch=19,cex=.7)
> k=(MNqbeta(.975,1+xbar,1+n-xbar))
> points(MN,seq(1,40,length=ns),
+ pch=19,cex=.7,col=c("blue","red")[1+k])
> segments(MN,seq(1,40,length=ns),
+ pk,seq(1,40,length=ns),col="grey")

More details and exemple on Bayesian statistics, seen with the eyes of a (probably) not Bayesian statistician in my slides, from my talk in London, last Summer,

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)