Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Tomorrow, for the final lecture of the Mathematical Statistics course, I will try to illustrate – using Monte Carlo simulations – the difference between classical statistics, and the Bayesien approach.

The (simple) way I see it is the following,

• for frequentists, a probability is a measure of the the frequency of repeated events, so the interpretation is that parameters are fixed (but unknown), and data are random
• for Bayesians, a probability is a measure of the degree of certainty about values, so the interpretation is that parameters are random and data are fixed

Or to quote Frequentism and Bayesianism: A Python-driven Primer,  a Bayesian statistician would say “given our observed data, there is a 95% probability that the true value of $\theta$ falls within the credible region” while a Frequentist statistician would say “there is a 95% probability that when I compute a confidence interval from data of this sort, the true value of $\theta$ will fall within it”.

To get more intuition about those quotes, consider a simple problem, with Bernoulli trials, with insurance claims. We want to derive some confidence interval for the probability to claim a loss. There were $n$ = 1047 policies. And 159 claims.

Consider the standard (frequentist) confidence interval. What does that mean that

$\overline{x}\pm 1.96 \sqrt{\frac{\overline{x}(1-\overline{x})}{n}}$

is the (asymptotic) 95% confidence interval? The way I see it is very simple. Let us generate some samples, of size $n$, with the same probability as the empirical one, i.e. $\widehat{\theta}$ (which is the meaning of “from data of this sort”). For each sample, compute the confidence interval with the relationship above. It is a 95% confidence interval because in 95% of the scenarios, the empirical value lies in the confidence interval. From a computation point of view, it is the following idea,

```> xbar <- 159
> n <- 1047
> ns <- 100
> M=matrix(rbinom(n*ns,size=1,prob=xbar/n),nrow=n)```

I generate 100 samples of size $n$. For each sample, I compute the mean, and the confidence interval, from the previous relationship

```> fIC=function(x) mean(x)+c(-1,1)*1.96*sqrt(mean(x)*(1-mean(x)))/sqrt(n)
> IC=t(apply(M,2,fIC))
> MN=apply(M,2,mean)```

Then we plot all those confidence intervals. In red when they do not contain the empirical mean

```> k=(xbar/nIC[,2])
> plot(MN,1:ns,xlim=range(IC),axes=FALSE,
+ xlab="",ylab="",pch=19,cex=.7,
+ col=c("blue","red")[1+k])
> axis(1)
> segments(IC[,1],1:ns,IC[,2],1:
+ ns,col=c("blue","red")[1+k])
> abline(v=xbar/n)```

Now, what about the Bayesian credible interval ? Assume that the prior distribution for the probability to claim a loss has a $\mathcal{B}(\alpha,\beta)$ distribution. We’ve seen in the course that, since the Beta distribution is the conjugate of the Bernoulli one, the posterior distribution will also be Beta. More precisely

$\mathcal{B}\left(\alpha+\sum x_i,\beta+n-\sum x_i\right)$

Based on that property, the confidence interval is based on quantiles of that (posterior) distribution

```> u=seq(.1,.2,length=501)
> v=dbeta(u,1+xbar,1+n-xbar)
> plot(u,v,axes=FALSE,type="l")
> I=u polygon(c(u[I],rev(u[I])),c(v[I],
+ rep(0,sum(I))),col="red",density=30,border=NA)
> I=u>qbeta(.975,1+xbar,1+n-xbar)
> polygon(c(u[I],rev(u[I])),c(v[I],
+ rep(0,sum(I))),col="red",density=30,border=NA)
> axis(1)```

What does that mean, here, that we have a 95% credible interval. Well, this time, we do not draw using the empirical mean, but some possible probability, based on that posterior distribution (given the observations)

`> pk <- rbeta(ns,1+xbar,1+n-xbar)`

In green, below, we can visualize the histogram of those values

```> hist(pk,prob=TRUE,col="light green",
+ border="white",axes=FALSE,
+ main="",xlab="",ylab="",lwd=3,xlim=c(.12,.18))```

And here again, let us generate samples, and compute the empirical probabilities,

```> M=matrix(rbinom(n*ns,size=1,prob=rep(pk,
+ each=n)),nrow=n)
> MN=apply(M,2,mean)```

Here, there is 95% chance that those empirical means lie in the credible interval, defined using quantiles of the posterior distribution. We can actually visualize all those means : in black the mean used to generate the sample, and then, in blue or red, the averages obtained on those simulated samples,

```> abline(v=qbeta(c(.025,.975),1+xbar,1+
+ n-xbar),col="red",lty=2)
> points(pk,seq(1,40,length=ns),pch=19,cex=.7)
> k=(MNqbeta(.975,1+xbar,1+n-xbar))
> points(MN,seq(1,40,length=ns),
+ pch=19,cex=.7,col=c("blue","red")[1+k])
> segments(MN,seq(1,40,length=ns),
+ pk,seq(1,40,length=ns),col="grey")```

More details and exemple on Bayesian statistics, seen with the eyes of a (probably) not Bayesian statistician in my slides, from my talk in London, last Summer,