The Beta Prior, Likelihood, and Posterior

[This article was first published on Statistical Research » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The Beta distribution (and more generally the Dirichlet) are probably my favorite distributions.  However, sometimes only limited information is available when trying set up the distribution.  For example maybe you only know the lowest likely value, the highest likely value and the median, as a measure of center.  That information is sufficient to construct a basic form of the distribution.  The idea of this post is not to elaborate in detail on Bayesian priors and posteriors but to give a real working example of using a prior with limited knowledge about the distribution, adding some collected data and arriving at a posterior distribution along with a measure of its uncertainty.

The example below is a simple demonstration on how a prior distribution and current data can be combined and form a posterior distribution.  Earlier this year I gave a presentation at a conference where I modified this simple version of my code to be substantially more complex and I used the Dirichlet distribution to make national predictions based on statewide and local samples.   So this approach has some very useful applied statistical properties and can be modified to handle some very complex distributions.

Prior, Likelihood, and Posterior Distributions

A Note on Posterior Intervals

The posterior interval (also called a credible interval or credible region) provides a very intuitive way to describe the measure of uncertainty.  Unlike a confidence interval (discussed in one of my previous posts), a credible interval does in fact provide the probability that a value exists within the interval.  With this interval it is based on calculating the probability of different values given the data.  The graph above shows the different values (identified as theta) as well as the simulated posterior interval limits (alpha=.05 in black and .01 in light gray).  In other words the probability that theta is a member of the 95% credible interval is 0.95 (written as P\left( \theta \in CI \right) = 0.95).

 

The Prior and Posterior Distribution: An Example

The code to run the beta.select() function is found in the LearnBayes package.  This is a great function because by providing two quantiles one can determine the shape parameters of the Beta distribution.  This is useful to find the parameters (or a close approximation) of the prior distribution given only limited information.  If additional quantiles are known then they can be incorporated to better determine the shape parameters of the Beta distribution.


library(LearnBayes)
Q = data.frame(
quantile=c(
median=0.5,
maximum=0.99999,
minimum=0.00001),
prior=c(
median=0.85,
maximum=0.95,
minimum=0.60)
)

optimalBeta = function(Q) {
q1q = Q$quantile[1]
q1p = Q$prior[1]
q2q = Q$quantile[2]
q2p = Q$prior[2]
q3q = Q$quantile[3]
q3p = Q$prior[3]

# find the beta prior using quantile1 and quantile2
q.med = list(p=q1q, x=q1p)
q.max = list(p=q2q, x=q2p)
q.min = list(p=q3q, x=q3p)

# prior parameters using median and max, and median and min
prior.A = beta.select(q.med,q.max)
prior.B = beta.select(q.med,q.min)

prior.Aa = prior.A[1]
prior.Ab = prior.A[2]

prior.Ba = prior.B[1]
prior.Bb = prior.B[2]

## find the best possible beta prior
## Set a start and stop point range to find the best parameters
if (prior.Aa < prior.Ba) {
start.a = prior.Aa
stop.a = prior.Ba
} else {
start.a = prior.Ba
stop.a = prior.Aa
}

if (prior.Ab < prior.Bb) {
start.b = prior.Ab
stop.b = prior.Bb
} else {
start.b = prior.Bb
stop.b = prior.Ab
}
seq.a = seq(from=start.a, to=stop.a, length.out=1000)
seq.b = seq(from=start.b, to=stop.b, length.out=1000)

seq.grid = expand.grid(seq.a, seq.b)

prior.C.q1 = qbeta(q1q, seq.grid[,1], seq.grid[,2])
prior.C.q2 = qbeta(q2q, seq.grid[,1], seq.grid[,2])
prior.C.q3 = qbeta(q3q, seq.grid[,1], seq.grid[,2])

## Different distance measurements, manhattan, euclidean, or otherwise.
## It would be interesting to run a simulation to measure a variety of distance measurements.
prior.C.delta = abs(prior.C.q1 - q1p) + abs(prior.C.q2 - q2p) + abs(prior.C.q3 - q3p)
## prior.C.delta = sqrt( (prior.C.q1 - q1p)^2 + (prior.C.q2 - q2p)^2 + (prior.C.q3 - q3p)^2 )

optimize.seq = cbind(seq.grid, prior.C.q1, prior.C.q2, prior.C.q3, prior.C.delta)

## Minimize the delta, if the min-delta is not unique then choose the first occurence
best.a = optimize.seq[,1][ optimize.seq[,6]==min(optimize.seq[,6])][1]
best.b = optimize.seq[,2][ optimize.seq[,6]==min(optimize.seq[,6])][1]

return(list(a=best.a,b=best.b))
}

prior.dist = optimalBeta(Q)

##########################################################
## Take a look at only the prior
##########################################################
curve(dbeta(x,prior.dist$a,prior.dist$b)) # plot the prior
abline(v=Q$prior[1])

##########################################################
## Take a look at only the likelihood with given successes
##########################################################
calcLikelihood = function(successes, total){
curve(dbinom(successes,total,x)) # plot the likelihood
}

calcLikelihood(45, 50) ## e.g. 45/50 sucesses
## calculate some properties of the Beta distribution
calcBetaMode = function(aa, bb) {
beta.mode = (aa - 1)/(aa + bb - 2)
return(beta.mode)
}
calcBetaMean = function(aa, bb) {
beta.mean = (aa)/(aa + bb)
return(beta.mean)
}
calcBetaVar = function(aa, bb) {
beta.var = (aa * bb)/(((aa + bb)^2) * (aa + bb + 1))
return(beta.var)
}
calcBetaMedian = function(aa, bb) {
beta.med = (aa-1/3)/(aa+bb-2/3)
return(beta.med)
}
calcBetaSkew = function(aa, bb) {
beta.skew = ( 2*(bb-aa)*sqrt(aa+bb+1) ) /( (aa+bb+2)/sqrt(aa+bb) )
return(beta.skew)
}

##########################################################
## Take a look at the prior, likelihood, and posterior
##########################################################
priorToPosterior = function(successes, total, a, b) {
## Note the rule of succession
likelihood.a = successes + 1
likelihood.b = total - successes + 1

## Create posterior
posterior.a = a + successes;
posterior.b = b + total - successes
theta = seq(0.005, 0.995, length = 500)

## Calc density
prior = dbeta(theta, a, b)
likelihood = dbeta(theta, likelihood.a, likelihood.b)
posterior = dbeta(theta, posterior.a, posterior.b)

## Plot prior, likelihood, and posterior

## Can be used to scale down the graph if desired.
## However, the density is different for each prior, likelihood, posterior
m.orig = apply( cbind(prior, likelihood, posterior), 2, max)
m = max(c(prior, likelihood, posterior))

plot(theta, posterior, type = "l", ylab = "Density", lty = 2, lwd = 3,
main = paste("Prior: beta(", round(a,2), ",", round(b,2), "); Data: B(", total, ",", successes, "); ",
"Posterior: beta(", round(posterior.a,2), ",", round(posterior.b,2), ")", sep=""), ylim = c(0, m), col = 1)
lines(theta, likelihood, lty = 1, lwd = 3, col = 2)
lines(theta, prior, lty = 3, lwd = 3, col = 3)
legend("topleft",y=m, c("Prior", "Likelihood", "Posterior"), lty = c(3, 1, 2),
lwd = c(3, 3, 3), col = c(3, 2, 1))

prior.mode = calcBetaMode(a, b)
likelihood.mode = calcBetaMode(likelihood.a, likelihood.b)
posterior.mode = calcBetaMode(posterior.a, posterior.b)
prior.mean = calcBetaMean(a, b)
likelihood.mean = calcBetaMean(likelihood.a, likelihood.b)
posterior.mean = calcBetaMean(posterior.a, posterior.b)
prior.med = calcBetaMedian(a, b)
likelihood.med = calcBetaMedian(likelihood.a, likelihood.b)
posterior.med = calcBetaMedian(posterior.a, posterior.b)
prior.var = calcBetaVar(a, b)
likelihood.var = calcBetaVar(likelihood.a, likelihood.b)
posterior.var = calcBetaVar(posterior.a, posterior.b)
prior.skew = calcBetaSkew(a, b)
likelihood.skew = calcBetaSkew(likelihood.a, likelihood.b)
posterior.skew = calcBetaSkew(posterior.a, posterior.b)

print(paste("Mode: prior=",prior.mode,"; Likelihood=",likelihood.mode,"; Posterior=",posterior.mode))
print(paste("Mean: prior=",prior.mean,"; Likelihood=",likelihood.mean,"; Posterior=",posterior.mean))
print(paste("~Approx Median (for a and b > 1): prior=",prior.med,"; Likelihood=",likelihood.med,", for Posterior=",posterior.med))
print(paste("Var: prior=",prior.var,"; Likelihood=", likelihood.var,"; Posterior=",posterior.var))
print(paste("Skewness: prior=",prior.skew,"; Likelihood=",likelihood.skew,"; Posterior=",posterior.skew))
return(list(a=posterior.a,b=posterior.b))
}

posterior.out = priorToPosterior(25,50, prior.dist$a, prior.dist$b) # 25/50 is current data
beta.sim = rbeta(1000000,posterior.out$a, posterior.out$b)
abline(v=quantile(beta.sim, prob=c(.05/2, 1-.05/2)), col='#000000', lwd=2)
abline(v=quantile(beta.sim, prob=c(.01/2, 1-.01/2)), col='#EEEEEE', lwd=2)

 

To leave a comment for the author, please follow the link and comment on their blog: Statistical Research » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)