# Bayesian linear regression analysis without tears (R)

November 17, 2013
By

[This article was first published on Statistical Reflections of a Medical Doctor » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Bayesian methods are sure to get some publicity after Vale Johnson’s PNAS paper regarding the use of Bayesian approaches to recalibrate p-value cutoffs from 0.05 to 0.005. Though the paper itself is bound to get some heat (see the discussion in Andrew Gelman’s blog and Matt Briggs’s fun-to-read deconstruction), the controversy might stimulate people to explore Bayesianism and (hopefully!) to move away from frequentist analyses. The newcomers though will face some hurdles in this journey:

• philosophical (the need to adapt to an “alternative” inferential lifestyle)
• practical (gather all the data that came before one’s definitive study, and process them mathematically in order define the priors)
• technical (learn the tools required to carry out Bayesian analyses and summarizes results)

Though there are excellent resources out there to deal with philosophy/theory (e.g. see the books by: Jaynes, Gelman, Robert, Lee) and the necessary tools to implement Bayesian analyses (in R, JAGS, OpenBUGS, WinBUGS, STAN) my own (admittedly biased) perspective is that many people will be reluctant to simultaneously change too many things in their scientific modus operandi. One can call it intellectual laziness, human inertia or simply lack of time, but the bottom line is that one is more likely to embrace change in small steps and with as little disturbance in one’s routine as possible. So how can one embark on the Bayesian journey by taking small steps towards the giant leap?

It would appear to me that one’s least resistance journey to Bayesianism might be based on non-informative (uninformative/ data-dominated) priors. These simultaneously avoid the need to do the tedious searching of previous evidence/expert elicitation required to provide informative priors, while retaining the connection to one’s frequentist past in which only current data are the only important things (hint: they are not). Furthermore, one can even avoid learning some of the more elaborate software systems/libraries required to carry out bona fide Bayesian analysis by  reusing of the R output of a frequentist analysis.

Let’s see how it is possible to cater to the needs of the lazy, inert or horribly busy researcher. First we start with the a toy linear regression example (straight from R’s lm help file):

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lmfit <- lm(weight ~ group)

The standard non-informative prior for the linear regression analysis example (Bayesian Data Analysis 2nd Ed, p:355-358) takes an improper (uniform) prior on the coefficients of the regression ( $\beta$ : the intercept and the effects of the “Trt” variable) and the logarithm of the residual variance $\sigma^2$. With these priors, the posterior distribution of $\beta$ conditional on $\sigma^2$ and the response variable $y$ is: $\beta|\sigma^2 , y \sim N(\hat{\beta},V_{\beta}\sigma^2)$

The marginal posterior distribution for $\sigma^2$ is a scaled inverse $\chi^2$ distribution with scale $s$ and $n-k$ degrees of freedom, where $n$ is the number of data points and $k$ the number of predictor variables. In our example these assume the values of $n=20, k=2$, while $s^2$ is the standard frequentist estimate of the residual variance.
The quantities $\hat{\beta}, s^2$ are directly available from the information returned by R’s lm, while $V_{\beta}$ can be computed from the qr element of the lm object:

    QR<-lmfit$qr df.residual<-lmfit$df.residual
R<-qr.R(QR) ## R component
coef<-lmfit$coef Vb<-chol2inv(R) ## variance(unscaled) s2<-(t(lmfit$residuals)%*%lmfit$residuals) s2<-s2[1,1]/df.residual To compute the marginal distribution of $\beta|y$ we can use a simple Monte Carlo algorithm, first drawing $\sigma^2$ from its marginal posterior, and then $\beta|\sigma^2 , y$. The following function will do that; it accepts as arguments a lm object, the desired number of Monte Carlo samples and returns everything in a data frame for further processing: ## function to compute the bayesian analog of the lmfit ## using non-informative priors and Monte Carlo scheme ## based on N samples bayesfit<-function(lmfit,N){ QR<-lmfit$qr
df.residual<-lmfit$df.residual R<-qr.R(QR) ## R component coef<-lmfit$coef
Vb<-chol2inv(R) ## variance(unscaled)
s2<-(t(lmfit$residuals)%*%lmfit$residuals)
s2<-s2[1,1]/df.residual

## now to sample residual variance
sigma<-df.residual*s2/rchisq(N,df.residual)
coef.sim<-sapply(sigma,function(x) mvrnorm(1,coef,Vb*x))
ret<-data.frame(t(coef.sim))
names(ret)<-names(lmfit$coef) ret$sigma<-sqrt(sigma)
ret
}

A helper function can be used to summarize these Monte Carlo estimates by yielding the mean, standard deviation, median, t (the ratio of mean/standard deviation) and a 95% (symmetric) credible interval:

Bayes.sum<-function(x)
{
c("mean"=mean(x),
"se"=sd(x),
"t"=mean(x)/sd(x),
"median"=median(x),
"CrI"=quantile(x,prob=0.025),
"CrI"=quantile(x,prob=0.975)
)
}

To use these functions and contrast Bayesian and frequentist estimates one simply needs to fit the regression model with lm, call the bayesim function to run the Bayesian analysis and pass the results to Bayes.sum:

> set.seed(1234)  ## reproducible sim
> lmfit <- lm(weight ~ group)
> bf<-bayesfit(lmfit,10000)
> t(apply(bf,2,Bayes.sum))
mean        se         t     median    Cr.2.5%  Cr.97.5%
(Intercept)  5.0332172 0.2336049 21.545857  5.0332222  4.5651327 5.4902380
groupTrt    -0.3720698 0.3335826 -1.115375 -0.3707408 -1.0385601 0.2895787
sigma        0.7262434 0.1275949  5.691789  0.7086832  0.5277051 1.0274083
> summary(lmfit)

Call:
lm(formula = weight ~ group)

Residuals:
Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
groupTrt     -0.3710     0.3114  -1.191    0.249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared:  0.07308,	Adjusted R-squared:  0.02158
F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249

It can be seen that the Bayesian estimates are almost identical to the frequentist ones (up to 2 significant digits, which is the limit of precision of the Monte Carlo run based on 10000 samples), but uncertainty in terms of these estimates (the standard deviation) and the residual variance is larger. This conservativeness is an inherent feature of Bayesian analysis which guards against too many false positives hits.  R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.