Model Selection in Bayesian Linear Regression

June 17, 2013

Previously I wrote about performing polynomial regression and also about calculating marginal likelihoods. The data in the former and the calculations of the latter will be used here to exemplify model selection. Consider data generated by

y_{i}=b_{1}x_{i}+b_{3}x_{i}^{3}+\varepsilon_{i},

and suppose we wish to fit a polynomial of degree 3 to the data. There are then 4 regression coefficients, namely the intercept and the three coefficients of the powers of x. This yields 2^4=16 possible models for the data (the code below drops the empty model, leaving 15 to compare). Let b_1=18 and b_3=-0.5 so that the data look like this:

[Figure: Model Selection Data]

R Code for Marginal Likelihood Calculation

The code to generate the data and calculate the log marginal likelihood for the different models is given below.

rm(list=ls())

# Simulate 200 observations from y = 18x - 0.5x^3 + noise
x=runif(200,-10,10)
a=c(18,0,-0.5,0)            # true coefficients of x, x^2, x^3 and the intercept
Y=a[1]*x^1+a[2]*x^2+a[3]*x^3+a[4]
Y=Y+rnorm(length(Y),0,5)    # add Gaussian noise with sd 5
plot(x,Y)

# Enumerate every subset of the p=4 predictors
p=4
X=cbind(x,x^2,x^3,1)        # full design matrix; last column is the intercept
tf <- c(TRUE, FALSE)
models <- expand.grid(replicate(p,tf,simplify=FALSE))
names(models) <- NULL
models=as.matrix(models)
models=models[-dim(models)[1],]    # drop the last row (all FALSE, i.e. the empty model)


# Hyperparameters of the conjugate prior
a_0=100
b_0=0.5
mu_0=rep(0,p)
lambda_0=diag(p)
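
These are the hyperparameters of a normal-inverse-gamma prior, which is my reading of the posterior updates used in the function below:

\beta \mid \sigma^{2} \sim N(\mu_{0}, \sigma^{2}\Lambda_{0}^{-1}), \qquad \sigma^{2} \sim \text{Inverse-Gamma}(a_{0}, b_{0}).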


lml <- function(model){
	# Log marginal likelihood of Y under the submodel given by the logical
	# vector 'model', which indicates which columns of X are included
	n=length(Y)
	Y=as.matrix(Y)
	X=as.matrix(X[,model])
	mu_0=as.matrix(mu_0[model])
	lambda_0=as.matrix(lambda_0[model,model])
	XtX=t(X)%*%X
	# Posterior updates for the normal-inverse-gamma prior
	lambda_n=lambda_0 + XtX
	BMLE=solve(XtX)%*%t(X)%*%Y    # MLE of the coefficients, kept for comparison with the truth
	mu_n=solve(lambda_n)%*%(t(X)%*%Y+lambda_0%*%mu_0)
	a_n=a_0 + 0.5*n
	b_n=b_0 + 0.5*(t(Y)%*%Y + t(mu_0)%*%lambda_0%*%mu_0 - t(mu_n)%*%lambda_n%*%mu_n)
	log_mar_lik <- -0.5*n*log(2*pi) + 0.5*log(det(lambda_0)) - 0.5*log(det(lambda_n)) + lgamma(a_n) - lgamma(a_0) + a_0*log(b_0) - a_n*log(b_n)
	return(log_mar_lik)
}
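
Spelled out, the function computes the standard closed-form log marginal likelihood of this conjugate model,

\log p(y) = -\frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Lambda_{0}| - \frac{1}{2}\log|\Lambda_{n}| + \log\Gamma(a_{n}) - \log\Gamma(a_{0}) + a_{0}\log b_{0} - a_{n}\log b_{n},

where

\Lambda_{n}=\Lambda_{0}+X^{T}X, \quad \mu_{n}=\Lambda_{n}^{-1}(X^{T}y+\Lambda_{0}\mu_{0}), \quad a_{n}=a_{0}+\frac{n}{2}, \quad b_{n}=b_{0}+\frac{1}{2}\left(y^{T}y+\mu_{0}^{T}\Lambda_{0}\mu_{0}-\mu_{n}^{T}\Lambda_{n}\mu_{n}\right).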



lml.all=apply(models,1,lml)                       # log marginal likelihood of each model
results=cbind(lml.all, models)
colnames(results)=c("lml","x","x^2","x^3","c")    # label columns to match the table below
ord=order(results[,1],decreasing=TRUE)            # rank models, best first
results[ord,]
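
The log marginal likelihoods can also be converted into posterior model probabilities. A minimal sketch, assuming a uniform prior over the 15 candidate models (subtracting the maximum before exponentiating avoids underflow; log_post and post_prob are my names):

log_post=lml.all-max(lml.all)                  # shift for numerical stability
post_prob=exp(log_post)/sum(exp(log_post))     # p(M_k|y) under a uniform model prior
round(post_prob,4)

With gaps of thousands on the log scale between the top model and the rest, essentially all of the posterior mass falls on the best model.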

Model Selection Results

The models are listed in order of descending log marginal likelihood below:

            lml x x^2 x^3 c 
 [1,] -1342.261 1   0   1 0
 [2,] -1344.800 1   0   1 1
 [3,] -1348.514 1   1   1 0
 [4,] -1350.761 1   1   1 1
 [5,] -2182.616 0   0   1 0
 [6,] -2185.247 0   0   1 1
 [7,] -2188.961 0   1   1 0
 [8,] -2191.223 0   1   1 1
 [9,] -2394.062 1   0   0 0
[10,] -2396.100 1   0   0 1
[11,] -2398.886 1   1   0 0
[12,] -2401.119 1   1   0 1
[13,] -2482.800 0   1   0 0
[14,] -2482.810 0   0   0 1
[15,] -2484.837 0   1   0 1

The model with the highest log marginal likelihood is the one including x and x^3 only, for which the MLEs of the regression coefficients are 18.0305424 and -0.4987607 respectively. Compare these with the true values b_1=18 and b_3=-0.5 used to generate the data.
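
To reproduce those estimates directly, one can refit the winning model by least squares (a minimal sketch; X_best is my name, and X and Y are the objects defined above):

X_best=X[,c(TRUE,FALSE,TRUE,FALSE)]           # keep only the x and x^3 columns
solve(t(X_best)%*%X_best)%*%t(X_best)%*%Y     # OLS/MLE coefficients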
