**Lindons Log » R**, and kindly contributed to R-bloggers)

Previously I wrote about performing polynomial regression and also about calculating marginal likelihoods. The data in the former and the calculations of the latter will be used here to exemplify model selection. Consider data generated by

and suppose we wish to fit a polynomial of degree 3 to the data. There are then 4 regression coefficients, namely, the intercept and the three coefficients of the power of x. This yields 2^4=16 models possible models for the data. Let b1=8 and b3=-0.5 so that the data looks like this:

## R Code for Marginal Likelihood Calculation

The code to generate the data and calculate the log marginal likelihood for the different models can be found here.

rm(list=ls()) x=runif(200,-10,10) a=c(18,0,-0.5,0) Y=a[1]*x^1+a[2]*x^2+a[3]*x^3+a[4] Y=Y+rnorm(length(Y),0,5) plot(x,Y) p=4 X=cbind(x,x^2,x^3,1) tf <- c(TRUE, FALSE) models <- expand.grid(replicate(p,tf,simplify=FALSE)) names(models) <- NULL models=as.matrix(models) models=models[-dim(models)[1],] a_0=100 b_0=0.5 mu_0=rep(0,p) lambda_0=diag(p) lml <- function(model){ n=length(Y) Y=as.matrix(Y) X=as.matrix(X[,model]) mu_0=as.matrix(mu_0[model]) lambda_0=as.matrix(lambda_0[model,model]) XtX=t(X)%*%X lambda_n=lambda_0 + XtX BMLE=solve(XtX)%*%t(X)%*%Y mu_n=solve(lambda_n)%*%(t(X)%*%Y+lambda_0%*%mu_0) a_n = a_0 + 0.5*n b_n=b_0 + 0.5*(t(Y)%*%Y + t(mu_0)%*%lambda_0%*%mu_0 - t(mu_n)%*%lambda_n%*%mu_n) log_mar_lik <- -0.5*n*log(2*pi) + 0.5*log(det(lambda_0)) - 0.5*log(det(lambda_n)) + lgamma(a_n) - lgamma(a_0) + a_0*log(b_0) - a_n*log(b_n) return(log_mar_lik) } lml.all=apply(models,1,lml) results=cbind(lml.all, models) order=sort(results[,1],index=TRUE,decreasing=TRUE) results[order$ix,]

## Model Selection Results

The models are listed in order of descending log marginal likelihood below:

lml x x^2 x^3 c [1,] -1342.261 1 0 1 0 [2,] -1344.800 1 0 1 1 [3,] -1348.514 1 1 1 0 [4,] -1350.761 1 1 1 1 [5,] -2182.616 0 0 1 0 [6,] -2185.247 0 0 1 1 [7,] -2188.961 0 1 1 0 [8,] -2191.223 0 1 1 1 [9,] -2394.062 1 0 0 0 [10,] -2396.100 1 0 0 1 [11,] -2398.886 1 1 0 0 [12,] -2401.119 1 1 0 1 [13,] -2482.800 0 1 0 0 [14,] -2482.810 0 0 0 1 [15,] -2484.837 0 1 0 1

The model with the highest log marginal likelihood is the model which includes x and x-cubed only, for which the MLE of the regression coefficients are 18.0305424 and -0.4987607 for x and x-cubed respectively. Compare this to how the data was generated.

The post Model Selection in Bayesian Linear Regression appeared first on Lindons Log.

**leave a comment**for the author, please follow the link and comment on their blog:

**Lindons Log » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...