Computing evidence
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The book Random effects and latent variable model selection, edited by David Dunson in 2008 as a Springer Lecture Note. contains several chapters dealing with evidence approximation in mixed effect models. (Incidentally, I would be interested in the story behind the Lecture Note as I found no explanation in the backcover or in the preface. Some chapters but not all refer to a SAMSI workshop on model uncertainty…) The final chapter (similar to a corresponding paper in JCGS) contains in particular the interesting identity that the Bayes factor opposing model h to model h-1 can be approximated by (the average of the terms)
when
- is the model index,
- the ‘s are simulated from the posterior under model h,
- the model only considers the h-1 first components of ,
- the prior under model h-1 is the projection of the prior under model h. (Note that this marginalisation is not the projection used in Bayesian Core.)
I was first surprised when reading this result and wondered whether or not it was suffering either of the Savage-Dickey difficulty or of the harmonic mean instability, but this does not seem to be the case. The ratio is correctly defined thanks to the projection property and the ratio has no particular reason to suffer from infinite variance.
To check the practical performances of the method, I tried it on a simple linear model with known variance
comparing the model including the (normal+log-trend) simulated regressor with the model without the regressor. I used a g-prior
on the full model and its(marginal) projection
which (rather counter-intuitively) involves the regressor values. This modelling should give a Bayes factor equal to
but the difference between the simulated Bayes factor and the actual value is quite large in my simulations, both under the full
> DD
[1] 1.769114e-10
> B01
[1] 0.004805112
and the restricted
> DD
[1] 0.6218985
> B01
[1] 2.516249
models. Here is the R code:
n=155 x=.2*rnorm(n)+log(1:n) y=2+.8*x+rnorm(n) #posterior parameter under G-prior with G=n pom={n/{n+1}}*lm(y~x)$coefficients pov={n/{n+1}}*summary(lm(y~x))$cov.unscaled #sample from posterior library(mvtnorm) posim=rmvnorm(10^5,mean=pom,sigma=pov) #Bayes F. approximation BF=dnorm(y[1],mean=posim[,1],log=TRUE)- dnorm(y[1]-posim[,1]-posim[,2]*x[1],log=TRUE) for (i in 2:n) BF=BF+dnorm(y[i],mean=posim[,1],log=TRUE)- dnorm(y[i]-posim[,1]-posim[,2]*x[i],log=TRUE) DD=mean(exp(BF)) #exact Bayes factor prig={n+1}*pov[1,1] B01=exp(dmvnorm(y,sigma=diag(n)+prig*matrix(1,n,n),log=TRUE)- dmvnorm(y,sigma=diag(n)+n*cbind(rep(1,n),x)%*% summary(lm(y~x))$cov.unscaled%*%rbind(rep(1,n),x),log=TRUE))
I think one explanation for the discrepancy is that the a‘s derived from the full model posterior are quite different from a‘s that would be generated from the restricted model posterior. The more I think about the above approximation, the less convinced I am that it can apply in full generality because of this drawback that the projected posterior has nothing to do with the posterior associated with the projected prior. (Since, formally, the same unbiasedness result applies when comparing a full model with k variables with a submodel with none but the constant, it is clear that the approximation may be poor in non-orthogonal situations.)
Filed under: Books, R, Statistics Tagged: Bayesian model choice, evidence, harmonic mean estimator, latent variable, Lecture Notes in Statistics, MCMC, mixed effect models, path sampling, prior projection, simulation, unbiasedness
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.