Computing evidence

[This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers.]

The book Random effects and latent variable model selection, edited by David Dunson in 2008 as a Springer Lecture Note, contains several chapters dealing with evidence approximation in mixed effect models. (Incidentally, I would be interested in the story behind the Lecture Note, as I found no explanation on the back cover or in the preface. Some chapters, but not all, refer to a SAMSI workshop on model uncertainty…) The final chapter (similar to a corresponding paper in JCGS) contains in particular the interesting identity that the Bayes factor opposing model h-1 to model h can be approximated by the average of the terms

\dfrac{f(x|\theta_{i,h},\mathfrak{M}=h-1)}{f(x|\theta_{i,h},\mathfrak{M}=h)}

when

  • \mathfrak{M} is the model index,
  • the \theta_{i,h}'s are simulated from the posterior under model h,
  • the model \mathfrak{M}=h-1 only considers the first h-1 components of \theta_{i,h},
  • the prior under model h-1 is the projection of the prior under model h. (Note that this marginalisation is not the projection used in Bayesian Core.)

I was first surprised when reading this result and wondered whether or not it suffered from either the Savage-Dickey difficulty or the harmonic mean instability, but this does not seem to be the case. The ratio is correctly defined thanks to the projection property, and it has no particular reason to suffer from infinite variance.
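Before turning to a regression experiment, here is a toy illustration of the identity that I find helpful (my own addition, not from the chapter): with a single normal mean, model h has y_i ~ N(μ,1) and prior μ ~ N(0,1), while model h-1 fixes μ=0 (the projection drops the only component). The exact Bayes factor is then available in closed form, and the posterior expectation of the likelihood ratio can be computed by quadrature rather than simulation, confirming the unbiasedness of the identity in this simple case:

```r
# Toy check of the identity (illustrative setting, not from the post):
# model h:   y_i ~ N(mu, 1), prior mu ~ N(0, 1)
# model h-1: mu = 0 (the projection drops the single component)
# The identity states E_post[h][ f(y|0)/f(y|mu) ] = B_{h-1,h} exactly.
set.seed(42)
n = 10
y = rnorm(n, mean = 0.5)

# posterior of mu under model h: N(n*ybar/(n+1), 1/(n+1))
pm = n*mean(y)/(n+1)
ps = 1/sqrt(n+1)

# posterior expectation of the likelihood ratio, by quadrature
# (computed on the log scale to avoid overflow in the tails)
integrand = function(m) sapply(m, function(mm)
  exp(sum(dnorm(y, 0, log=TRUE) - dnorm(y, mm, log=TRUE)) +
      dnorm(mm, pm, ps, log=TRUE)))
DD = integrate(integrand, -Inf, Inf)$value

# exact Bayes factor B_{01} = f(y|0)/m_1(y), where m_1 is the
# N_n(0, I_n + 11') marginal likelihood under model h
B01 = exp(0.5*log(n+1) - 0.5*n^2*mean(y)^2/(n+1))

DD
B01  # agrees with DD up to quadrature error
```

The agreement here only confirms the unbiasedness of the estimator; it says nothing about its Monte Carlo variability when the ratio is instead averaged over posterior draws.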

To check the practical performances of the method, I tried it on a simple linear model with known variance

y = a + bx + \varepsilon

comparing the model including the (normal+log-trend) simulated regressor with the model without the regressor. I used a g-prior

\beta=(a,b)\sim\mathfrak{N}_2\left(0,n[\mathbf{X}^\text{T}\mathbf{X}]^{-1}\right)

on the full model and its (marginal) projection

a\sim\mathfrak{N}_1\left(0,\sigma^2=\sum x_i^2\big/\sum [x_i-\bar x]^2\right)

which (rather counter-intuitively) involves the regressor values. This modelling should give a Bayes factor equal to

\dfrac{\varphi\left(\mathbf{y}|0,\mathbf{I}_n+\sigma^2\mathbf{1}\mathbf{1}^\text{T}\right)}{\varphi\left(\mathbf{y}|0,\mathbf{I}_n+n\mathbf{X}[\mathbf{X}^\text{T}\mathbf{X}]^{-1}\mathbf{X}^\text{T}\right)}

but the difference between the simulated Bayes factor and the actual value is quite large in my simulations, both under the full

> DD
[1] 1.769114e-10
> B01
[1] 0.004805112

and the restricted

> DD
[1] 0.6218985
> B01
[1] 2.516249

models. Here is the R code:


n=155
x=.2*rnorm(n)+log(1:n)
y=2+.8*x+rnorm(n)

# posterior parameters under the g-prior with g=n (error variance known, equal to 1)
pom=(n/(n+1))*lm(y~x)$coefficients
pov=(n/(n+1))*summary(lm(y~x))$cov.unscaled

# sample from the posterior of (a,b) under the full model
library(mvtnorm)
posim=rmvnorm(10^5,mean=pom,sigma=pov)

# Bayes factor approximation: average of restricted-to-full likelihood ratios
BF=dnorm(y[1],mean=posim[,1],log=TRUE)-
   dnorm(y[1]-posim[,1]-posim[,2]*x[1],log=TRUE)
for (i in 2:n)
  BF=BF+dnorm(y[i],mean=posim[,1],log=TRUE)-
        dnorm(y[i]-posim[,1]-posim[,2]*x[i],log=TRUE)
DD=mean(exp(BF))

# exact Bayes factor
prig=(n+1)*pov[1,1]
B01=exp(dmvnorm(y,sigma=diag(n)+prig*matrix(1,n,n),log=TRUE)-
        dmvnorm(y,sigma=diag(n)+n*cbind(rep(1,n),x)%*%
          summary(lm(y~x))$cov.unscaled%*%rbind(rep(1,n),x),log=TRUE))

I think one explanation for the discrepancy is that the a's derived from the full model posterior are quite different from the a's that would be generated from the restricted model posterior. The more I think about the above approximation, the less convinced I am that it can apply in full generality, because the projected posterior has nothing to do with the posterior associated with the projected prior. (Since, formally, the same unbiasedness result applies when comparing a full model with k variables with a submodel with none but the constant, it is clear that the approximation may be poor in non-orthogonal situations.)
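To see the mismatch concretely, one can compare the marginal posterior mean of a under the full model with its posterior mean in the restricted model fitted with the projected prior. The sketch below reuses the simulation setup above and assumes, as in the experiment, a known error variance equal to one, so the restricted posterior is available in closed form (this comparison is my own addition):

```r
# comparing the two "posteriors" of the intercept a (illustrative sketch):
# (i)  marginal of a under the full-model posterior,
# (ii) posterior of a in the restricted model y = a + eps,
#      under the projected prior a ~ N(0, sig2)
set.seed(1)
n = 155
x = .2*rnorm(n) + log(1:n)
y = 2 + .8*x + rnorm(n)

# (i) full model: marginal posterior mean of a under the g-prior
pom = (n/(n+1))*lm(y~x)$coefficients
full_a = pom[["(Intercept)"]]

# (ii) restricted model: conjugate update with known variance 1,
#      sig2 = sum(x^2)/sum((x-mean(x))^2) being the projected prior variance
sig2 = sum(x^2)/sum((x-mean(x))^2)
rest_a = sig2*sum(y)/(n*sig2 + 1)

full_a
rest_a  # far from full_a: the regressor's log trend is absorbed into a
```

Because x is not centred, the restricted posterior pulls a towards the overall mean of y, far from the full-model marginal, which is exactly the drawback discussed above.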


Filed under: Books, R, Statistics Tagged: Bayesian model choice, evidence, harmonic mean estimator, latent variable, Lecture Notes in Statistics, MCMC, mixed effect models, path sampling, prior projection, simulation, unbiasedness
