Computing evidence

Posted on November 28, 2010 by xi'an in R bloggers, Uncategorized | 0 Comments

[This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The book Random effects and latent variable model selection, edited by David Dunson in 2008 as a Springer Lecture Note. contains several chapters dealing with evidence approximation in mixed effect models. (Incidentally, I would be interested in the story behind the Lecture Note as I found no explanation in the backcover or in the preface. Some chapters but not all refer to a SAMSI workshop on model uncertainty…) The final chapter (similar to a corresponding paper in JCGS) contains in particular the interesting identity that the Bayes factor opposing model h to model h-1 can be approximated by (the average of the terms)

$dfrac{f(x|theta_{i,h},mathfrak{M}=h-1)}{f(x|theta_{i,h},mathfrak{M}=h)}$

when

$mathfrak{M}$ is the model index,
the $theta_{i,h}$ ‘s are simulated from the posterior under model h,
the model $mathfrak{M}=h-1$ only considers the h-1 first components of $theta_{i,h}$ ,
the prior under model h-1 is the projection of the prior under model h. (Note that this marginalisation is not the projection used in Bayesian Core.)

I was first surprised when reading this result and wondered whether or not it was suffering either of the Savage-Dickey difficulty or of the harmonic mean instability, but this does not seem to be the case. The ratio is correctly defined thanks to the projection property and the ratio has no particular reason to suffer from infinite variance.

To check the practical performances of the method, I tried it on a simple linear model with known variance

$y=a+bx+varepsilon$

comparing the model including the (normal+log-trend) simulated regressor with the model without the regressor. I used a g-prior

$beta=(a,b)simmathfrak{N}_2left(0,n[mathbf{X}^text{T}mathbf{X}]^{-1}right)$

on the full model and its(marginal) projection

$asimmathfrak{N}_1bigg(0,sigma^2=sum x_i^2/sum [x_i-bar x]^2bigg)$

which (rather counter-intuitively) involves the regressor values. This modelling should give a Bayes factor equal to

$dfrac{varphileft(mathbf{y}|0,mathbf{I}_n+sigma^2mathbf{1}mathbf{1}^text{T}right)}{varphileft(mathbf{y}|0,mathbf{I}_n+nmathbf{X}[mathbf{X}^text{T}mathbf{X}]^{-1}mathbf{X}^text{T}right)}$

but the difference between the simulated Bayes factor and the actual value is quite large in my simulations, both under the full

> DD
[1] 1.769114e-10
> B01
[1] 0.004805112

and the restricted

> DD
[1] 0.6218985
> B01
[1] 2.516249

models. Here is the R code:


n=155
 x=.2*rnorm(n)+log(1:n)
 y=2+.8*x+rnorm(n)

#posterior parameter under G-prior with G=n
 pom={n/{n+1}}*lm(y~x)$coefficients
 pov={n/{n+1}}*summary(lm(y~x))$cov.unscaled

#sample from posterior
 library(mvtnorm)
 posim=rmvnorm(10^5,mean=pom,sigma=pov)

#Bayes F. approximation
 BF=dnorm(y[1],mean=posim[,1],log=TRUE)-
    dnorm(y[1]-posim[,1]-posim[,2]*x[1],log=TRUE)
 for (i in 2:n)
 BF=BF+dnorm(y[i],mean=posim[,1],log=TRUE)-
       dnorm(y[i]-posim[,1]-posim[,2]*x[i],log=TRUE)
 DD=mean(exp(BF))

#exact Bayes factor
 prig={n+1}*pov[1,1]
 B01=exp(dmvnorm(y,sigma=diag(n)+prig*matrix(1,n,n),log=TRUE)-
 dmvnorm(y,sigma=diag(n)+n*cbind(rep(1,n),x)%*%
     summary(lm(y~x))$cov.unscaled%*%rbind(rep(1,n),x),log=TRUE))

I think one explanation for the discrepancy is that the a‘s derived from the full model posterior are quite different from a‘s that would be generated from the restricted model posterior. The more I think about the above approximation, the less convinced I am that it can apply in full generality because of this drawback that the projected posterior has nothing to do with the posterior associated with the projected prior. (Since, formally, the same unbiasedness result applies when comparing a full model with k variables with a submodel with none but the constant, it is clear that the approximation may be poor in non-orthogonal situations.)