Model weights for model choice

Posted on February 9, 2011 by xi'an in R bloggers

[This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers].

An ‘Og reader, Emmanuel Charpentier, sent me the following email about model choice:

I read with great interest your critique of Peter Congdon’s 2006 paper (CSDA, 50(2):346-357), which proposes a method for estimating posterior model probabilities based on improper distributions for parameters not present in the model under examination, as well as the more general critique in your recent review of M. Aitkin’s recent book.

However, Peter Congdon’s 2007 proposal (Statistical Methodology, 4(2):143-157) of another method for model weighting seems to have flown under your radar. More generally, while the 2006 proposal seems to have been quoted somewhat and used in at least one biological application and two financial applications, this 2007 proposal seems to have been largely ignored (as far as a naïve Google Scholar user can tell). I found no allusion to this technique either on your blog or on Andrew Gelman’s blog.

This proposal, which uses a full probability model with proper priors and pseudo-priors, seems, however, to answer your critiques, and offers a number of technical advantages over other proposals:

it can be computed from separate MCMC samples, with no regard to the MCMC sampling technique used to obtain them, therefore allowing the use of the “canned expertise” existing in WinBUGS, OpenBUGS or JAGS (which entails the impossibility of controlling the exact sampling methods used to solve a given problem);

it avoids the need for very long runs to sufficiently explore unlikely models (which is the curse of the Carlin & Chib (1995) method);

it seems relatively easy to compute in most situations.

I’d be quite interested in any writings, thoughts or reactions to this proposal.

As I had indeed missed this paper, I went and took a look at it.

“Profiles can be obtained of differences in parameters or model fit measures between models.” (page 144)

Concerning the motivations of the approach given in the introduction, I am rather surprised at the goal of getting posterior distributions of differences like $\theta_1-\theta_2$ and posterior probabilities like $\mathbb{P}(\beta_1>\beta_2|Y)$ when the indices correspond to two different models. Indeed, I have difficulty letting parameters of two models under comparison (hence in competition) live together in the same event, like $\beta_1>\beta_2$. In my opinion, parameters from different models cannot coexist, which is why I do not like the notion of a “common parameter” found in some Bayesian model choice analyses. (This was also one of my main criticisms of Murray Aitkin’s book.)

Now, looking at the method itself, it boils down to an importance sampling solution, namely that $\mathbb{P}(\mathcal{M}=m|\mathbf{Y})$ is estimated by (page 147)

$\omega_m \propto \sum_{t=1}^T \dfrac{\mathbb{P}(\mathcal{M}=m) \mathbb{P}(\theta_m^{(t)}|\mathcal{M}=m)\mathbb{P}(\mathbf{Y}|\mathcal{M}=m,\theta_m^{(t)})}{g_m(\theta_m^{(t)})}$

where $g_m$ is the importance function approximating the true posterior for model $m$ and where $\theta_m^{(t)}$ is one simulation from this importance function. (I find the paper confusing at times, as those simulations could also be understood as generated from the true posterior.) Obviously, the quality of the approximation $g_m$ of the posterior density will impact the quality of the approximation of the posterior probability. My only technical objection to the paper is about the approximation of the “posterior” distributions of differences like $\theta_1-\theta_2$ or of “posterior” probabilities $P(\beta_1>\beta_2|Y)$ as given on pages 149-150, because the “joint posterior distribution” of $(\theta_1,\theta_2)$ should be

$\sum_{m\ne 1,2} \omega_m g_1(\theta_1)g_2(\theta_2) + \omega_1 \pi_1(\theta_1|\mathbf{Y})g_2(\theta_2) + \omega_2 g_1(\theta_1)\pi_2(\theta_2|\mathbf{Y})$

rather than $\pi_1(\theta_1|\mathbf{Y})\pi_2(\theta_2|\mathbf{Y})$ which seems to be implied in the paper (page 150). (But this may be an over-interpretation of mine…)
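To make the two formulas above concrete, here is a minimal sketch in Python (standard library only) on a toy problem of my own choosing, not on any example from Congdon’s paper. Everything specific is an assumption made for illustration: two normal models with known standard deviations 1 and 2, the same proper normal prior on each mean, equal prior model probabilities, and importance functions $g_m$ taken as the exact conjugate posteriors with an inflated standard deviation. Because the toy models are conjugate, the exact posterior model probabilities are available for comparison. The last part simulates $(\theta_1,\theta_2)$ from the mixture displayed above (with two models only, the sum over $m\ne 1,2$ is empty) rather than from the product of the two posteriors.

```python
import math
import random

def log_norm_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def log_lik(y, theta, sd):
    """Log likelihood of i.i.d. N(theta, sd^2) observations."""
    return sum(log_norm_pdf(v, theta, sd) for v in y)

def posterior(y, sd, prior_mean, prior_sd):
    """(mean, sd) of the conjugate normal posterior of theta."""
    prec = 1 / prior_sd**2 + len(y) / sd**2
    mean = (prior_mean / prior_sd**2 + sum(y) / sd**2) / prec
    return mean, 1 / math.sqrt(prec)

def normalise(log_w):
    """Turn log unnormalised weights into probabilities summing to one."""
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

# All settings below are illustrative assumptions, not taken from the paper.
rng = random.Random(0)
y = [rng.gauss(1.0, 1.0) for _ in range(20)]  # simulated data
model_sds = [1.0, 2.0]                         # M1 and M2 differ by their known sd
prior_mean, prior_sd = 0.0, 5.0                # same proper prior on each theta_m
prior_prob = [0.5, 0.5]                        # P(M = m)
T, inflate = 5000, 1.5                         # g_m = posterior with inflated sd

log_w, log_exact, posts, gs = [], [], [], []
for sd, pm in zip(model_sds, prior_prob):
    mu, s = posterior(y, sd, prior_mean, prior_sd)
    g = (mu, inflate * s)
    terms = []
    for _ in range(T):
        theta = rng.gauss(*g)                  # theta_m^(t) simulated from g_m
        terms.append(math.log(pm) + log_norm_pdf(theta, prior_mean, prior_sd)
                     + log_lik(y, theta, sd) - log_norm_pdf(theta, *g))
    top = max(terms)                           # log-sum-exp for numerical stability
    log_w.append(top + math.log(sum(math.exp(v - top) for v in terms) / T))
    # Exact P(M=m) m(Y): prior x likelihood / posterior density, evaluated at mu.
    log_exact.append(math.log(pm) + log_norm_pdf(mu, prior_mean, prior_sd)
                     + log_lik(y, mu, sd) - log_norm_pdf(mu, mu, s))
    posts.append((mu, s))
    gs.append(g)

weights = normalise(log_w)    # importance sampling estimates of omega_m
exact = normalise(log_exact)  # closed-form posterior model probabilities

def draw_joint():
    """One draw of (theta_1, theta_2) from the two-model mixture."""
    m = 0 if rng.random() < weights[0] else 1
    # The selected model's parameter comes from its posterior, the other from its g.
    return [rng.gauss(*(posts[k] if k == m else gs[k])) for k in range(2)]

draws = [draw_joint() for _ in range(T)]
prob = sum(t1 > t2 for t1, t2 in draws) / T    # estimate of P(theta_1 > theta_2 | Y)
print(weights, exact, prob)
```

In this sketch the $g_m$ are deliberately close to the true posteriors, so the estimated weights should track the exact ones closely; a poorer $g_m$ (for instance one with lighter tails than the posterior) would degrade the estimate, which is the point made above about the quality of the approximation.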