Approximate Bayesian model choice

March 16, 2014

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers)

The above is the running head of the arXived paper with full title “Implications of uniformly distributed, empirically informed priors for phylogeographical model selection: A reply to Hickerson et al.” by Oaks, Linkem and Sukumaran, which I (again) read on the plane to Montréal (third one in this series, and the last, because I also watched the Japanese psycho-thriller Midsummer’s Equation, featuring a physicist turned detective in one of many TV episodes. I just found some common features with The Devotion of Suspect X, only to discover now that the book has been turned into another episode in the series.)

“Here we demonstrate that the approach of Hickerson et al. (2014) is dangerous in the sense that the empirically-derived priors often exclude from consideration the true values of the models’ parameters. On a more fundamental level, we question the value of adopting an empirical Bayesian stance for this model-choice problem, because it can mislead model posterior probabilities, which are inherently measures of belief in the models after prior knowledge is updated by the data.”

This paper is actually a reply to Hickerson et al. (2014, Evolution), which is itself a reply to an earlier paper by Oaks et al. (2013, Evolution). [Warning: I did not check those earlier references!] The authors object to the use of “narrow, empirically informed uniform priors”, in connection with the msBayes of Huang et al. (2011, BMC Bioinformatics), for the reason reproduced in the above quote. The discussion is less about ABC used for model choice and posterior probabilities of models and more about the impact of vague priors: Oaks et al. (2013) argue that these lead to a bias towards models with fewer parameters, a “statistical issue” in their words, while Hickerson et al. (2014) think the bias is due to msBayes’ way of selecting models and their parameters at random.

“…it is difficult to choose a uniformly distributed prior on divergence times that is broad enough to confidently contain the true values of parameters while being narrow enough to avoid spurious support of models with less parameter space.”

So quite an interesting debate that takes us, in fine, far away from the usual worries about ABC model choice! We are more at the level of empirical versus natural Bayes, as seen in the literature of the 1980s. (The meaning of empirical Bayes is not that clear in the early pages, as the authors seem to include any method using the data “twice”.) I actually do not remember reading papers about the formal properties of model choice done through classical empirical Bayes techniques, except for the special case of Aitkin’s (1991, 2009) integrated likelihood, which is essentially the analysis performed on the coin toy example (p.7).
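To make the “data used twice” concern concrete, here is a minimal Python sketch in the spirit of a coin example. It is not the example from the paper: the counts, the two models (M0 with a fair coin, M1 with a uniform prior on the head probability) and the half-width 0.1 of the “empirically informed” prior are all assumptions chosen for illustration.

```python
import math

def binomial_likelihood(p, heads, tosses):
    """Probability of `heads` successes in `tosses` trials when P(head) = p."""
    return math.comb(tosses, heads) * p**heads * (1.0 - p) ** (tosses - heads)

def marginal_uniform(heads, tosses, lo, hi, steps=10_000):
    """Integrated likelihood under p ~ Uniform(lo, hi), by the midpoint rule."""
    h = (hi - lo) / steps
    total = sum(
        binomial_likelihood(lo + (i + 0.5) * h, heads, tosses) for i in range(steps)
    )
    # Integral of the likelihood divided by the prior width = prior average.
    return total * h / (hi - lo)

def posterior_prob_m1(m0, m1):
    """Posterior probability of M1 under equal prior model weights."""
    return m1 / (m0 + m1)

heads, tosses = 13, 20  # made-up data, not taken from the paper

m0 = binomial_likelihood(0.5, heads, tosses)          # M0: fair coin
m1_vague = marginal_uniform(heads, tosses, 0.0, 1.0)  # M1 with a vague prior

# "Empirical" prior for M1: a narrow uniform centred at the observed frequency,
# so the same data inform both the prior and the likelihood.
p_hat = heads / tosses
m1_empirical = marginal_uniform(heads, tosses, p_hat - 0.1, p_hat + 0.1)

post_vague = posterior_prob_m1(m0, m1_vague)
post_empirical = posterior_prob_m1(m0, m1_empirical)
print(f"P(M1 | data), vague prior:     {post_vague:.3f}")
print(f"P(M1 | data), empirical prior: {post_empirical:.3f}")
```

With these made-up counts, the vague prior leaves M1 with the lower posterior probability, while the prior centred at the observed frequency pushes it above one half, with no change in the data.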

“…models with more divergence parameters will be forced to integrate over much greater parameter space, all with equal prior density, and much of it with low likelihood.”

The above argument is an interesting rephrasing of Lindley’s paradox, which I cannot dispute, but of course it does not solve the fundamental issue of how to choose the prior away from vague uniform priors… I also like the quote “the estimated posterior probability of a model is a single value (rather than a distribution) lacking a measure of posterior uncertainty” as this is an issue on which we are currently working. I fully agree with the statement and we think an alternative assessment to posterior probabilities could be more appropriate for model selection in ABC settings (paper soon to come, hopefully!).
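The Lindley-type effect mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own, not the setting of the paper: a normal sample mean `xbar` with known standard error `se`, a point null M0 (mean zero), and an alternative M1 with a uniform prior of a given `width` centred at zero.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, sd):
    """Distribution function of N(mean, sd^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def bayes_factor_01(xbar, se, width):
    """Bayes factor of M0 (theta = 0) against M1 (theta ~ U(-width/2, width/2))."""
    m0 = normal_pdf(xbar, 0.0, se)
    # Marginal under M1: likelihood integrated over the prior support,
    # divided by the prior width.
    mass = normal_cdf(width / 2.0, xbar, se) - normal_cdf(-width / 2.0, xbar, se)
    m1 = mass / width
    return m0 / m1

xbar, se = 0.25, 0.1  # hypothetical data: 2.5 standard errors away from zero
widths = [1.0, 10.0, 100.0, 1000.0]
bayes_factors = [bayes_factor_01(xbar, se, w) for w in widths]

for w, bf in zip(widths, bayes_factors):
    print(f"prior width {w:7.1f}: B01 = {bf:.3f}")
```

Once the prior support covers essentially all of the likelihood, the marginal under M1 behaves like 1/width, so the Bayes factor in favour of the simpler model grows linearly with the width of the uniform prior: here it moves from supporting M1 to supporting M0 with the data unchanged.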

