an attempt at EP-ABC from scratch, nothing more… [except for a few bugs]

October 18, 2016

Following a request from one of the reviewers of our chapter Likelihood-free model choice, I tried to run EP-ABC on a toy problem and to compare it with the outcome of a random forest ABC. Literally starting from scratch, namely from the description found in Simon and Nicolas’ JASA paper.  To run my test, I chose as my modelled data an exponential Exp(λ) sample of size 20, with a Gaussian N(0,1) prior on the log parameter (here replicated 100 times):

n=20
trueda=matrix(rexp(100*n),ncol=n) #100 replications of an Exp(1) sample of size n
#natural parameters of the N(0,1) prior on log(lambda)
er0=0 #mu0/sigma0^2
Q0=1  #1/sigma0^2

Then I applied the formulas found in the paper for approximating the evidence, first defining the log normalising constant of an unnormalised Gaussian density, as given at the top of p.318:

Psi=function(er,Q){
  -.5*(log(Q/2/pi)-Q*er*er)}

[which exhibited a typo in the paper, as Simon Barthelmé figured out after email exchanges: the right formulation involves the inverse]

Psi=function(er,Q){
  -.5*(log(Q/2/pi)-er*er/Q)}
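
As a quick sanity check (mine, not from the paper), this corrected Psi can be compared with a direct numerical integration of the unnormalised Gaussian density exp(-Q x^2/2 + er x):

er=.7;Q=2.5
log(integrate(function(x) exp(-Q*x^2/2+er*x),-Inf,Inf)$value)
Psi(er,Q) #both return 0.5588 (up to numerical error)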

I then iterated the site updates as described in Sections 2.2 and 3.1 and in Algorithms 1 and 3:

ep=function(dat,M=1e4,eps,Nep=10){
 n=length(dat)
 #global and ith natural parameters for Gaussian approximations
 #initialisation:
 Qi=rep(0,n)
 Q=Q0+sum(Qi)
 eri=rep(0,n)
 er=er0+sum(eri)
 #constants Ci in (2.6)
 normz=rep(1,n)
 for (t in 1:Nep){
  for (i in sample(1:n)){
  #site i update
   Qm=Q-Qi[i] #m for minus i
   erm=er-eri[i]
   #ABC sample
   thetas=rnorm(M,mean=erm/Qm,sd=1/sqrt(Qm))
   dati=rexp(M,rate=exp(thetas))
#update by formula (3.3)
   Mh=sum((abs(dati-dat[i])<eps))
   mu=mean(thetas[abs(dati-dat[i])<eps])
   sig=var(thetas[abs(dati-dat[i])<eps])
   if (Mh>1e3){
#enough proposals accepted
   Qi[i]=1/sig-Qm
   eri[i]=mu/sig-erm
   Q=Qm+Qi[i]
   er=erm+eri[i]
#update of Ci as in formula (2.7)
#with correction bottom of p.319
  normz[i]=log(Mh/M/2/eps)- Psi(er,Q)+Psi(erm,Qm)
  }}}
#approximation of the log evidence as on p.318
return(sum(normz)+Psi(er,Q)-Psi(er0,Q0)) }

except that, in a first version, I made an R beginner's mistake (!) by writing the normal simulation as

thetas=rnorm(M,me=erm/Qm)/sqrt(Qm)
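
The issue is that dividing the entire rnorm output by sqrt(Qm) rescales the mean along with the standard deviation, that is, it simulates from N(erm/Qm^(3/2),1/Qm) instead of the intended N(erm/Qm,1/Qm). A two-line check (with arbitrary values for erm and Qm) exposes the bug:

erm=2;Qm=4
mean(rnorm(1e6,mean=erm/Qm)/sqrt(Qm)) #about .25, i.e. erm/Qm^1.5
mean(rnorm(1e6,mean=erm/Qm,sd=1/sqrt(Qm))) #about .50, i.e. erm/Qm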

The resulting EP-ABC evidence is then to be compared with the genuine evidence under a conjugate Exp(1) prior on λ [the values of the two evidences should differ, since the priors differ, but they should not differ that much]:

ev=function(dat){
 #exact log evidence under a conjugate Exp(1) prior on lambda:
 #m(x)=int lambda^n exp(-lambda*sum(x)) exp(-lambda) dlambda
 #    =Gamma(n+1)/(1+sum(x))^(n+1)
 n=length(dat)
 lgamma(n+1)-(n+1)*log(1+sum(dat))
 }
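
As an elementary Monte Carlo check (again mine alone) of this closed-form evidence, averaging the likelihood over draws from the Exp(1) prior should agree with ev:

x=rexp(n)
lams=rexp(1e6) #draws from the Exp(1) prior
log(mean(lams^n*exp(-lams*sum(x)))) #close to ev(x)
ev(x)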

After correcting those bugs and the too-small sample sizes, thanks to Simon's kind-hearted help!, running this code results in only minor discrepancies between both quantities:

evid=ovid=rep(0,100)
M=1e4 #number of ABC samples
propep=.1 #tolerance factor
for (t in 1:100){
 #tolerance scaled by data
 epsi=propep*sd(trueda[t,])
 evid[t]=ep(trueda[t,],M,epsi)
 ovid[t]=ev(trueda[t,])
 }
plot(evid,ovid,cex=.6,pch=19)
abline(a=0,b=1,lty=2) #dashed diagonal

as shown by the comparison below between the evidence and the EP-ABC approximations to the evidence, called epabence (the dashed line is the diagonal):

[Figure: exact evidence (ovid) versus EP-ABC approximation (evid), with the diagonal shown as a dashed line]

Obviously, this short (if lengthy at my scale) experiment is not intended to conclude that EP-ABC works or does not work. (Simon also provided an additional comparison with the genuine evidence, that is, under the right prior, and with the zero Monte Carlo version of the EP-ABC evidence, achieving a high concentration over the diagonal.) I simply conclude that the method does require a certain amount of calibration to become stable. These calibration steps are clearly spelled out in the paper (adaptive choice of M, stabilisation by quasi-random samples, stabilisation by stochastic optimisation steps, and importance sampling; a minimal sketch of the first device follows below), but they also imply that EP-ABC is "far from routine use because it may take days to tune on real problems" (pp.315-316). Thinking about the reasons for such a discrepancy over the past days, one possible explanation I see for the impact of the calibration parameters is that the validation of EP, if any, occurs at a numerical rather than at a Monte Carlo level, hence the Monte Carlo error must be really small to avoid interfering with the numerical aspects.
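
For instance, the adaptive choice of M can be sketched as follows (a minimal version of my own, not the implementation of the paper), doubling the number of simulations at a given site until the acceptance count reaches the threshold used in the code above:

adapt.site=function(erm,Qm,x,eps,M=1e4,Mmin=1e3,Mmax=1e6){
 #double M until at least Mmin pseudo-data fall within the tolerance
 repeat{
  thetas=rnorm(M,mean=erm/Qm,sd=1/sqrt(Qm))
  keep=abs(rexp(M,rate=exp(thetas))-x)<eps
  if ((sum(keep)>=Mmin)||(M>=Mmax)) break
  M=2*M}
 list(mu=mean(thetas[keep]),sig=var(thetas[keep]),Mh=sum(keep),M=M)}

This sketch does not address the other stabilisation devices (quasi-random samples, stochastic optimisation, importance sampling), which are beyond the scope of this quick experiment.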

