Model Scale Parameterization for MCMC Efficiency

Posted on August 1, 2013 by Lindons Log » R in R bloggers | 0 Comments

[This article was first published on Lindons Log » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I recently came across a very interesting paper by Y. Yu and X. Meng[1] who present an interweaving strategy between different model parameterizations to improve mixing. It is well known that different model parameterizations can perform better than others under certain conditions. Papaspiliopoulos, Roberts and Sköld [2] present a general framework for how to parameterize hierarchical models and provide insights into the conditions under which centered and non-centered parameterizations outperform each other. One isn’t, however, restricted to reperameterizations of location parameters only, as outlined in the aforementioned paper, and so I decided to experiment with reparameterizations of the scale parameter in a simple hierarchical model with improper priors on the parameters.

Centered Parameterization

Papaspiliopoulos gave a general definition of the centered parameterization to be when $Y_{i}$ is independent of $\lambda$ given $X_{i}$

$\displaystyle Y_{i}|X_{i},\sigma^{2} \sim N(X_{i},\sigma^{2}) \ \ \ \ \ (1)$

$\displaystyle X_{i}|\sigma^{2},\lambda^{2} \sim N(X_{i},\lambda^{2}\sigma^{2}) \ \ \ \ \ (2)$

$\displaystyle p( \lambda^{2} ) \propto \frac{1}{\lambda^{2}} \ \ \ \ \ (3)$

Full Conditionals

$\displaystyle \lambda^{2}|Y_{1:n},X_{1:n},\sigma^{2} \sim \Gamma^{-1}\left( \frac{n}{2}, \frac{\sum_{i}^{n} X_{i}^{2}}{2\sigma^{2}}\right) \ \ \ \ \ (4)$

$\displaystyle X_{i}|Y_{i},\sigma^{2},\lambda^{2} \sim N\left( \frac{\frac{Y_{i}}{\sigma^{2}}}{\frac{1}{\sigma^{2}}+\frac{1}{\lambda^{2}\sigma^{2}}}, \frac{1}{\frac{1}{\sigma^{2}}+\frac{1}{\lambda^{2}\sigma^{2}}} \right) \ \ \ \ \ (5)$

Non-Centered Parameterization

Papaspiliopoulos gave a general definition of the non-centered parameterization to be when $\tilde{X}_{i}$ and $\lambda$ are a priori independent.

$\displaystyle Y_{i}|\tilde{X}_{i},\sigma^{2},\lambda \sim N(\lambda \tilde{X}_{i},\sigma^{2}) \ \ \ \ \ (6)$

$\displaystyle \tilde{X}_{i}|\sigma^{2} \sim N(0,\sigma^{2}) \ \ \ \ \ (7)$

$\displaystyle p(\lambda) \propto 1 \ \ \ \ \ (8)$

Full Conditionals

$\displaystyle \lambda|Y_{1:n},X_{1:n},\sigma^{2} \sim N \left( \frac{\sum_{i=1}^{n}\tilde{X}_{i}Y_{i}}{\sum_{i=1}^{n}\tilde{X}_{i}^{2}}, \frac{\sigma^{2}}{\sum_{i=1}^{n}\tilde{X}_{i}^{2}} \right) \ \ \ \ \ (9)$

$\displaystyle \tilde{X}_{i}|Y_{i},\sigma^{2},\lambda^{2} \sim N\left( \frac{\frac{\lambda Y_{i}}{\sigma^{2}}}{\frac{\lambda^{2}}{\sigma^{2}}+\frac{1}{\sigma^{2}}}, \frac{1}{\frac{\lambda^{2}}{\sigma^{2}}+\frac{1}{\sigma^{2}}} \right) \ \ \ \ \ (10)$

Interweaving Strategy

Generally when the CP works well, the NCP works poorly and vice versa. Yaming Yu and Xiao-Li Meng[1] present a way of combining both strategies by interweaving the Gibbs steps of both parameterizations at each iteration. The details can be read in their paper. I decided to test all three Gibbs samplers with the following R code:

#Generate Data
lam2=0.5
lam=sqrt(lam2)
sig2=1
n=1000
Xt=rnorm(n,0,sqrt(lam2*sig2))
Y=rnorm(n,Xt,sqrt(sig2))
nmc=2000
X=Xt

#Centered Parameterization
cp_lam2=rep(0,nmc)
cp_X=matrix(0,nmc,n)
for(i in 1:nmc){
	inv_lam2=rgamma(1,(n)/2,rate=(t(X)%*%X)/(2*sig2))
	lam2=1/inv_lam2
	X=rnorm(n,(1/(1/sig2 + 1/(sig2*lam2)))*Y/sig2, sqrt(1/(1/sig2 + 1/(sig2*lam2))))
	cp_lam2[i]=lam2
	cp_X[i,]=X
}
mean_cp_X=apply(cp_X,2,mean)

#Non-Centered Parameterization
X=Xt
ncp_lam2=rep(0,nmc)
ncp_X=matrix(0,nmc,n)
for(i in 1:nmc){
	lam=rnorm(1,t(X)%*%Y/(t(X)%*%X), sqrt(sig2/(t(X)%*%X)))
	lam2=lam*lam;
	X=rnorm(n, (1/(1/sig2 + lam2/sig2))*lam*Y/sig2, sqrt(1/(1/sig2+lam2/sig2))  )
	ncp_lam2[i]=lam2
	ncp_X[i,]=X
}
mean_ncp_X=apply(ncp_X,2,mean)

#Interweaving Strategy
int_lam2=rep(0,nmc)
int_X=matrix(0,nmc,n)
for(i in 1:nmc){
	X=rnorm(n,(1/(1/sig2 + 1/(sig2*lam2)))*Y/sig2, sqrt(1/(1/sig2 + 1/(sig2*lam2))))
	inv_lam2=rgamma(1,(n)/2,rate=(t(X)%*%X)/(2*sig2))
	half_lam2=1/inv_lam2
	X=X/sqrt(half_lam2)    #Transform to Xtilde
	lam=rnorm(1,t(X)%*%Y/(t(X)%*%X), sqrt(sig2/(t(X)%*%X)))
	lam2=lam*lam;
	int_lam2[i]=lam2
	int_X[i,]=X
}
mean_cp_X=apply(cp_X,2,mean)

#Remove Burnin
cp_lam2=cp_lam2[-(1:1000)]
ncp_lam2=ncp_lam2[-(1:1000)]
int_lam2=int_lam2[-(1:1000)]

#Plot Results
par(mfrow=c(3,3))
acf(cp_lam2)
plot(cp_lam2,type="l")
plot(cp_lam2[1:nmc-1],cp_lam2[2:nmc])
acf(ncp_lam2)
plot(ncp_lam2,type="l")
plot(ncp_lam2[1:nmc-1],ncp_lam2[2:nmc])
acf(int_lam2)
plot(int_lam2,type="l")
plot(int_lam2[1:nmc-1],int_lam2[2:nmc])

Results

$\lambda=0.3$

Interweaving outperforms non-centered outperforms centered

$\lambda=6$

Interweaving outperforms centered outperforms non-centered

Discussion

As lambda gets small the centered parameterization becomes ever more autocorrelated and poorly mixing. When lambda becomes large the non-centered parameterization becomes ever more autocorrelated and poorly mixing. The interweaved Gibbs sampler exhibits great mixing in all cases.

[1]

Y. Yu and X. Meng, “To Center or Not to Center: That Is Not the Question–An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency,” Journal of Computational and Graphical Statistics, vol. 20, iss. 3, pp. 531-570, 2011.
[Bibtex]

@article{Yu11,
    author = {Yu, Yaming and Meng, Xiao-Li},
    citeulike-article-id = {10408757},
    citeulike-linkout-0 = {http://amstat.tandfonline.com/doi/abs/10.1198/jcgs.2011.203main},
    citeulike-linkout-1 = {http://pubs.amstat.org/doi/abs/10.1198/jcgs.2011.203main},
    citeulike-linkout-2 = {http://dx.doi.org/10.1198/jcgs.2011.203main},
    doi = {10.1198/jcgs.2011.203main},
    journal = {Journal of Computational and Graphical Statistics},
    number = {3},
    pages = {531--570},
    posted-at = {2012-03-03 18:10:07},
    priority = {2},
    title = {{To Center or Not to Center: That Is Not the Question--An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency}},
    url = {http://amstat.tandfonline.com/doi/abs/10.1198/jcgs.2011.203main},
    volume = {20},
    year = {2011}
}

[2] O. Papaspiliopoulos, G. O. Roberts, and M. Sköld, “A general framework for the parametrization of hierarchical models,” Statistical Science, vol. 22, iss. 1, pp. 59-73, 2007.
[Bibtex]

@article{Papaspiliopoulos07,
    abstract = {{In this paper, we describe centering and noncentering methodology as complementary techniques for use in parametrization of broad classes of hierarchical models, with a view to the construction of effective MCMC algorithms for exploring posterior distributions from these models. We give a clear qualitative understanding as to when centering and noncentering work well, and introduce theory concerning the convergence time complexity of Gibbs samplers using centered and noncentered parametrizations. We give general recipes for the construction of noncentered parametrizations, including an auxiliary variable technique called the state-space expansion technique. We also describe partially noncentered methods, and demonstrate their use in constructing robust Gibbs sampler algorithms whose convergence properties are not overly sensitive to the data.}},
    author = {Papaspiliopoulos, Omiros and Roberts, Gareth O. and Sk\"{o}ld, Martin},
    citeulike-article-id = {8977350},
    citeulike-linkout-0 = {http://www.jstor.org/stable/27645805},
    journal = {Statistical Science},
    number = {1},
    pages = {59--73},
    posted-at = {2011-03-10 18:55:50},
    priority = {2},
    publisher = {Institute of Mathematical Statistics},
    title = {{A general framework for the parametrization of hierarchical models}},
    url = {http://www.jstor.org/stable/27645805},
    volume = {22},
    year = {2007}
}

The post Model Scale Parameterization for MCMC Efficiency appeared first on Lindons Log.

To leave a comment for the author, please follow the link and comment on their blog: Lindons Log » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Model Scale Parameterization for MCMC Efficiency

Centered Parameterization

Full Conditionals

Non-Centered Parameterization

Full Conditionals

Interweaving Strategy

Results

$\lambda=0.3$

$\lambda=6$

Discussion

Related

Centered Parameterization

Full Conditionals

Non-Centered Parameterization

Full Conditionals

Interweaving Strategy

Results

<img src="http://s0.wp.com/latex.php?latex=%5Clambda%3D0.3&#038;bg=ffffff&#038;%23038;fg=000&#038;%23038;s=0" alt="\lambda=0.3" title="\lambda=0.3" class="latex" />

<img src="http://s0.wp.com/latex.php?latex=%5Clambda%3D6&#038;bg=ffffff&#038;%23038;fg=000&#038;%23038;s=0" alt="\lambda=6" title="\lambda=6" class="latex" />

Discussion

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

$\lambda=0.3$

$\lambda=6$

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)