# Model Scale Parameterization for MCMC Efficiency

August 1, 2013
By

(This article was first published on Lindons Log » R, and kindly contributed to R-bloggers)

I recently came across a very interesting paper by Y. Yu and X. Meng[1] who present an interweaving strategy between different model parameterizations to improve mixing. It is well known that different model parameterizations can perform better than others under certain conditions. Papaspiliopoulos, Roberts and Sköld [2] present a general framework for how to parameterize hierarchical models and provide insights into the conditions under which centered and non-centered parameterizations outperform each other. One isn’t, however, restricted to reperameterizations of location parameters only, as outlined in the aforementioned paper, and so I decided to experiment with reparameterizations of the scale parameter in a simple hierarchical model with improper priors on the parameters.

## Centered Parameterization

Papaspiliopoulos gave a general definition of the centered parameterization to be when $Y_{i}$ is independent of $\lambda$ given $X_{i}$

$\displaystyle Y_{i}|X_{i},\sigma^{2} \sim N(X_{i},\sigma^{2}) \ \ \ \ \ (1)$

$\displaystyle X_{i}|\sigma^{2},\lambda^{2} \sim N(X_{i},\lambda^{2}\sigma^{2}) \ \ \ \ \ (2)$

$\displaystyle p( \lambda^{2} ) \propto \frac{1}{\lambda^{2}} \ \ \ \ \ (3)$

### Full Conditionals

$\displaystyle \lambda^{2}|Y_{1:n},X_{1:n},\sigma^{2} \sim \Gamma^{-1}\left( \frac{n}{2}, \frac{\sum_{i}^{n} X_{i}^{2}}{2\sigma^{2}}\right) \ \ \ \ \ (4)$

$\displaystyle X_{i}|Y_{i},\sigma^{2},\lambda^{2} \sim N\left( \frac{\frac{Y_{i}}{\sigma^{2}}}{\frac{1}{\sigma^{2}}+\frac{1}{\lambda^{2}\sigma^{2}}}, \frac{1}{\frac{1}{\sigma^{2}}+\frac{1}{\lambda^{2}\sigma^{2}}} \right) \ \ \ \ \ (5)$

## Non-Centered Parameterization

Papaspiliopoulos gave a general definition of the non-centered parameterization to be when $\tilde{X}_{i}$ and $\lambda$ are a priori independent.

$\displaystyle Y_{i}|\tilde{X}_{i},\sigma^{2},\lambda \sim N(\lambda \tilde{X}_{i},\sigma^{2}) \ \ \ \ \ (6)$

$\displaystyle \tilde{X}_{i}|\sigma^{2} \sim N(0,\sigma^{2}) \ \ \ \ \ (7)$

$\displaystyle p(\lambda) \propto 1 \ \ \ \ \ (8)$

### Full Conditionals

$\displaystyle \lambda|Y_{1:n},X_{1:n},\sigma^{2} \sim N \left( \frac{\sum_{i=1}^{n}\tilde{X}_{i}Y_{i}}{\sum_{i=1}^{n}\tilde{X}_{i}^{2}}, \frac{\sigma^{2}}{\sum_{i=1}^{n}\tilde{X}_{i}^{2}} \right) \ \ \ \ \ (9)$

$\displaystyle \tilde{X}_{i}|Y_{i},\sigma^{2},\lambda^{2} \sim N\left( \frac{\frac{\lambda Y_{i}}{\sigma^{2}}}{\frac{\lambda^{2}}{\sigma^{2}}+\frac{1}{\sigma^{2}}}, \frac{1}{\frac{\lambda^{2}}{\sigma^{2}}+\frac{1}{\sigma^{2}}} \right) \ \ \ \ \ (10)$

## Interweaving Strategy

Generally when the CP works well, the NCP works poorly and vice versa. Yaming Yu and Xiao-Li Meng[1] present a way of combining both strategies by interweaving the Gibbs steps of both parameterizations at each iteration. The details can be read in their paper. I decided to test all three Gibbs samplers with the following R code:

#Generate Data
lam2=0.5
lam=sqrt(lam2)
sig2=1
n=1000
Xt=rnorm(n,0,sqrt(lam2*sig2))
Y=rnorm(n,Xt,sqrt(sig2))
nmc=2000
X=Xt

#Centered Parameterization
cp_lam2=rep(0,nmc)
cp_X=matrix(0,nmc,n)
for(i in 1:nmc){
inv_lam2=rgamma(1,(n)/2,rate=(t(X)%*%X)/(2*sig2))
lam2=1/inv_lam2
X=rnorm(n,(1/(1/sig2 + 1/(sig2*lam2)))*Y/sig2, sqrt(1/(1/sig2 + 1/(sig2*lam2))))
cp_lam2[i]=lam2
cp_X[i,]=X
}
mean_cp_X=apply(cp_X,2,mean)

#Non-Centered Parameterization
X=Xt
ncp_lam2=rep(0,nmc)
ncp_X=matrix(0,nmc,n)
for(i in 1:nmc){
lam=rnorm(1,t(X)%*%Y/(t(X)%*%X), sqrt(sig2/(t(X)%*%X)))
lam2=lam*lam;
X=rnorm(n, (1/(1/sig2 + lam2/sig2))*lam*Y/sig2, sqrt(1/(1/sig2+lam2/sig2))  )
ncp_lam2[i]=lam2
ncp_X[i,]=X
}
mean_ncp_X=apply(ncp_X,2,mean)

#Interweaving Strategy
int_lam2=rep(0,nmc)
int_X=matrix(0,nmc,n)
for(i in 1:nmc){
X=rnorm(n,(1/(1/sig2 + 1/(sig2*lam2)))*Y/sig2, sqrt(1/(1/sig2 + 1/(sig2*lam2))))
inv_lam2=rgamma(1,(n)/2,rate=(t(X)%*%X)/(2*sig2))
half_lam2=1/inv_lam2
X=X/sqrt(half_lam2)    #Transform to Xtilde
lam=rnorm(1,t(X)%*%Y/(t(X)%*%X), sqrt(sig2/(t(X)%*%X)))
lam2=lam*lam;
int_lam2[i]=lam2
int_X[i,]=X
}
mean_cp_X=apply(cp_X,2,mean)

#Remove Burnin
cp_lam2=cp_lam2[-(1:1000)]
ncp_lam2=ncp_lam2[-(1:1000)]
int_lam2=int_lam2[-(1:1000)]

#Plot Results
par(mfrow=c(3,3))
acf(cp_lam2)
plot(cp_lam2,type="l")
plot(cp_lam2[1:nmc-1],cp_lam2[2:nmc])
acf(ncp_lam2)
plot(ncp_lam2,type="l")
plot(ncp_lam2[1:nmc-1],ncp_lam2[2:nmc])
acf(int_lam2)
plot(int_lam2,type="l")
plot(int_lam2[1:nmc-1],int_lam2[2:nmc])


## Results

### $\lambda=0.3$$\lambda=0.3$

Interweaving outperforms non-centered outperforms centered

### $\lambda=6$$\lambda=6$

Interweaving outperforms centered outperforms non-centered

## Discussion

As lambda gets small the centered parameterization becomes ever more autocorrelated and poorly mixing. When lambda becomes large the non-centered parameterization becomes ever more autocorrelated and poorly mixing. The interweaved Gibbs sampler exhibits great mixing in all cases.

[1] Y. Yu and X. Meng, “To Center or Not to Center: That Is Not the Question–An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency,” Journal of Computational and Graphical Statistics, vol. 20, iss. 3, pp. 531-570, 2011.
[Bibtex]
@article{Yu11,
author = {Yu, Yaming and Meng, Xiao-Li},
citeulike-article-id = {10408757},
doi = {10.1198/jcgs.2011.203main},
journal = {Journal of Computational and Graphical Statistics},
number = {3},
pages = {531--570},
posted-at = {2012-03-03 18:10:07},
priority = {2},
title = {{To Center or Not to Center: That Is Not the Question--An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency}},
url = {http://amstat.tandfonline.com/doi/abs/10.1198/jcgs.2011.203main},
volume = {20},
year = {2011}
}
[2] O. Papaspiliopoulos, G. O. Roberts, and M. Sköld, “A general framework for the parametrization of hierarchical models,” Statistical Science, vol. 22, iss. 1, pp. 59-73, 2007.
[Bibtex]
@article{Papaspiliopoulos07,
abstract = {{In this paper, we describe centering and noncentering methodology as complementary techniques for use in parametrization of broad classes of hierarchical models, with a view to the construction of effective MCMC algorithms for exploring posterior distributions from these models. We give a clear qualitative understanding as to when centering and noncentering work well, and introduce theory concerning the convergence time complexity of Gibbs samplers using centered and noncentered parametrizations. We give general recipes for the construction of noncentered parametrizations, including an auxiliary variable technique called the state-space expansion technique. We also describe partially noncentered methods, and demonstrate their use in constructing robust Gibbs sampler algorithms whose convergence properties are not overly sensitive to the data.}},
author = {Papaspiliopoulos, Omiros and Roberts, Gareth O. and Sk\"{o}ld, Martin},
citeulike-article-id = {8977350},
journal = {Statistical Science},
number = {1},
pages = {59--73},
posted-at = {2011-03-10 18:55:50},
priority = {2},
publisher = {Institute of Mathematical Statistics},
title = {{A general framework for the parametrization of hierarchical models}},
url = {http://www.jstor.org/stable/27645805},
volume = {22},
year = {2007}
}

The post Model Scale Parameterization for MCMC Efficiency appeared first on Lindons Log.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...