EM Algorithm for Bayesian Lasso R Cpp Code

September 5, 2014
By

[This article was first published on Lindons Log » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Bayesian Lasso

$$begin{align*}
p(Y_{o}|beta,phi)&=N(Y_{o}|1alpha+X_{o}beta,phi^{-1} I_{n{o}})\
pi(beta_{i}|phi,tau_{i}^{2})&=N(beta_{i}|0, phi^{-1}tau_{i}^{2})\
pi(tau_{i}^{2})&=Exp left( frac{lambda}{2} right)\
pi(phi)&propto phi^{-1}\
pi(alpha)&propto 1\
end{align*}$$

Marginalizing over (alpha) equates to centering the observations and losing a degree of freedom and working with the centered ( Y_{o} ).
Mixing over (tau_{i}^{2}) leads to a Laplace or Double Exponential prior on (beta_{i}) with rate parameter (sqrt{philambda}) as seen by considering the characteristic function
$$begin{align*}
varphi_{beta_{i}|phi}(t)&=int e^{jtbeta_{i}}pi(beta_{i}|phi)dbeta_{i}\
&=int int e^{jtbeta_{i}}pi(beta_{i}|phi,tau_{i}^{2})pi(tau_{i}^{2})dtau_{i} dbeta_{i}\
&=frac{lambda}{2} int e^{-frac{1}{2}frac{t^{2}}{phi}tau_{i}^{2}}e^{-frac{lambda}{2}tau_{i}^{2}}dtau_{i}\
&=frac{lambda}{frac{t^{2}}{phi}+lambda}=frac{lambdaphi}{t^{2}+lambdaphi}
end{align*}$$.

EM Algorithm

The objective is to find the mode of the joint posterior (pi(beta,phi|Y_{o})). It is easier, however, to find the joint mode of (pi(beta,phi|Y_{o},tau^{2})) and use EM to exploit the scale mixture representation.

$$begin{align*}
log pi(beta,phi|Y_{o},tau^{2})=c+ frac{n_o+p-3}{2}log phi -frac{phi}{2}||Y_{o}-X_{o}beta||^{2}-sum_{i=1}^{p}frac{phi}{2}frac{1}{tau_{i}^{2}}beta^{2}_{i}
end{align*}$$

Expectation

The expecation w.r.t. (tau_{i}^{2}) is handled as by
$$
begin{align*}
&frac{lambda}{2}int frac{1}{tau_{i}^{2}}left( frac{phi}{2pitau_{i}^{2}} right)^{frac{1}{2}}e^{-frac{1}{2}phibeta_{i}^{2}frac{1}{tau_{i}^{2}}}e^{-frac{lambda}{2}tau_{i}^{2}}dtau_{i}^{2}\
&frac{lambda}{2}int left( frac{phi}{2pi[tau_{i}^{2}]^{3}} right)^{frac{1}{2}}e^{-frac{1}{2}phibeta_{i}^{2}frac{1}{tau_{i}^{2}}}e^{-frac{lambda}{2}tau_{i}^{2}}dtau_{i}^{2}\
end{align*}$$

This has the kernel of an Inverse Gaussian distribution with shape parameter (phi beta_{i}^{2}) and mean (sqrt{frac{phi}{lambda}}|beta_{i}|)

$$begin{align*}
&frac{{lambda}}{2|beta_{i}|}int left( frac{beta_{i}^{2}phi}{2pi[tau_{i}^{2}]^{3}} right)^{frac{1}{2}}e^{-frac{1}{2}phibeta_{i}^{2}frac{1}{tau_{i}^{2}}}e^{-frac{lambda}{2}tau_{i}^{2}}dtau_{i}^{2}\
&frac{lambda}{2|beta_{i}|}e^{-sqrt{lambdaphibeta_{i}^{2}}}int left( frac{beta_{i}^{2}phi}{2pi[tau_{i}^{2}]^{3}} right)^{frac{1}{2}}e^{-frac{1}{2}phibeta_{i}^{2}frac{1}{tau_{i}^{2}}}e^{-frac{lambda}{2}tau_{i}^{2}}e^{sqrt{lambdaphibeta_{i}^{2}}}dtau_{i}^{2}\
&frac{lambda}{2|beta_{i}|}e^{-sqrt{lambdaphibeta_{i}^{2}}}\
end{align*}$$

Normalization as follows

$$begin{align*}
&frac{lambda}{2}int left( frac{phi}{2pitau_{i}^{2}} right)^{frac{1}{2}}e^{-frac{1}{2}phibeta_{i}^{2}frac{1}{tau_{i}^{2}}}e^{-frac{lambda}{2}tau_{i}^{2}}dtau_{i}^{2}\
&frac{lambda}{2}int tau_{i}^{2}left( frac{phi}{2pi[tau_{i}^{2}]^{3}} right)^{frac{1}{2}}e^{-frac{1}{2}phibeta_{i}^{2}frac{1}{tau_{i}^{2}}}e^{-frac{lambda}{2}tau_{i}^{2}}dtau_{i}^{2}\
end{align*}$$
$$begin{align*}
&frac{lambda}{2|beta_{i}|}e^{-sqrt{lambdaphibeta_{i}^{2}}}sqrt{frac{phi}{lambda}}|beta_{i}|\
end{align*}$$

( Rightarrow mathbb{E}left[ frac{1}{tau_{i}^{2}} right]=sqrt{frac{lambda}{phi^{t}}}frac{1}{|beta_{i}^{t}|}).
Let (Lambda^{t}=diag(sqrt{frac{lambda}{phi^{t}}}frac{1}{|beta_{1}^{t}|}, dots, sqrt{frac{lambda}{phi^{t}}}frac{1}{|beta_{p}^{t}|})).

Maximization

$$begin{align*}
&Q(beta,phi||beta^{t},phi^{t})=c+ frac{n_o+p-3}{2}log phi -frac{phi}{2}||Y_{o}-X_{o}beta||^{2} – frac{phi}{2}beta^{T}Lambda^{t}beta\
&=c+ frac{n_o+p-3}{2}log phi -frac{phi}{2}||beta-(X_{o}^{T}X_{o}+Lambda^{t})^{-1}X_{o}^{T}Y_{o}||^{2}_{(X_{o}^{T}X_{o}+Lambda^{t})}-frac{phi}{2}Y_{o}^{T}(I_{n_{o}}-X_{o}^{T}(X_{o}^{T}X_{o}+Lambda^{t})^{-1}X_{o})Y_{o}\
end{align*}$$

$$begin{align*}
beta^{t+1}&=(X_{o}^{T}X_{o}+Lambda^{t})^{-1}X_{o}^{T}Y_{o}\
end{align*}$$

$$begin{align*}
phi^{t+1}=frac{n_{o}+p-3}{Y_{o}^{T}(I_{n_{o}}-X_{o}^{T}(X_{o}^{T}X_{o}+Lambda^{t})^{-1}X_{o})Y_{o}}
end{align*}$$

RCpp C++ Code

#include 
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;
using namespace arma;

double or_log_posterior_density(int no, int p, double lasso, const Col& yo, const Mat& xo, const Col& B,double phi);

// [[Rcpp::export]]
List or_lasso_em(NumericVector ryo, NumericMatrix rxo, SEXP rlasso){

	//Define Variables//
	int p=rxo.ncol();
	int no=rxo.nrow();
	double lasso=Rcpp::as(rlasso);

	//Create Data//
	arma::mat xo(rxo.begin(), no, p, false);
	arma::colvec yo(ryo.begin(), ryo.size(), false);
	yo-=mean(yo);

	//Pre-Processing//
	Col xoyo=xo.t()*yo;
	Col B=xoyo/no;
	Col Babs=abs(B);
	Mat xoxo=xo.t()*xo;
	Mat D=eye(p,p);
	Mat Ip=eye(p,p);
	double yoyo=dot(yo,yo);
	double deltaB;
	double deltaphi;
	double phi=no/dot(yo-xo*B,yo-xo*B);
	double lp;

	//Create Trace Matrices
	Mat B_trace(p,20000);
	Col phi_trace(20000);
	Col lpd_trace(20000);

	//Run EM Algorithm//
	cout << "Beginning EM Algorithm" << endl;
	int t=0;
	B_trace.col(t)=B;
	phi_trace(t)=phi;
	lpd_trace(t)=or_log_posterior_density(no,p,lasso,yo,xo,B,phi);
	do{
		t=t+1;

		lp=sqrt(lasso/phi);

		Babs=abs(B);
		D=diagmat(sqrt(Babs));
		B=D*solve(D*xoxo*D+lp*Ip,D*xoyo);

		phi=(no+p-3)/(yoyo-dot(xoyo,B));

		//Store Values//
		B_trace.col(t)=B;
		phi_trace(t)=phi;
		lpd_trace(t)=or_log_posterior_density(no,p,lasso,yo,xo,B,phi);

		deltaB=dot(B_trace.col(t)-B_trace.col(t-1),B_trace.col(t)-B_trace.col(t-1));
		deltaphi=phi_trace(t)-phi_trace(t-1);
	} while((deltaB>0.00001 || deltaphi>0.00001) && t<19999);
	cout << "EM Algorithm Converged in " << t << " Iterations" << endl;

	//Resize Trace Matrices//
	B_trace.resize(p,t);
	phi_trace.resize(t);
	lpd_trace.resize(t);

	return Rcpp::List::create(
			Rcpp::Named("B") = B,
			Rcpp::Named("B_trace") = B_trace,
			Rcpp::Named("phi") = phi,
			Rcpp::Named("phi_trace") = phi_trace,
			Rcpp::Named("lpd_trace") = lpd_trace
			) ;

}

An Example in R

rm(list=ls())

#Generate Design Matrix
set.seed(3)
no=100
foo=rnorm(no,0,1)
sd=4
xo=cbind(foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd))
for(i in 1:40) xo=cbind(xo,foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd))

#Scale and Center Design Matrix
xo=scale(xo,center=T,scale=F)
var=apply(xo^2,2,sum)
xo=scale(xo,center=F,scale=sqrt(var/no))

#Generate Data under True Model
p=dim(xo)[2]
b=rep(0,p)
b[1]=1
b[2]=2
b[3]=3
b[4]=4
b[5]=5
xo%*%b
yo=xo%*%b+rnorm(no,0,1)
yo=yo-mean(yo)

#Run Lasso
or_lasso=or_lasso_em(yo,xo,100)

#Posterior Density Increasing at Every Iteration?
or_lasso$lpd_trace[2:dim(or_lasso$lpd_trace)[1],1]-or_lasso$lpd_trace[1:(dim(or_lasso$lpd_trace)[1]-1),1]>=0
mean(or_lasso$lpd_trace[2:dim(or_lasso$lpd_trace)[1],1]-or_lasso$lpd_trace[1:(dim(or_lasso$lpd_trace)[1]-1),1]>=0)

#Plot Results
plot(or_lasso$B,ylab=expression(beta[lasso]),main="Lasso MAP Estimate of Regression Coefficients")

MAP regression coefficients

Park, T., & Casella, G. (2008). The Bayesian Lasso Journal of the American Statistical Association, 103 (482), 681-686 DOI: 10.1198/016214508000000337
Figueiredo M.A.T. (2003). Adaptive sparseness for supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (9) 1150-1159. DOI: http://dx.doi.org/10.1109/tpami.2003.1227989
Better Shrinkage Priors:
Armagan A., Dunson D.B. & Lee J. GENERALIZED DOUBLE PARETO SHRINKAGE., Statistica Sinica, PMID: 24478567

The post EM Algorithm for Bayesian Lasso R Cpp Code appeared first on Lindons Log.

To leave a comment for the author, please follow the link and comment on their blog: Lindons Log » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)