EM Algorithm for Bayesian Lasso R Cpp Code

September 5, 2014

Bayesian Lasso

$$\begin{align*}
p(Y_{o}|\beta,\phi)&=N(Y_{o}|1\alpha+X_{o}\beta,\phi^{-1} I_{n_{o}})\\
\pi(\beta_{i}|\phi,\tau_{i}^{2})&=N(\beta_{i}|0, \phi^{-1}\tau_{i}^{2})\\
\pi(\tau_{i}^{2})&=Exp\left( \frac{\lambda}{2} \right)\\
\pi(\phi)&\propto \phi^{-1}\\
\pi(\alpha)&\propto 1
\end{align*}$$

Marginalizing over \( \alpha \) is equivalent to centering the observations (at the cost of one degree of freedom) and working with the centered \( Y_{o} \).
Mixing over \( \tau_{i}^{2} \) leads to a Laplace, or double exponential, prior on \( \beta_{i} \) with rate parameter \( \sqrt{\phi\lambda} \), as seen by considering the characteristic function:
$$\begin{align*}
\varphi_{\beta_{i}|\phi}(t)&=\int e^{jt\beta_{i}}\pi(\beta_{i}|\phi)d\beta_{i}\\
&=\int \int e^{jt\beta_{i}}\pi(\beta_{i}|\phi,\tau_{i}^{2})\pi(\tau_{i}^{2})d\tau_{i}^{2}\, d\beta_{i}\\
&=\frac{\lambda}{2} \int e^{-\frac{1}{2}\frac{t^{2}}{\phi}\tau_{i}^{2}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
&=\frac{\lambda}{\frac{t^{2}}{\phi}+\lambda}=\frac{\lambda\phi}{t^{2}+\lambda\phi},
\end{align*}$$
i.e. the characteristic function of a Laplace density with rate \( \sqrt{\lambda\phi} \).
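
As a quick numerical sanity check of the mixture representation, one can simulate from the hierarchy and overlay the implied Laplace density (a sketch; the values of lambda and phi below are arbitrary illustrative choices):

#Sanity check (not from the original post): the exponential scale mixture of
#normals should match a Laplace density with rate sqrt(phi*lambda)
lambda=2; phi=1.5
tau2=rexp(100000,rate=lambda/2)
beta=rnorm(100000,0,sqrt(tau2/phi))
rate=sqrt(phi*lambda)
hist(beta,breaks=200,freq=FALSE,xlim=c(-4,4),main="Scale mixture vs Laplace")
curve(rate/2*exp(-rate*abs(x)),add=TRUE,col="red")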

EM Algorithm

The objective is to find the mode of the joint posterior \( \pi(\beta,\phi|Y_{o}) \). It is easier, however, to find the joint mode of \( \pi(\beta,\phi|Y_{o},\tau^{2}) \) and use EM to exploit the scale mixture representation.

$$\begin{align*}
\log \pi(\beta,\phi|Y_{o},\tau^{2})=c+ \frac{n_{o}+p-3}{2}\log \phi -\frac{\phi}{2}||Y_{o}-X_{o}\beta||^{2}-\sum_{i=1}^{p}\frac{\phi}{2}\frac{1}{\tau_{i}^{2}}\beta^{2}_{i}
\end{align*}$$
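
EM treats \( \tau^{2} \) as missing data: the E-step takes the expectation of this complete-data log posterior over \( \tau^{2} \) given the current iterates \( (\beta^{t},\phi^{t}) \), and the M-step maximizes the resulting surrogate
$$\begin{align*}
Q(\beta,\phi|\beta^{t},\phi^{t})=\mathbb{E}_{\tau^{2}|\beta^{t},\phi^{t},Y_{o}}\left[ \log \pi(\beta,\phi|Y_{o},\tau^{2}) \right].
\end{align*}$$
Since \( \tau^{2} \) enters the complete-data log posterior only through the terms \( \frac{1}{\tau_{i}^{2}} \), the E-step reduces to computing \( \mathbb{E}\left[ \frac{1}{\tau_{i}^{2}} \right] \).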

Expectation

The expectation of \( \frac{1}{\tau_{i}^{2}} \) w.r.t. \( \pi(\tau_{i}^{2}|\beta_{i},\phi,Y_{o}) \) is handled as the ratio of two integrals; the numerator is
$$\begin{align*}
&\frac{\lambda}{2}\int \frac{1}{\tau_{i}^{2}}\left( \frac{\phi}{2\pi\tau_{i}^{2}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
=&\frac{\lambda}{2}\int \left( \frac{\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}
\end{align*}$$

This has the kernel of an Inverse Gaussian distribution with shape parameter \( \phi \beta_{i}^{2} \) and mean \( \sqrt{\frac{\phi}{\lambda}}|\beta_{i}| \).
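
For reference, the Inverse Gaussian density with mean \( \mu \) and shape \( s \) is
$$\begin{align*}
IG(x|\mu,s)=\left( \frac{s}{2\pi x^{3}} \right)^{\frac{1}{2}}e^{-\frac{s(x-\mu)^{2}}{2\mu^{2}x}}=\left( \frac{s}{2\pi x^{3}} \right)^{\frac{1}{2}}e^{-\frac{s}{2\mu^{2}}x+\frac{s}{\mu}-\frac{s}{2x}},
\end{align*}$$
so with \( s=\phi\beta_{i}^{2} \) and \( \mu=\sqrt{\frac{\phi}{\lambda}}|\beta_{i}| \) the terms in \( x \) and \( \frac{1}{x} \) match the integrand above, leaving only the constant \( e^{\frac{s}{\mu}}=e^{\sqrt{\lambda\phi\beta_{i}^{2}}} \) to account for: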

$$\begin{align*}
&\frac{\lambda}{2|\beta_{i}|}\int \left( \frac{\beta_{i}^{2}\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
=&\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}\int \left( \frac{\beta_{i}^{2}\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}e^{\sqrt{\lambda\phi\beta_{i}^{2}}}d\tau_{i}^{2}\\
=&\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}
\end{align*}$$

The normalizing constant is computed similarly:

$$\begin{align*}
&\frac{\lambda}{2}\int \left( \frac{\phi}{2\pi\tau_{i}^{2}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
=&\frac{\lambda}{2}\int \tau_{i}^{2}\left( \frac{\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
=&\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}\sqrt{\frac{\phi}{\lambda}}|\beta_{i}|,
\end{align*}$$
where the last equality uses the fact that the Inverse Gaussian kernel integrates \( \tau_{i}^{2} \) to its mean \( \sqrt{\frac{\phi}{\lambda}}|\beta_{i}| \).

\( \Rightarrow \mathbb{E}\left[ \frac{1}{\tau_{i}^{2}} \right]=\sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{i}^{t}|} \).
Let \( \Lambda^{t}=diag\left( \sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{1}^{t}|}, \dots, \sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{p}^{t}|} \right) \).
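
This closed form is easy to verify numerically; below is a quick sketch using R's integrate, with arbitrary illustrative values of \( \lambda \), \( \phi \) and \( \beta_{i} \):

#Numerical check (not from the original post): E[1/tau^2]=sqrt(lambda/phi)/|beta|
lambda=2; phi=1.5; b=0.7
#joint density pi(beta|phi,tau^2)*pi(tau^2) as a function of tau^2
f=function(t2) (lambda/2)*exp(-lambda*t2/2)*sqrt(phi/(2*pi*t2))*exp(-phi*b^2/(2*t2))
num=integrate(function(t2) f(t2)/t2,0,Inf)$value
den=integrate(f,0,Inf)$value
num/den                 #numerical expectation
sqrt(lambda/phi)/abs(b) #closed form from the derivation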

Maximization

$$\begin{align*}
Q(\beta,\phi|\beta^{t},\phi^{t})&=c+ \frac{n_{o}+p-3}{2}\log \phi -\frac{\phi}{2}||Y_{o}-X_{o}\beta||^{2} - \frac{\phi}{2}\beta^{T}\Lambda^{t}\beta\\
&=c+ \frac{n_{o}+p-3}{2}\log \phi -\frac{\phi}{2}||\beta-(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o}^{T}Y_{o}||^{2}_{(X_{o}^{T}X_{o}+\Lambda^{t})}-\frac{\phi}{2}Y_{o}^{T}(I_{n_{o}}-X_{o}(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o}^{T})Y_{o}
\end{align*}$$

$$\begin{align*}
\beta^{t+1}&=(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o}^{T}Y_{o}
\end{align*}$$

$$\begin{align*}
\phi^{t+1}&=\frac{n_{o}+p-3}{Y_{o}^{T}(I_{n_{o}}-X_{o}(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o}^{T})Y_{o}}
\end{align*}$$
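
Before turning to the Rcpp version, here is a minimal pure-R sketch of the same iteration (a reference implementation under the same centering assumptions, not the author's code; the name bayes_lasso_em is made up for illustration):

#Pure-R reference sketch of the EM updates (illustrative, not the author's code)
bayes_lasso_em=function(yo,xo,lasso,tol=1e-5,maxit=20000){
	no=nrow(xo); p=ncol(xo)
	xoyo=crossprod(xo,yo); xoxo=crossprod(xo); yoyo=sum(yo^2)
	B=xoyo/no                         #initial coefficients
	phi=no/sum((yo-xo%*%B)^2)         #initial precision
	for(it in 1:maxit){
		Lt=sqrt(lasso/phi)/abs(B)                     #E-step: diagonal of Lambda^t
		Bnew=solve(xoxo+diag(as.numeric(Lt),p),xoyo)  #beta^{t+1}
		phinew=(no+p-3)/(yoyo-sum(xoyo*Bnew))         #phi^{t+1}
		if(sum((Bnew-B)^2)<tol && abs(phinew-phi)<tol) break
		B=Bnew; phi=phinew
	}
	list(B=as.numeric(Bnew),phi=phinew,iterations=it)
}

Note that this direct version divides by \( |\beta_{i}| \), which breaks down as coefficients shrink to zero; the Rcpp code below instead substitutes \( D=diag(\sqrt{|\beta_{i}|}) \) and solves \( \beta=D(DX_{o}^{T}X_{o}D+\sqrt{\lambda/\phi}\,I_{p})^{-1}DX_{o}^{T}Y_{o} \), which is algebraically equivalent but numerically stable.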

Rcpp C++ Code

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;
using namespace arma;

double or_log_posterior_density(int no, int p, double lasso, const Col<double>& yo, const Mat<double>& xo, const Col<double>& B, double phi);

// [[Rcpp::export]]
List or_lasso_em(NumericVector ryo, NumericMatrix rxo, SEXP rlasso){

	//Define Variables//
	int p=rxo.ncol();
	int no=rxo.nrow();
	double lasso=Rcpp::as<double>(rlasso);

	//Create Data//
	arma::mat xo(rxo.begin(), no, p, false);
	arma::colvec yo(ryo.begin(), ryo.size(), false);
	yo-=mean(yo);

	//Pre-Processing//
	Col<double> xoyo=xo.t()*yo;
	Col<double> B=xoyo/no;
	Col<double> Babs=abs(B);
	Mat<double> xoxo=xo.t()*xo;
	Mat<double> D=eye<mat>(p,p);
	Mat<double> Ip=eye<mat>(p,p);
	double yoyo=dot(yo,yo);
	double deltaB;
	double deltaphi;
	double phi=no/dot(yo-xo*B,yo-xo*B);
	double lp;

	//Create Trace Matrices
	Mat B_trace(p,20000);
	Col phi_trace(20000);
	Col lpd_trace(20000);

	//Run EM Algorithm//
	Rcout << "Beginning EM Algorithm" << std::endl;
	int t=0;
	B_trace.col(t)=B;
	phi_trace(t)=phi;
	lpd_trace(t)=or_log_posterior_density(no,p,lasso,yo,xo,B,phi);
	do{
		t=t+1;

		//E-step: Lambda^t = lp*diag(1/|B_i|) with lp=sqrt(lasso/phi)//
		lp=sqrt(lasso/phi);

		//M-step for B: with D=diagmat(sqrt(|B_i|)), the update
		//(xoxo+lp*inv(D*D))^-1*xoyo equals D*(D*xoxo*D+lp*Ip)^-1*D*xoyo,
		//avoiding division by coefficients shrinking to zero//
		Babs=abs(B);
		D=diagmat(sqrt(Babs));
		B=D*solve(D*xoxo*D+lp*Ip,D*xoyo);

		//M-step for phi//
		phi=(no+p-3)/(yoyo-dot(xoyo,B));

		//Store Values//
		B_trace.col(t)=B;
		phi_trace(t)=phi;
		lpd_trace(t)=or_log_posterior_density(no,p,lasso,yo,xo,B,phi);

		deltaB=dot(B_trace.col(t)-B_trace.col(t-1),B_trace.col(t)-B_trace.col(t-1));
		deltaphi=std::abs(phi_trace(t)-phi_trace(t-1)); //absolute change; phi need not be monotone
	} while((deltaB>0.00001 || deltaphi>0.00001) && t<19999);
	Rcout << "EM Algorithm Converged in " << t << " Iterations" << std::endl;

	//Resize Trace Matrices//
	B_trace.resize(p,t+1);  //keep iterations 0..t, including the final iterate
	phi_trace.resize(t+1);
	lpd_trace.resize(t+1);

	return Rcpp::List::create(
			Rcpp::Named("B") = B,
			Rcpp::Named("B_trace") = B_trace,
			Rcpp::Named("phi") = phi,
			Rcpp::Named("phi_trace") = phi_trace,
			Rcpp::Named("lpd_trace") = lpd_trace
			) ;

}
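
The helper or_log_posterior_density, whose definition the post does not show, evaluates the marginal log posterior so that the monotonicity of EM can be checked. A minimal sketch consistent with the Laplace-prior form of the posterior derived above (up to an additive constant) might look like:

//Sketch (not from the original post): marginal log posterior of (B,phi) up to
//an additive constant, using the Laplace prior with rate sqrt(lasso*phi)
double or_log_posterior_density(int no, int p, double lasso, const Col<double>& yo, const Mat<double>& xo, const Col<double>& B, double phi){
	double rss=dot(yo-xo*B,yo-xo*B);
	return 0.5*(no+p-3)*log(phi)-0.5*phi*rss-sqrt(lasso*phi)*sum(abs(B));
}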

An Example in R

rm(list=ls())

#Generate Design Matrix
set.seed(3)
no=100
foo=rnorm(no,0,1)
sd=4
xo=foo+matrix(rnorm(no*288,0,sd),no,288) #288 collinear predictors sharing the common component foo

#Scale and Center Design Matrix
xo=scale(xo,center=T,scale=F)
var=apply(xo^2,2,sum)
xo=scale(xo,center=F,scale=sqrt(var/no))

#Generate Data under True Model
p=dim(xo)[2]
b=rep(0,p)
b[1]=1
b[2]=2
b[3]=3
b[4]=4
b[5]=5
yo=xo%*%b+rnorm(no,0,1)
yo=yo-mean(yo)

#Run Lasso
or_lasso=or_lasso_em(yo,xo,100)

#Posterior Density Increasing at Every Iteration?
or_lasso$lpd_trace[2:dim(or_lasso$lpd_trace)[1],1]-or_lasso$lpd_trace[1:(dim(or_lasso$lpd_trace)[1]-1),1]>=0
mean(or_lasso$lpd_trace[2:dim(or_lasso$lpd_trace)[1],1]-or_lasso$lpd_trace[1:(dim(or_lasso$lpd_trace)[1]-1),1]>=0)

#Plot Results
plot(or_lasso$B,ylab=expression(beta[lasso]),main="Lasso MAP Estimate of Regression Coefficients")

[Figure: MAP regression coefficients]

References

Park, T., & Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103(482), 681-686. DOI: 10.1198/016214508000000337

Figueiredo, M.A.T. (2003). Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1150-1159. DOI: 10.1109/TPAMI.2003.1227989

Better shrinkage priors: Armagan, A., Dunson, D.B., & Lee, J. (2013). Generalized double Pareto shrinkage. Statistica Sinica, 23(1), 119-143. PMID: 24478567
