Stochastic search variable selection in JAGS

[This article was first published on Ecology in silico, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Stochastic search variable selection (SSVS) identifies promising subsets of multiple regression covariates via Gibbs sampling (George and McCulloch 1993). Here’s a short SSVS demo with JAGS and R.

Assume we have a multiple regression problem:

We suspect only a subset of the elements of $\boldsymbol{\beta}$ are non-zero, i.e. some of the covariates have no effect.

Assume $\boldsymbol{\beta}$ arises from one of two normal mixture components, depending on a latent variable $\gamma_i$:

$\tau_i$ is positive but small s.t. $\beta_i$ is close to zero when $\gamma_i = 0$. $c_i$ is large enough to allow reasonable deviations from zero when $\gamma_i = 1$. The prior probability that covariate $i$ has a nonzero effect is $Pr(\gamma_i = 1) = p_i$. Important subtleties about priors are covered in George and McCulloch (1993) and elsewhere.

Let’s simulate a dataset in which some covariates have strong effects on the linear predictor, and other don’t: suppose we have 20 candidate variables, but only 60 observations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
<span class="line">ncov <span class="o"><-</span> <span class="m">20</span>
</span><span class="line">nobs <span class="o"><-</span> <span class="m">60</span>
</span><span class="line">var_beta <span class="o"><-</span> <span class="m">.004</span>
</span><span class="line">c <span class="o"><-</span> <span class="m">1000</span>
</span><span class="line">p_inclusion <span class="o"><-</span> <span class="m">.5</span>
</span><span class="line">sigma_y <span class="o"><-</span> <span class="m">1</span>
</span><span class="line">
</span><span class="line"><span class="c1"># generate covariates</span>
</span><span class="line">X <span class="o"><-</span> array<span class="p">(</span>dim<span class="o">=</span>c<span class="p">(</span>nobs<span class="p">,</span> ncov<span class="p">))</span>
</span><span class="line"><span class="kr">for</span> <span class="p">(</span>i <span class="kr">in</span> <span class="m">1</span><span class="o">:</span>ncov<span class="p">){</span>
</span><span class="line">  X<span class="p">[,</span> i<span class="p">]</span> <span class="o"><-</span> rnorm<span class="p">(</span>nobs<span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span>
</span><span class="line"><span class="p">}</span>
</span><span class="line">
</span><span class="line">included <span class="o"><-</span> rbinom<span class="p">(</span>ncov<span class="p">,</span> <span class="m">1</span><span class="p">,</span> p_inclusion<span class="p">)</span>
</span><span class="line">coefs <span class="o"><-</span> rnorm<span class="p">(</span>n<span class="o">=</span>ncov<span class="p">,</span>
</span><span class="line">               mean<span class="o">=</span><span class="m">0</span><span class="p">,</span>
</span><span class="line">               sd<span class="o">=</span>ifelse<span class="p">(</span>included<span class="o">==</span><span class="m">1</span><span class="p">,</span>
</span><span class="line">                         sqrt<span class="p">(</span>var_beta <span class="o">*</span> c<span class="p">),</span>
</span><span class="line">                         sqrt<span class="p">(</span>var_beta<span class="p">)</span>
</span><span class="line">                         <span class="p">)</span>
</span><span class="line">               <span class="p">)</span>
</span><span class="line">coefs <span class="o"><-</span> sort<span class="p">(</span>coefs<span class="p">)</span>
</span><span class="line">Y <span class="o"><-</span> rnorm<span class="p">(</span>nobs<span class="p">,</span> mean<span class="o">=</span>X <span class="o">%*%</span> coefs<span class="p">,</span> sd<span class="o">=</span>sigma_y<span class="p">)</span>
</span>

Specifying the model:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
<span class="line">cat<span class="p">(</span><span class="s">"model{</span>
</span><span class="line"><span class="s">  # typical regression priors</span>
</span><span class="line"><span class="s">  alpha ~ dnorm(0, .001)</span>
</span><span class="line"><span class="s">  sd_y ~ dunif(0, 100)</span>
</span><span class="line"><span class="s">  tau_y <- pow(sd_y, -2)</span>
</span><span class="line">
</span><span class="line"><span class="s">  # ssvs priors</span>
</span><span class="line"><span class="s">  sd_bet ~ dunif(0, 100)</span>
</span><span class="line"><span class="s">  tau_in <- pow(sd_bet, -2)</span>
</span><span class="line"><span class="s">  tau[1] <- tau_in            # coef effectively zero</span>
</span><span class="line"><span class="s">  tau[2] <- tau_in / 1000     # nonzero coef</span>
</span><span class="line"><span class="s">  p_ind[1] <- 1/2</span>
</span><span class="line"><span class="s">  p_ind[2] <- 1 - p_ind[1]</span>
</span><span class="line">
</span><span class="line"><span class="s">  for (j in 1:ncov){</span>
</span><span class="line"><span class="s">    indA[j] ~ dcat(p_ind[]) # returns 1 or 2</span>
</span><span class="line"><span class="s">    gamma[j] <- indA[j] - 1   # returns 0 or 1</span>
</span><span class="line"><span class="s">    beta[j] ~ dnorm(0, tau[indA[j]])</span>
</span><span class="line"><span class="s">  }</span>
</span><span class="line">
</span><span class="line"><span class="s">  # likelihood</span>
</span><span class="line"><span class="s">  for (i in 1:nobs){</span>
</span><span class="line"><span class="s">    Y[i] ~ dnorm(alpha + X[i ,] %*% beta[], tau_y)</span>
</span><span class="line"><span class="s">  }</span>
</span><span class="line"><span class="s">}</span>
</span><span class="line"><span class="s">    "</span>
</span><span class="line">    <span class="p">,</span> file<span class="o">=</span><span class="s">"ssvs.txt"</span><span class="p">)</span>
</span>

Fitting the model and assessing performance against known values:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
<span class="line">require<span class="p">(</span>gridExtra<span class="p">)</span>
</span><span class="line">require<span class="p">(</span>runjags<span class="p">)</span>
</span><span class="line">require<span class="p">(</span>ggmcmc<span class="p">)</span>
</span><span class="line">library<span class="p">(</span>coda<span class="p">)</span>
</span><span class="line">dat <span class="o"><-</span> list<span class="p">(</span>Y<span class="o">=</span>Y<span class="p">,</span> X<span class="o">=</span>X<span class="p">,</span> nobs<span class="o">=</span>nobs<span class="p">,</span> ncov<span class="o">=</span>ncov<span class="p">)</span>
</span><span class="line">vars <span class="o"><-</span> c<span class="p">(</span><span class="s">"alpha"</span><span class="p">,</span> <span class="s">"sd_bet"</span><span class="p">,</span> <span class="s">"gamma"</span><span class="p">,</span> <span class="s">"beta"</span><span class="p">,</span> <span class="s">"tau_in"</span><span class="p">,</span> <span class="s">"sd_y"</span><span class="p">)</span>
</span><span class="line">out2 <span class="o"><-</span> run.jags<span class="p">(</span><span class="s">"ssvs.txt"</span><span class="p">,</span> vars<span class="p">,</span> data<span class="o">=</span>dat<span class="p">,</span> n.chains<span class="o">=</span><span class="m">3</span><span class="p">,</span>
</span><span class="line">                 method<span class="o">=</span><span class="s">"parallel"</span><span class="p">,</span> adapt<span class="o">=</span><span class="m">5000</span><span class="p">,</span> burnin<span class="o">=</span><span class="m">5000</span><span class="p">)</span>
</span><span class="line">outdf <span class="o"><-</span> ggs<span class="p">(</span>as.mcmc.list<span class="p">(</span>out2<span class="p">))</span>
</span><span class="line">
</span><span class="line"><span class="c1"># extract E(gamma_i) for posterior inclusion probabilities</span>
</span><span class="line">probs <span class="o"><-</span> summary<span class="p">(</span>out2<span class="p">)</span><span class="o">$</span>statistics<span class="p">[((</span><span class="m">2</span> <span class="o">+</span> ncov<span class="p">)</span><span class="o">:</span><span class="p">(</span><span class="m">1+2</span><span class="o">*</span>ncov<span class="p">)),</span> <span class="m">1</span><span class="p">]</span>
</span><span class="line">
</span><span class="line">labels <span class="o"><-</span> rep<span class="p">(</span><span class="kc">NA</span><span class="p">,</span> ncov<span class="p">)</span>
</span><span class="line"><span class="kr">for</span> <span class="p">(</span>i <span class="kr">in</span> <span class="m">1</span><span class="o">:</span>ncov<span class="p">){</span>
</span><span class="line">  labels<span class="p">[</span>i<span class="p">]</span> <span class="o"><-</span> paste<span class="p">(</span><span class="s">"beta["</span><span class="p">,</span> i<span class="p">,</span> <span class="s">"]"</span><span class="p">,</span> sep<span class="o">=</span><span class="s">""</span><span class="p">)</span>
</span><span class="line"><span class="p">}</span>
</span><span class="line">xdf <span class="o"><-</span> data.frame<span class="p">(</span>Parameter <span class="o">=</span> labels<span class="p">,</span> value <span class="o">=</span> <span class="m">1</span><span class="o">:</span>ncov<span class="p">)</span>
</span><span class="line">p1 <span class="o"><-</span> ggs_caterpillar<span class="p">(</span>outdf<span class="p">,</span> <span class="s">"beta"</span><span class="p">,</span> X<span class="o">=</span>xdf<span class="p">)</span> <span class="o">+</span>
</span><span class="line">  theme_classic<span class="p">()</span> <span class="o">+</span>
</span><span class="line">  geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">0</span><span class="p">,</span> linetype <span class="o">=</span> <span class="s">"longdash"</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  geom_point<span class="p">(</span>data<span class="o">=</span>data.frame<span class="p">(</span>coefs<span class="p">,</span> pos <span class="o">=</span> <span class="m">1</span><span class="o">:</span>ncov<span class="p">),</span>
</span><span class="line">              aes<span class="p">(</span>x<span class="o">=</span>coefs<span class="p">,</span> y<span class="o">=</span>pos<span class="p">),</span> size<span class="o">=</span><span class="m">5</span><span class="p">,</span> col<span class="o">=</span><span class="s">"green4"</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">.7</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  xlab<span class="p">(</span><span class="s">"Value"</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  ylab<span class="p">(</span><span class="s">"Coefficient"</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">1</span><span class="o">:</span>ncov<span class="p">,</span> alpha<span class="o">=</span><span class="m">.05</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  scale_y_continuous<span class="p">(</span>breaks<span class="o">=</span>seq<span class="p">(</span><span class="m">0</span><span class="p">,</span> ncov<span class="p">,</span> <span class="m">1</span><span class="p">))</span>
</span><span class="line">
</span><span class="line">df <span class="o"><-</span> data.frame<span class="p">(</span>probs<span class="o">=</span>probs<span class="p">,</span> coefs <span class="o">=</span> coefs<span class="p">)</span>
</span><span class="line">p2 <span class="o"><-</span> ggplot<span class="p">(</span>df<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>abs<span class="p">(</span>coefs<span class="p">),</span> y<span class="o">=</span>probs<span class="p">))</span> <span class="o">+</span>
</span><span class="line">  geom_point<span class="p">(</span>size<span class="o">=</span><span class="m">5</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">.7</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  theme_classic<span class="p">()</span> <span class="o">+</span>
</span><span class="line">  xlab<span class="p">(</span><span class="s">"Absolute value of true coefficient"</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  ylab<span class="p">(</span><span class="s">"Posterior probability of non-zeroness"</span><span class="p">)</span>
</span><span class="line">
</span><span class="line">grid.arrange<span class="p">(</span>p1<span class="p">,</span> p2<span class="p">,</span> ncol<span class="o">=</span><span class="m">2</span><span class="p">)</span>
</span>

On the left, green points indicate true coefficient values, with black posterior Bayesian credible intervals. The right plot shows the relationship between the true magnitude of the effect and the posterior probability that the coefficient was non-zero, $E(\gamma_i \mid \boldsymbol{Y})$.

To leave a comment for the author, please follow the link and comment on their blog: Ecology in silico.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)