Stochastic search variable selection in JAGS

March 22, 2014
By

(This article was first published on Ecology in silico, and kindly contributed to R-bloggers)

Stochastic search variable selection (SSVS) identifies promising subsets of multiple regression covariates via Gibbs sampling (George and McCulloch 1993). Here’s a short SSVS demo with JAGS and R.

Assume we have a multiple regression problem:

We suspect only a subset of the elements of $\boldsymbol{\beta}$ are non-zero, i.e. some of the covariates have no effect.

Assume $\boldsymbol{\beta}$ arises from one of two normal mixture components, depending on a latent variable $\gamma_i$:

$\tau_i$ is positive but small s.t. $\beta_i$ is close to zero when $\gamma_i = 0$. $c_i$ is large enough to allow reasonable deviations from zero when $\gamma_i = 1$. The prior probability that covariate $i$ has a nonzero effect is $Pr(\gamma_i = 1) = p_i$. Important subtleties about priors are covered in George and McCulloch (1993) and elsewhere.

Let’s simulate a dataset in which some covariates have strong effects on the linear predictor, and other don’t: suppose we have 20 candidate variables, but only 60 observations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
ncov <- 20
nobs <- 60
var_beta <- .004
c <- 1000
p_inclusion <- .5
sigma_y <- 1

# generate covariates
X <- array(dim=c(nobs, ncov))
for (i in 1:ncov){
  X[, i] <- rnorm(nobs, 0, 1)
}

included <- rbinom(ncov, 1, p_inclusion)
coefs <- rnorm(n=ncov,
               mean=0,
               sd=ifelse(included==1,
                         sqrt(var_beta * c),
                         sqrt(var_beta)
                         )
               )
coefs <- sort(coefs)
Y <- rnorm(nobs, mean=X %*% coefs, sd=sigma_y)

Specifying the model:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
cat("model{
  # typical regression priors
  alpha ~ dnorm(0, .001)
  sd_y ~ dunif(0, 100)
  tau_y <- pow(sd_y, -2)

  # ssvs priors
  sd_bet ~ dunif(0, 100)
  tau_in <- pow(sd_bet, -2)
  tau[1] <- tau_in            # coef effectively zero
  tau[2] <- tau_in / 1000     # nonzero coef
  p_ind[1] <- 1/2
  p_ind[2] <- 1 - p_ind[1]

  for (j in 1:ncov){
    indA[j] ~ dcat(p_ind[]) # returns 1 or 2
    gamma[j] <- indA[j] - 1   # returns 0 or 1
    beta[j] ~ dnorm(0, tau[indA[j]])
  }

  # likelihood
  for (i in 1:nobs){
    Y[i] ~ dnorm(alpha + X[i ,] %*% beta[], tau_y)
  }
}
    "
    , file="ssvs.txt")

Fitting the model and assessing performance against known values:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
require(gridExtra)
require(runjags)
require(ggmcmc)
library(coda)
dat <- list(Y=Y, X=X, nobs=nobs, ncov=ncov)
vars <- c("alpha", "sd_bet", "gamma", "beta", "tau_in", "sd_y")
out2 <- run.jags("ssvs.txt", vars, data=dat, n.chains=3,
                 method="parallel", adapt=5000, burnin=5000)
outdf <- ggs(as.mcmc.list(out2))

# extract E(gamma_i) for posterior inclusion probabilities
probs <- summary(out2)$statistics[((2 + ncov):(1+2*ncov)), 1]

labels <- rep(NA, ncov)
for (i in 1:ncov){
  labels[i] <- paste("beta[", i, "]", sep="")
}
xdf <- data.frame(Parameter = labels, value = 1:ncov)
p1 <- ggs_caterpillar(outdf, "beta", X=xdf) +
  theme_classic() +
  geom_vline(xintercept = 0, linetype = "longdash") +
  geom_point(data=data.frame(coefs, pos = 1:ncov),
              aes(x=coefs, y=pos), size=5, col="green4", alpha=.7) +
  xlab("Value") +
  ylab("Coefficient") +
  geom_hline(yintercept = 1:ncov, alpha=.05) +
  scale_y_continuous(breaks=seq(0, ncov, 1))

df <- data.frame(probs=probs, coefs = coefs)
p2 <- ggplot(df, aes(x=abs(coefs), y=probs)) +
  geom_point(size=5, alpha=.7) +
  theme_classic() +
  xlab("Absolute value of true coefficient") +
  ylab("Posterior probability of non-zeroness")

grid.arrange(p1, p2, ncol=2)

On the left, green points indicate true coefficient values, with black posterior Bayesian credible intervals. The right plot shows the relationship between the true magnitude of the effect and the posterior probability that the coefficient was non-zero, $E(\gamma_i \mid \boldsymbol{Y})$.

To leave a comment for the author, please follow the link and comment on his blog: Ecology in silico.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.