# Multi-species dynamic occupancy model with R and JAGS

February 24, 2013

(This article was first published on Ecology in silico, and kindly contributed to R-bloggers)

This post is intended to provide a simple example of how to construct
and make inferences on a multi-species multi-year occupancy model using
R, JAGS, and the ‘rjags’ package. This is not intended to be a
standalone tutorial on dynamic community occupancy modeling. Useful
primary literature references include MacKenzie et al. (2002), Kery and
Royle (2007), Royle and Kery (2007), Russell et al. (2009), and Dorazio
et al. (2010). Royle and Dorazio’s Hierarchical Modeling and Inference
in Ecology also provides a clear explanation of simple single-species
occupancy models, multispecies occupancy models, and dynamic
(multiyear) occupancy models, among other things. There’s also a wealth
of code provided by Elise Zipkin, J. Andrew Royle, and others.

Before getting started, we can define two convenience functions:
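The original listing is not reproduced here. A minimal sketch, assuming the two helpers are the logit and antilogit transforms used throughout the rest of the post, could look like this (base R's `qlogis()` and `plogis()` do the same job):

```r
# Log-odds of a probability p in (0, 1).
logit <- function(p) {
  log(p / (1 - p))
}

# Inverse of logit(): maps any real number back to a probability.
antilogit <- function(x) {
  1 / (1 + exp(-x))
}

# Round trip: returns 0.25 up to floating point error.
antilogit(logit(0.25))
```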

Then, initializing the number of sites, species, years, and repeat surveys (i.e. surveys within years, where the occupancy status of a site
is assumed to be constant), we can begin to consider occupancy. We’re interested in making
inferences about the rates of colonization and population persistence
for each species in a community, while estimating and accounting for
imperfect detection.
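The dimensions mentioned above might be initialized along these lines. The specific values are placeholders chosen for illustration, not the ones used in the original post:

```r
# Illustrative study dimensions (assumed values).
nsite <- 50  # number of sites, indexed by j
nspec <- 5   # number of species, indexed by i
nyear <- 4   # number of years, indexed by t
nrep  <- 3   # repeat surveys within each year, indexed by k
```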

Occupancy status at site $j$, by species $i$, in year $t$ is
represented by $z(j, i, t)$. For occupied sites $z=1$; for
unoccupied sites $z=0$. However, $z$ is incompletely observed: it
is possible that a species $i$ is present at a site $j$ in some year
$t$ ($z(j, i, t) = 1$) but species $i$ was never
seen at site $j$ in year $t$ across all $k$ repeat
surveys because of imperfect detection. These observations
are represented by $x(j, i, t, k)$. Here we assume that there are no
“false positive” observations. In other words, if
$\sum_{k} x(j, i, t, k) > 0$, then $z(j, i, t) = 1$. If a site is
occupied, each observation $x(j, i, t, k)$ is
a Bernoulli trial with probability of
detection $p(j, i, t, k)$, and at unoccupied sites nothing can be detected, such that

$$x(j, i, t, k) \sim Bernoulli(z(j, i, t) \, p(j, i, t, k))$$

The occupancy status $z$ of species $i$ at site $j$ in year $t$
is modeled as a Markov Bernoulli trial. In other words, whether a species
is present at a site in year $t$ is influenced by whether it was
present in year $t-1$. Writing $\psi(j, i, t)$ for the probability of occupancy,

$$z(j, i, t) \sim Bernoulli(\psi(j, i, t))$$

where for $t > 1$

$$logit(\psi(j, i, t)) = \beta_i + \rho_i z(j, i, t-1)$$

and in year one $(t = 1)$

$$logit(\psi(j, i, 1)) = \beta_i + \rho_i z_{0}(j, i)$$

where the occupancy status in year 0,
$z_{0}(j, i) \sim Bernoulli(\rho_{0i})$, and
$\rho_{0i} \sim Uniform(0, 1)$. $\beta_i$ and $\rho_i$ are
parameters that control the probabilities of colonization and
persistence. If a site was unoccupied by species $i$ in the previous
year, $z(j, i, t-1) = 0$, then the probability of colonization is
given by the antilogit of $\beta_i$. If a site was previously
occupied, $z(j, i, t-1) = 1$, the probability of population
persistence is given by the antilogit of $\beta_i + \rho_i$. We
assume that the distributions of species specific parameters are
defined by community level hyperparameters such that
$\beta_i \sim Normal(\mu_{\beta}, \sigma_\beta^2)$ and
$\rho_i \sim Normal(\mu_{\rho}, \sigma_\rho^2)$. We can generate
occupancy data as follows:
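The original simulation code is not shown here, so what follows is only a sketch of the process described above. The hyperparameter values, the seed, and the variable names (`mu_beta`, `sd_beta`, `mu_rho`, `sd_rho`, `z0`) are assumptions. `plogis()` is base R's antilogit.

```r
set.seed(1)
nsite <- 50; nspec <- 5; nyear <- 4  # assumed dimensions

# Community-level hyperparameters on the logit scale (assumed values).
mu_beta <- qlogis(0.3); sd_beta <- 1
mu_rho  <- 1.5;         sd_rho  <- 0.5

# Species-specific colonization and persistence parameters.
beta <- rnorm(nspec, mu_beta, sd_beta)
rho  <- rnorm(nspec, mu_rho, sd_rho)

# Occupancy in year 0, with rho0 ~ Uniform(0, 1) for each species.
rho0 <- runif(nspec, 0, 1)
z0 <- array(NA, dim = c(nsite, nspec))
for (i in 1:nspec) {
  z0[, i] <- rbinom(nsite, 1, rho0[i])
}

# Markov occupancy dynamics: logit(psi) = beta + rho * previous state.
z <- array(NA, dim = c(nsite, nspec, nyear))
for (i in 1:nspec) {
  for (t in 1:nyear) {
    z_prev <- if (t == 1) z0[, i] else z[, i, t - 1]
    psi <- plogis(beta[i] + rho[i] * z_prev)
    z[, i, t] <- rbinom(nsite, 1, psi)
  }
}
```

The site dimension is vectorized here because every site shares the same species-level parameters. An explicit loop over `j` would give draws from the same model.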

For simplicity, we’ll assume that there are no differences in species detectability among sites, years, or repeat surveys, but that detectability varies among species. We’ll again use hyperparameters to specify a distribution of detection probabilities in our community, such that $logit(p_i) \sim Normal(\mu_p, \sigma_p^2)$.
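As a rough sketch of that last step, species-level detection probabilities can be drawn on the logit scale and back-transformed. The values of `mu_p` and `sd_p` below are assumptions made for illustration:

```r
set.seed(2)
nspec <- 5  # assumed number of species

# Community-level detection hyperparameters on the logit scale (assumed values).
mu_p <- qlogis(0.6)
sd_p <- 0.5

# logit(p_i) ~ Normal(mu_p, sd_p^2), then back-transform to a probability.
lp <- rnorm(nspec, mu_p, sd_p)
p  <- plogis(lp)
```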

We can now generate our observations based on occupancy states and
detection probabilities. Although this could be vectorized for speed,
let’s stick with nested for loops in the interest of clarity.
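One way this could be written is sketched below. `simulate_detections()` is a hypothetical helper name, and the toy inputs stand in for the occupancy states and detection probabilities generated earlier:

```r
# Simulate detections x[j, i, t, k] ~ Bernoulli(p[i] * z[j, i, t]).
# z: sites x species x years array of 0/1 occupancy states.
# p: vector of species-level detection probabilities.
simulate_detections <- function(z, p, nrep) {
  nsite <- dim(z)[1]
  nspec <- dim(z)[2]
  nyear <- dim(z)[3]
  x <- array(NA, dim = c(nsite, nspec, nyear, nrep))
  for (j in seq_len(nsite)) {
    for (i in seq_len(nspec)) {
      for (t in seq_len(nyear)) {
        for (k in seq_len(nrep)) {
          # Unoccupied sites (z = 0) can never yield a detection.
          x[j, i, t, k] <- rbinom(1, 1, p[i] * z[j, i, t])
        }
      }
    }
  }
  x
}

# Toy example: 10 sites, 2 species, 3 years, 2 repeat surveys.
set.seed(3)
z_toy <- array(rbinom(10 * 2 * 3, 1, 0.5), dim = c(10, 2, 3))
x_toy <- simulate_detections(z_toy, p = c(0.4, 0.8), nrep = 2)
```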

Now that we’ve collected some data, we can specify our model:
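The original model file is not reproduced here. The sketch below writes a JAGS model that follows the structure described in this post. The hyperpriors (`dnorm(0, 0.1)` on the means, `dunif(0, 10)` on the standard deviations), the node names, and the file name `com_occ.txt` are all assumptions rather than the author's choices. Note that JAGS parameterizes `dnorm` by precision, not variance.

```r
model_string <- "
model {
  # Community-level hyperpriors (assumed, intended to be vague)
  mubeta ~ dnorm(0, 0.1)
  murho ~ dnorm(0, 0.1)
  mup ~ dnorm(0, 0.1)
  sigmabeta ~ dunif(0, 10)
  sigmarho ~ dunif(0, 10)
  sigmap ~ dunif(0, 10)
  taubeta <- pow(sigmabeta, -2)
  taurho <- pow(sigmarho, -2)
  taup <- pow(sigmap, -2)

  for (i in 1:nspec) {
    # Species-level parameters drawn from community distributions
    beta[i] ~ dnorm(mubeta, taubeta)
    rho[i] ~ dnorm(murho, taurho)
    lp[i] ~ dnorm(mup, taup)
    logit(p[i]) <- lp[i]
    rho0[i] ~ dunif(0, 1)

    for (j in 1:nsite) {
      # Year 0 and year 1 occupancy
      z0[j, i] ~ dbern(rho0[i])
      logit(psi[j, i, 1]) <- beta[i] + rho[i] * z0[j, i]
      z[j, i, 1] ~ dbern(psi[j, i, 1])

      # Subsequent years depend on the previous year's state
      for (t in 2:nyear) {
        logit(psi[j, i, t]) <- beta[i] + rho[i] * z[j, i, t - 1]
        z[j, i, t] ~ dbern(psi[j, i, t])
      }

      # Observation model: detection is only possible where z = 1
      for (t in 1:nyear) {
        for (k in 1:nrep) {
          x[j, i, t, k] ~ dbern(p[i] * z[j, i, t])
        }
      }
    }
  }
}
"

# Write the model to a temporary file (hypothetical file name).
model_file <- file.path(tempdir(), "com_occ.txt")
writeLines(model_string, model_file)
```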

Next, bundle up the data.
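A sketch of the bundling step, assuming the model expects the observation array `x` together with the four dimension constants. `bundle_data()` is a hypothetical helper, and the list element names are assumptions that must match whatever names the model file uses:

```r
# Collect the observations and their dimensions into a named list for JAGS.
# x: sites x species x years x repeat-surveys array of 0/1 detections.
bundle_data <- function(x) {
  d <- dim(x)
  list(x = x, nsite = d[1], nspec = d[2], nyear = d[3], nrep = d[4])
}

# Toy example with an all-zero detection array.
x_toy <- array(0, dim = c(10, 2, 3, 2))
str(bundle_data(x_toy))
```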

Provide initial values.
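The snippet below is a sketch of one way to build such initial values. The name `zinit` comes from this post, while `make_inits()` is a hypothetical wrapper: it sets $z$ to 1 wherever the species was detected at least once in that site and year.

```r
# Build an inits function for rjags from the detection array x.
make_inits <- function(x) {
  # 1 if the species was seen in any repeat survey, else 0.
  zinit <- apply(x, c(1, 2, 3), max)
  function() list(z = zinit)
}

# Toy example: 2 sites, 1 species, 1 year, 2 surveys; one detection at site 1.
x_toy <- array(0, dim = c(2, 1, 1, 2))
x_toy[1, 1, 1, 2] <- 1
inits <- make_inits(x_toy)
inits()$z[, 1, 1]
```

Starting undetected site-years at 0 is just one consistent choice. Starting every $z$ at 1 would also agree with the observed presences.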

As a side note, it is helpful in JAGS to provide initial values for the incompletely observed occupancy state $z$ that are consistent with observed presences, as provided in this example with zinit. In other words, if $x(j, i, t, k)=1$, provide an initial value of $1$ for $z(j, i, t)$. Unlike in WinBUGS and OpenBUGS, if you do not do this, you’ll often (but not always) encounter an error at initialization reporting that an observed node is inconsistent with its unobserved parents.

Now we’re ready to monitor and make inferences about some parameters of interest using JAGS.
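A minimal sketch of this step with ‘rjags’ is given below. `fit_occupancy()` is a hypothetical wrapper, and the chain, adaptation, burn-in, and iteration counts are arbitrary defaults rather than recommendations. It assumes a model file, a data list, and an inits function like those described above.

```r
# Compile the model, burn in, and draw posterior samples.
# Requires the 'rjags' package and a working JAGS installation.
fit_occupancy <- function(model_file, data, inits, params,
                          n_chains = 3, n_adapt = 1000,
                          n_burnin = 2000, n_iter = 5000) {
  if (!requireNamespace("rjags", quietly = TRUE)) {
    stop("The 'rjags' package (and a JAGS installation) is required.")
  }
  jm <- rjags::jags.model(file = model_file, data = data, inits = inits,
                          n.chains = n_chains, n.adapt = n_adapt)
  update(jm, n.iter = n_burnin)  # burn-in; these draws are discarded
  rjags::coda.samples(jm, variable.names = params, n.iter = n_iter)
}

# Example call (parameter names are assumptions tied to the model file):
# samples <- fit_occupancy(model_file, jags_data, inits,
#                          params = c("mubeta", "murho", "mup"))
```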

At this point, you’ll want to run through the usual MCMC diagnostics to
check for convergence and adjust the burn-in or number of iterations
accordingly. Once satisfied, we can check to see how well our model
performed based on our known parameter values.
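As a sketch of that comparison, the hypothetical helper below takes a matrix of posterior draws (for example `as.matrix(samples)` on the output of `coda.samples()`) and a named vector of true values, then reports whether each 95% credible interval covers the truth. The demo uses fake draws, not output from a fitted model. For convergence checks, `coda::gelman.diag()` is one commonly used option.

```r
# Compare posterior draws to known parameter values.
# draws: matrix with one named column per parameter.
# truth: named numeric vector of the values used in the simulation.
compare_to_truth <- function(draws, truth) {
  shared <- intersect(colnames(draws), names(truth))
  est <- t(apply(draws[, shared, drop = FALSE], 2, quantile,
                 probs = c(0.025, 0.5, 0.975)))
  true_vals <- unname(truth[shared])
  data.frame(parameter = shared,
             truth = true_vals,
             lower = est[, 1],
             median = est[, 2],
             upper = est[, 3],
             covered = true_vals >= est[, 1] & true_vals <= est[, 3],
             row.names = NULL)
}

# Demo with simulated stand-in draws.
set.seed(4)
fake_draws <- cbind(mubeta = rnorm(1000, 0.5, 0.2),
                    murho = rnorm(1000, 1.5, 0.3))
res <- compare_to_truth(fake_draws, c(mubeta = 0.5, murho = 1.5))
res
```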
