This is a guest post generously provided by Joe Mihaljevic.
A common goal of community ecology is to understand how and why species composition shifts across space. Techniques for determining which environmental covariates drive such shifts typically rely on ordination to reduce the dimensionality of community data. These techniques include redundancy analysis (RDA), canonical correspondence analysis (CCA), and nonmetric multidimensional scaling (NMDS), each paired with permutation tests. However, as brought to light by Jackson et al. (2012: Ecosphere), these ordination techniques do not discern species-level covariate effects, making it difficult to attribute community-level pattern shifts to species-level changes. Jackson et al. (2012) propose a hierarchical modeling framework as an alternative, which we extend in this post to correct for imperfect detection.
Multilevel models can estimate species-level random and fixed covariate effects to determine the relative contribution of environmental covariates to changing composition across space (Jackson et al. 2012). For presence/absence data, such models are often formulated as:
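One formulation consistent with the notation used in this post, written for a single site-level covariate $x$ and with the symbols $a$, $b$, $\mu_a$, $\mu_b$ and $\sigma_{intercept}^2$ chosen here for illustration, is:

$$
\begin{aligned}
y_q &\sim \mathrm{Bern}(\psi_q) \\
\mathrm{logit}(\psi_q) &= a_{spp[q]} + b_{spp[q]} \, x_{site[q]} \\
a_{spp} &\sim N(\mu_a, \sigma_{intercept}^2) \\
b_{spp} &\sim N(\mu_b, \sigma_{slope}^2)
\end{aligned}
$$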
Here $y_q$ is a vector of presences/absences of each species at each site ($q = 1, \ldots, nm$, where $n$ is the number of species and $m$ is the number of sites). This model can be extended to incorporate multiple covariates.
We are interested in whether species respond differently to environmental gradients (e.g. elevation, temperature, precipitation). If this is the case, then we expect community composition changes along such gradients. Concretely, we are interested in whether $\sigma_{slope}^2$ for any covariate differs from zero.
Jackson et al. (2012) provide code for a maximum likelihood implementation of their model, using the R package lme4 and data from Southern Appalachian understory herbs. Here we present a simple extension of Jackson and colleagues' work, correcting for detection error with repeat surveys (i.e. multi-species occupancy modeling). Specifically, the above model could be changed slightly to:
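With $\psi_q$ still modeled through the species-level effects described above, the detection-corrected observation model can be sketched as follows. The binomial parameterization, with $j$ as the number of trials, is an assumption based on the definitions given in the text:

$$
\begin{aligned}
y_q &\sim \mathrm{Binom}(j, \; z_q \, p_{spp[q]}) \\
z_q &\sim \mathrm{Bern}(\psi_q)
\end{aligned}
$$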
Now $y_q$ is the number of times each species is observed at each site over $j$ surveys. $p_{spp[q]}$ represents the species-specific probability of detection when the species is present, and $z_q$ represents the ‘true’ occurrence of the species, a Bernoulli random variable with probability $\psi_q$.
To demonstrate the method, we simulate data for a 20-species community across 100 sites with 4 repeat surveys. We assume that three site-level environmental covariates were measured, two of which have variable effects on occurrence probabilities (i.e. random effects), and one of which has a consistent effect for all species (i.e. a fixed effect). We also assume that species-specific detection probabilities vary, but are independent of the environmental covariates.
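The simulation described above might look like the following sketch. The seed, the hyperparameter values, and all object names are illustrative assumptions, not the values used for the results reported in this post.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n_spp  <- 20                      # species
n_site <- 100                     # sites
n_surv <- 4                       # repeat surveys per site

# Three standardized site-level covariates (sites x covariates)
X <- matrix(rnorm(n_site * 3), nrow = n_site, ncol = 3)

# Species-level intercepts and slopes (hyperparameters are assumed values)
a  <- rnorm(n_spp, mean = 0, sd = 1)
b1 <- rnorm(n_spp, mean = 0, sd = 1)       # covariate 1: random effect
b2 <- rnorm(n_spp, mean = -0.5, sd = 1.5)  # covariate 2: random effect
b3 <- rep(1, n_spp)                        # covariate 3: same slope for all species
B  <- cbind(b1, b2, b3)                    # species x covariates

# Occupancy probability and true occurrence state (sites x species)
psi <- plogis(matrix(a, n_site, n_spp, byrow = TRUE) + X %*% t(B))
z   <- matrix(rbinom(n_site * n_spp, size = 1, prob = psi), n_site, n_spp)

# Species-specific detection probabilities, independent of the covariates
p     <- plogis(rnorm(n_spp, mean = 0, sd = 1))
p_mat <- matrix(p, n_site, n_spp, byrow = TRUE)

# Number of detections out of n_surv surveys; always 0 where the species is absent
y <- matrix(rbinom(n_site * n_spp, size = n_surv, prob = z * p_mat),
            n_site, n_spp)
```

The resulting `y` is a sites-by-species matrix of detection counts, which is the form of data the occupancy model expects.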

We fit the following model in JAGS, using vague priors.
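A model of this kind might be written as in the sketch below. The node names (`a`, `b`, `p`, `z`, `psi`), the data names (`X`, `y`, `J`, `n.spp`, `n.site`, `n.cov`), and the particular vague priors are assumptions made here, not the original model statement, which is in the repository. The R code only writes the model to a temporary file, so it runs without JAGS installed.

```r
# Hypothetical JAGS model statement: all covariates have random slopes
model_string <- "
model {
  # Vague hyperpriors for species-level intercepts (assumed choices)
  mu.a ~ dnorm(0, 0.01)
  sigma.a ~ dunif(0, 10)
  tau.a <- pow(sigma.a, -2)

  # Vague hyperpriors for species-level slopes, one set per covariate
  for (k in 1:n.cov) {
    mu.b[k] ~ dnorm(0, 0.01)
    sigma.b[k] ~ dunif(0, 10)
    tau.b[k] <- pow(sigma.b[k], -2)
  }

  for (i in 1:n.spp) {
    a[i] ~ dnorm(mu.a, tau.a)
    p[i] ~ dunif(0, 1)                 # species-specific detection
    for (k in 1:n.cov) {
      b[i, k] ~ dnorm(mu.b[k], tau.b[k])
    }
    for (s in 1:n.site) {
      logit(psi[s, i]) <- a[i] + inprod(b[i, ], X[s, ])
      z[s, i] ~ dbern(psi[s, i])       # true occurrence
      y[s, i] ~ dbin(z[s, i] * p[i], J) # detections out of J surveys
    }
  }
}
"

# Write the model to a file that JAGS can read
model_file <- tempfile(fileext = ".txt")
writeLines(model_string, model_file)
```

With the rjags package, a file like this could be passed to `jags.model()` along with the data list and initial values of `z` set to 1 wherever a species was detected, and then sampled with `coda.samples()`.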

Using information theory, specifically the Watanabe-Akaike information criterion (WAIC), we compared this model, which assumes all covariates have random effects, to all combinations of models varying whether each covariate has a fixed or a random effect. See this GitHub repository for all model statements and code.
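For readers unfamiliar with WAIC, the calculation can be sketched from a matrix of pointwise log-likelihoods. This is a minimal illustration, assuming a hypothetical matrix `log_lik` with one row per posterior draw and one column per observation; how the pointwise terms are defined for an occupancy model is a modeling choice not covered here, and the function name `waic` is made up for this example.

```r
# WAIC on the deviance scale from an S x N matrix of pointwise log-likelihoods
# (S posterior draws, N observations)
waic <- function(log_lik) {
  # Log pointwise predictive density: average the likelihood over draws
  lppd <- sum(log(colMeans(exp(log_lik))))
  # Effective number of parameters: posterior variance of the log-likelihood
  p_waic <- sum(apply(log_lik, 2, var))
  c(lppd = lppd, p_waic = p_waic, waic = -2 * (lppd - p_waic))
}

# Toy input: 10 draws, 4 observations, identical likelihood in every draw,
# so the penalty term p_waic is zero
ll_toy <- matrix(log(0.5), nrow = 10, ncol = 4)
w_toy  <- waic(ll_toy)
```

Lower WAIC indicates better expected predictive performance, and models whose WAIC values differ only slightly are usually treated as indistinguishable.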
A model that assumes all covariates have random effects, and the data-generating model, in which only covariates 1 and 2 have random effects, performed the best, but were indistinguishable from one another:
This result makes sense because the model with all random effects is able to recover species-specific responses to site-level covariates very well:
However, under this model the 95% highest density interval (HDI) of $\sigma_{slope}$ for covariate 3 includes zero, indicating that this covariate effectively has a fixed, rather than random, effect among species.
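The interval itself can be computed directly from posterior draws. A minimal sketch, assuming a numeric vector of MCMC samples of $\sigma_{slope}$ (the function name `hdi` and the toy draws below are hypothetical), takes the narrowest interval containing 95% of the sorted samples:

```r
# Narrowest interval containing a fraction `prob` of the samples
hdi <- function(samples, prob = 0.95) {
  x <- sort(samples)
  n <- length(x)
  k <- ceiling(prob * n)            # number of samples inside the interval
  lows  <- x[1:(n - k + 1)]         # candidate lower bounds
  highs <- x[k:n]                   # matching upper bounds
  i <- which.min(highs - lows)      # narrowest candidate
  c(lower = lows[i], upper = highs[i])
}

# Toy posterior draws for a standard deviation piled up near zero
set.seed(3)
sigma_draws <- abs(rnorm(4000, mean = 0, sd = 0.2))
sigma_hdi   <- hdi(sigma_draws)
```

Because a standard deviation cannot be negative, an HDI that "includes zero" means in practice that its lower bound sits at or very close to zero.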
Thus, we could conclude that the first two covariates have random effects, while the third covariate has a fixed effect. This means that composition shifts along gradients of covariates 1 and 2. We can visualize the relative contribution of the random effects of covariates 1 and 2 to composition using ordination, as discussed in Jackson et al. (2012). To do this, we compare the linear predictor (i.e. $\mathrm{logit}(\psi_q)$) of the best model, which includes only significant random effects, to that of a model that does not have any random effects.
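The idea behind this comparison can be illustrated with a small sketch. It assumes principal components analysis via base R's `prcomp()` as the ordination method and uses made-up intercepts and slopes in place of posterior estimates; it is not the code from the repository.

```r
set.seed(2)                        # arbitrary seed
n_site <- 100
n_spp  <- 20

# Two site-level covariates and species-level intercepts (toy values)
x1 <- rnorm(n_site)
x2 <- rnorm(n_site)
a_mat <- matrix(rnorm(n_spp), n_site, n_spp, byrow = TRUE)

# Species-specific slopes (random effects) versus shared slopes (no random effects)
b1 <- rnorm(n_spp, mean = 0, sd = 1)
b2 <- rnorm(n_spp, mean = 0, sd = 1)

# Linear predictors on the logit scale, sites x species
lp_random <- a_mat + outer(x1, b1) + outer(x2, b2)
lp_fixed  <- a_mat + outer(x1, rep(mean(b1), n_spp)) +
                     outer(x2, rep(mean(b2), n_spp))

# Ordinate sites by their predicted community
pca_random <- prcomp(lp_random, center = TRUE)
pca_fixed  <- prcomp(lp_fixed,  center = TRUE)

# Site scores on the first two axes, e.g. for plotting against x1 and x2
scores_random <- pca_random$x[, 1:2]
```

Without random effects every species responds identically, so sites differ along a single axis and the second principal component carries essentially no variance. With species-specific slopes, sites spread out along a second axis, which reflects turnover in composition along the covariate gradients.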
The code to extract linear predictors and ordinate the community is provided on GitHub: