Occupancy models are used to understand species distributions while
accounting for imperfect detection. In this post, I’ll demonstrate a
method to evaluate the performance of occupancy models based on the
area under a receiver operating characteristic curve (AUC), as published
last year by Elise Zipkin and colleagues in
Ecological Applications.
Suppose we want to fit a multi-year occupancy model for one species. We
will evaluate the fit based on how well the model predicts occupancy in
the final year of the project. Start by simulating some data
(for details on the structure of these simulated data, refer to
this post and
references therein):
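A minimal sketch of such a simulation is given below in Python. It is not the simulation behind the results discussed in this post: the sample sizes, coefficient values, constant detection probability, and the choice to redraw occupancy independently each year are all assumptions made for illustration, as are the names `simulate_occupancy` and `sim`.

```python
import numpy as np

def simulate_occupancy(n_sites=100, n_years=4, n_visits=3, p_detect=0.5, seed=1):
    """Simulate detection histories (all default values are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    treatment = rng.integers(0, 2, size=n_sites)  # binary treatment indicator
    covariate = rng.normal(size=n_sites)          # standardized continuous site covariate

    # Logit-linear occupancy with a strong treatment-by-covariate interaction
    beta0, beta_trt, beta_cov, beta_int = 0.0, 0.5, 0.5, 3.0
    logit_psi = (beta0 + beta_trt * treatment + beta_cov * covariate
                 + beta_int * treatment * covariate)
    psi = 1.0 / (1.0 + np.exp(-logit_psi))

    # Latent occupancy states, drawn independently in each year for simplicity
    z = rng.binomial(1, psi[:, None], size=(n_sites, n_years))

    # Detections on each visit; a detection is only possible at an occupied site
    y = rng.binomial(1, p_detect * z[:, :, None], size=(n_sites, n_years, n_visits))
    return {"treatment": treatment, "covariate": covariate, "psi": psi, "z": z, "y": y}

sim = simulate_occupancy()
```

The array `sim["y"]` is indexed by site, year, and visit, so the final year's detection histories are `sim["y"][:, -1, :]`.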

For illustration, I included a strong interaction between treatment
and the continuous site-level covariate (e.g., elevation or area). As such, a
measure of model fit such as AUC ought to identify the saturated model as
the best fitting. Handily, AUC is a derived parameter, so its posterior can
be computed from the posteriors of standard occupancy model parameters.
To generate a posterior AUC, we need predicted occupancy probabilities
($\psi$) and realized occupancy states ($Z$) in the final year. Predicted occupancy
probabilities can be produced using data from previous years, and
realized occupancy states are assumed to be represented by the posterior
for $Z$ generated from a single-year model fit to the data from the final
year of the study.
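
One way to make this concrete is the rank-based form of AUC, computed separately for each posterior draw $k$. Pairing draw $k$ of $\psi$ with draw $k$ of $Z$, and counting ties as one half, are assumptions of this sketch rather than details stated above:

$$
\mathrm{AUC}^{(k)} = \frac{1}{n_1^{(k)} n_0^{(k)}} \sum_{i:\, Z_i^{(k)} = 1} \; \sum_{j:\, Z_j^{(k)} = 0} \left[ \mathbb{1}\!\left(\psi_i^{(k)} > \psi_j^{(k)}\right) + \tfrac{1}{2}\, \mathbb{1}\!\left(\psi_i^{(k)} = \psi_j^{(k)}\right) \right]
$$

Here $n_1^{(k)}$ and $n_0^{(k)}$ are the numbers of sites with $Z_i^{(k)} = 1$ and $Z_i^{(k)} = 0$ in draw $k$. In words, $\mathrm{AUC}^{(k)}$ is the probability that a randomly chosen occupied site has a higher predicted $\psi$ than a randomly chosen unoccupied site, and the collection of values across draws forms the posterior for AUC.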

Fitting a saturated model

Begin by modeling occupancy probabilities as a function of both covariates
and their interaction, predicting $\psi$ in the final year:
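Once coefficient draws are available, turning them into predicted occupancy probabilities is a matrix product. The Python sketch below assumes a logit link and a coefficient ordering of intercept, treatment, covariate, interaction. The function name `predict_psi` and the randomly generated `beta_draws` are hypothetical stand-ins for real MCMC output.

```python
import numpy as np

def predict_psi(beta_draws, treatment, covariate):
    """Return a site-by-iteration array of predicted occupancy probabilities.

    beta_draws: (n_iter, 4) draws ordered as intercept, treatment,
    covariate, interaction (an assumed ordering).
    """
    treatment = np.asarray(treatment, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    # Design matrix for the saturated model: main effects plus interaction
    design = np.column_stack([np.ones_like(covariate), treatment,
                              covariate, treatment * covariate])
    logit_psi = design @ np.asarray(beta_draws).T  # shape (n_sites, n_iter)
    return 1.0 / (1.0 + np.exp(-logit_psi))

# Hypothetical draws standing in for posterior samples from a fitted model
rng = np.random.default_rng(42)
beta_draws = rng.normal(loc=[0.0, 0.5, 0.5, 3.0], scale=0.2, size=(1000, 4))
treatment = rng.integers(0, 2, size=50)
covariate = rng.normal(size=50)
psi_pred = predict_psi(beta_draws, treatment, covariate)  # shape (50, 1000)
```

Each column of `psi_pred` is one posterior draw of $\psi$ for every site, which is the layout the AUC calculation needs.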

Now that we have our posteriors for $\psi$ at each site in the final
year, we can fit a single-year model to the final year’s data to
estimate $Z$.
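MCMC software usually samples $Z$ internally, but it can help to see the conditional distribution written out. The Python sketch below assumes a constant detection probability $p$ across sites and visits: a site with at least one detection is occupied with certainty, and a site with no detections is occupied with probability $\psi (1-p)^J / [\psi (1-p)^J + 1 - \psi]$, where $J$ is the number of visits. The function name `sample_z` and the toy inputs are hypothetical.

```python
import numpy as np

def sample_z(y, psi_draws, p_draws, rng):
    """Draw occupancy states given detections and parameter draws.

    y: (n_sites, n_visits) 0/1 detections for the final year
    psi_draws: (n_sites, n_iter) occupancy probability draws
    p_draws: (n_iter,) detection probability draws (assumed constant over sites)
    """
    y = np.asarray(y)
    detected = y.sum(axis=1) > 0
    n_visits = y.shape[1]
    # Probability of missing the species on every visit, given presence
    miss = (1.0 - p_draws) ** n_visits
    prob_occ = psi_draws * miss / (psi_draws * miss + 1.0 - psi_draws)
    # Any detection implies the site is occupied
    prob_occ[detected, :] = 1.0
    return rng.binomial(1, prob_occ)

# Toy example with three sites and three visits
rng = np.random.default_rng(0)
y = np.array([[0, 1, 0], [0, 0, 0], [1, 1, 0]])
psi_draws = rng.uniform(0.2, 0.8, size=(3, 500))
p_draws = rng.uniform(0.3, 0.7, size=500)
z_draws = sample_z(y, psi_draws, p_draws, rng)  # shape (3, 500)
```

Only the second site, which had no detections, has uncertain occupancy across draws.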

To set up the data for AUC calculations, produce site-by-iteration
arrays for $\psi$ and $Z$:
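If the sampler returns draws indexed by chain, iteration, and site (an assumed layout, which varies by software), the rearrangement could look like the following Python sketch. The helper name `to_site_by_iteration` is hypothetical.

```python
import numpy as np

def to_site_by_iteration(draws):
    """Stack chains and transpose: (n_chains, n_iter, n_sites) -> (n_sites, n_chains * n_iter)."""
    draws = np.asarray(draws)
    n_chains, n_iter, n_sites = draws.shape
    # Rows of the reshaped array run through chain 0's iterations, then chain 1's, and so on
    return draws.reshape(n_chains * n_iter, n_sites).T

# Toy input: 2 chains, 3 iterations, 4 sites
toy_draws = np.arange(2 * 3 * 4).reshape(2, 3, 4)
toy_array = to_site_by_iteration(toy_draws)  # shape (4, 6)
```

Applying the same function to the draws of $\psi$ and of $Z$ gives two arrays with matching rows (sites). The two arrays also need the same number of columns so that they can be compared draw by draw.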

Now generate the posterior for AUC and store data on the true and false
positive rates to produce ROC curves.
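A minimal Python sketch of this step follows. It builds the ROC curve for one draw by sweeping a threshold over the predicted probabilities, integrates it with the trapezoid rule, and repeats across draws. The function names `auc_and_roc` and `posterior_auc` are hypothetical, and returning `nan` for draws in which every site has the same state is a choice made here, since AUC is undefined in that case.

```python
import numpy as np

def auc_and_roc(psi, z):
    """Return (auc, fpr, tpr) for one draw of predicted psi and occupancy states z."""
    psi = np.asarray(psi, dtype=float)
    z = np.asarray(z)
    n_pos = int((z == 1).sum())
    n_neg = int((z == 0).sum())
    if n_pos == 0 or n_neg == 0:
        return np.nan, None, None  # AUC is undefined without both classes
    # Thresholds from "classify nothing as occupied" down to "classify everything"
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(psi))[::-1]))
    tpr = np.array([((psi >= t) & (z == 1)).sum() / n_pos for t in thresholds])
    fpr = np.array([((psi >= t) & (z == 0)).sum() / n_neg for t in thresholds])
    # Trapezoid rule over the ROC curve
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return auc, fpr, tpr

def posterior_auc(psi_array, z_array):
    """Compute AUC for each column (draw) of two site-by-iteration arrays."""
    n_draws = psi_array.shape[1]
    return np.array([auc_and_roc(psi_array[:, k], z_array[:, k])[0]
                     for k in range(n_draws)])
```

The `fpr` and `tpr` vectors returned for each draw can be stored and plotted against each other to draw one ROC curve per posterior draw.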

Fitting a simpler model

Having fitted a saturated model, we can now fit a simpler model that
includes only main effects:

Comparing models

How well did our models predict occupancy in the final year of the study,
and was one better than the other?
We can answer this question by inspecting the posteriors for AUC (larger
values indicate better predictions) and the ROC curves.
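Besides plotting, the two AUC posteriors can be summarized numerically. The Python sketch below assumes both models were scored against the same draws of $Z$, so that differencing draw by draw is meaningful. The function name `compare_auc` and the 95% interval are choices made here for illustration.

```python
import numpy as np

def compare_auc(auc_full, auc_reduced):
    """Summarize two posterior AUC vectors of equal length, ignoring undefined (nan) draws."""
    auc_full = np.asarray(auc_full, dtype=float)
    auc_reduced = np.asarray(auc_reduced, dtype=float)
    diff = auc_full - auc_reduced
    keep = ~np.isnan(diff)  # drop draws where either AUC was undefined
    return {
        "mean_full": float(np.nanmean(auc_full)),
        "mean_reduced": float(np.nanmean(auc_reduced)),
        "diff_interval": np.nanpercentile(diff, [2.5, 97.5]),
        # Share of draws in which the saturated model has the higher AUC
        "prob_full_better": float(np.mean(diff[keep] > 0)),
    }
```

A `prob_full_better` value near one, together with a difference interval that excludes zero, would indicate that the saturated model predicts final-year occupancy better.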

As expected, the model that generated the data predicts final-year
occupancy better than the model that excludes the strong interaction
term. Note that AUC reflects the accuracy of model predictions and does
not penalize model complexity.