# Part 3: Spatial analysis of geotagged data

February 21, 2017
By

(This article was first published on Bluecology blog, and kindly contributed to R-bloggers)

## Part 3: Spatial analysis of geotagged data

See the other parts in this series of blog
posts
.

In parts 1 and 2 we extracted spatial coordinates from our photos and
then made an interactive web map that included data associate with those
photos. Here I describe how we can build a spatial statistical model to
interpolate to unmeasured locations.

Here we will build an interactive map showing model results from an interpolation of oyster presence/absence.

First up we need to load some packages:

``````    library(readr)
library(INLA)
library(sp)
library(raster)
library(rgeos)
``````

The `INLA` package is crucial here. We will be
using `INLA` to perform Bayesian Kriging. In non-technical terms,
Bayesian kriging lets us fill in the gaps between observations of oyster
counts by interpolating from nearby points.

### Getting the data in the right form

Before we begin, we need to load in the data, and transform it into a
spatial data frame. In particular, it is important to transform the
lon-lats into UTM coordinates. UTM coordinates allow us to measure
distances between sample sites in metres:

``````    dat <- read_csv('Oysters_merged.csv')
n <- nrow(dat)
utmproj <- "+proj=utm +zone=10 +north +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"
spdat <- dat
coordinates(spdat) <- ~ GPSLongitude + GPSLatitude
proj4string(spdat) <- "+init=epsg:4326"
spdf <- spTransform(spdat, CRS(utmproj))
plot(spdf)
`````` You should get a plot like that pictured just of the sample sites.

### Creating an INLA “Mesh”

Now we can start building the specialised data-structures we need for
the `INLA` package. First up, we have to build what’s called a mesh.
This is a Delaunay
triangulation

built around our data points. It accounts for the non-regular spatial
structuring of the sampling (I wandered around on the rocky shore

Below we provide two parameters to `max.edge`, this will enable us to
buffer the edges to obtain a slightly large spatial domain.

``````max.edge.length <- c(25, 40)
loc1 <- as.matrix(coordinates(spdf))
mesh <- inla.mesh.2d(loc=loc1, max.edge = max.edge.length, offset=1, cutoff = 5)

plot(mesh)
plot(spdf, add = T, col = 'red')
`````` You can play with the edge length, offset and cutoff parameters to vary
how the triangulations turns out. You can get more guidance at the
`INLA` page and in this
tutorial
.

An important step is to check that our maximum edge lengths are less
than the estimated range (otherwise it is pointless including a spatial
effect!). We will check the range size below, once we have fitted the
model.

You might like to think of a-prior reasons for modelling a certain
range, for instance, what is a reasonable spatial scale for ‘clumping’
of oyster patches?

### Build INLA stack

Now we need to implement a few more steps to build up an appropriate
‘stack’ of data for `INLA`. Note that I have transformed oyster count to
presence – absence. We will build a binomial model and just predict

``````A.data <- inla.spde.make.A(mesh, loc1)
spde <- inla.spde2.matern(mesh, alpha = 2)
spdf\$presabs <- ifelse(spdf\$oysters_live>0, 1, 0)

stk <- inla.stack(data=list(y=spdf\$presabs), A=list(A.data, 1),
effects=list(s=1:mesh\$n, intercept=rep(1, n)),
remove.unused = FALSE, tag = "est")
``````

Note that here we choose `alpha = 2`. `alpha` is a smoothness parameter
that must be 0 <= `alpha` <= 2, higher values will give smoother
spatial interpolation. Also note that we have to specify ‘data’ for the
intercept (just a vector of `1`).

The `tag = "est"` argument just gives a name tag to the fitted model. In
other applications you might want to join the estimation stack with a
prediction stack (just build two stacks, one where the response is `NA`
and then join them by applying `inla.stack` to both).

### Model

Now, after that considerable amount of prepartion, we get to fitting the
model. We use standard R syntax for the formula, with the addition
of the `f()` term to specify a function.

``````    formula <- y ~ 0 + intercept + f(s, model=spde)

mod <- inla(formula, data=inla.stack.data(stk),
control.predictor=list(A = inla.stack.A(stk)),
family = 'binomial')
``````

We can extract summary parameters using `summary`. Here we are
interested in the random effects, it is good practice to call
`inla.hyperpar` before we inspect random effects:

``````hyper <- inla.hyperpar(mod)
summary(hyper)
``````

### Checking the range parameter

We should also check that our triangle edge length is less than the
estimated spatial range. We can do that by making a call to the model
object, asking for the estimated parameters for the random field

``````rf <- inla.spde.result(inla = mod, name = "s",
spde = spde, do.transf = TRUE)

plot(rf\$marginals.range[], type = "l",
xlab = "Range (m)", ylab = "Density",
xlim = c(0, 1000))
abline(v = max.edge.length, col = 'red')
`````` The red line shows our edge length parameter, which is well less than
the model estimate for the spatial range.

### Model predictions

Now we have a fitted model, we can extracted predicted probabilities of
oyster presence and map them. First transform the predictions from the
logit scale back to probabilities:

``````  ypred <- exp(mod\$summary.random\$s\$mean)/(1 + exp(mod\$summary.random\$s\$mean))
``````

Then we can turn our predictions into a spatial points, that can be be
mapped onto a raster for plotting:

``````extent <- extent(spdf)
xdims <- 100; ydims <- 200
xlim <- c(extent, extent); ylim = c(extent, extent);
proj <- inla.mesh.projector(mesh, xlim = xlim, ylim = ylim, dims=c(xdims, ydims))
field.proj <- inla.mesh.project(proj, ypred)

datpred <- data.frame(x = rep(proj\$x, ydims), y = rep(proj\$y, each = xdims), pred = as.numeric(field.proj))
coordinates(datpred) <- ~x + y

proj4string(datpred) <- utmproj
dat3 <- spTransform(datpred, crs("+init=epsg:4326"))

r <- raster(extent(dat3), ncols = xdims, nrows= ydims, crs = crs(spdat))
icell <- cellFromXY(r, dat3)

r[icell] <- as.numeric(dat3\$pred)
``````

And finally the plot:

``````cols <- RColorBrewer::brewer.pal(9, "Purples")
plot(r, col = cols)
points(spdf)
`````` So it looks like oysters are more common in the southern portion of the
survey area. But note that our predictions extent a fair way from the
points. We might want to redo these with a more constrained spatial
region.

Finally, we could map this onto a satellite image to get a better view
of what is going on. I am just using the same `leaflet` code as
before:

``````library(leaflet)
brks <- seq(0,1, by = 0.1)
ncol <- length(brks)
oystercols <- RColorBrewer::brewer.pal(min(ncol, 9), 'Reds')
pal <- colorBin(oystercols, dat\$oysters_live, bins = brks)

x1 <- -124.5997
y1 <- 49.52848

leaflet() %>%

setView(lng = x1, lat = y1, zoom = 16) %>%
addRasterImage(r, opacity = 0.8, color = pal) %>%
addCircleMarkers(lng = dat\$GPSLongitude, lat = spdat\$GPSLatitude,
values = brks,
title = "Chance of live oysters  (Crassostrea gigas)",
opacity = 1)
``````

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...