# Spatial predictions with GAMs and rasters

**Bluecology blog**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Spatial predictions with GAMs and rasters

One powerful use of GAMs is for interpolating to unsampled locations. We

can combine GAMs with raster package to conveniently predict a GAM model

to places we have not got data.

## Simulate some spatial data

We’ll simulate some spatial data based on rasters. There are two spatial

covariates, x1 and x2. We use these to simulate ‘true’ values of the

response, based on linear and polynomial relationships with x2 and x1

respectively.

library(raster) library(dplyr) library(ggplot2) library(patchwork) rbase <- raster(extent(c(0,1,0,1)), nrow = 50, ncol = 50) rx1 <- rx2 <- rbase rx1[] <- xFromCell(rbase, 1:ncell(rbase)) rx2[] <- yFromCell(rbase, 1:ncell(rbase)) rtrue <- 6*rx1 - 7*rx1^2- 4*rx2 par(mfrow = c(2,2)) plot(rx1, col = RColorBrewer::brewer.pal(11, "RdBu"), main = "x1") plot(rx2, col = RColorBrewer::brewer.pal(11, "RdBu"), main = "x2") plot(rtrue, col = RColorBrewer::brewer.pal(11, "RdBu"), main = "True values")

## Extract data at study sites

Here we create some random study site locations, then extract the values

of the covariates from the raster at the study site locations.

set.seed(42) site_means <- data.frame(x = runif(50), y = runif(50)) %>% mutate(site = 1:50, x1 = extract(rx1, cbind(x,y)), x2 = extract(rx2, cbind(x,y)), eta = extract(rtrue, cbind(x,y)), z = rnorm(50, sd = 1), yhat = eta + z)

Now we will assume 3 transects at each site with a a group random effect

(so transects at each site share a common deviation from the ‘true

effect’). There is also individual transect level variation.

dat <- inner_join(site_means, expand.grid(site = 1:50, transect = 1:3)) %>% mutate(b = rnorm(150, mean = yhat, sd = 0.5)) ## Joining, by = "site" g1 <- ggplot(dat) + aes(x = eta, y = b) + geom_point() g2 <- ggplot(dat) + aes(x = yhat, y = b) + geom_point() g3 <- ggplot(site_means) + aes(x = x, y = y, color = yhat) + geom_point() g3 | g1 /g2

The plot shows site means (colours) for location, then the relationship

between the ‘true’ values (eta) and the response (‘b’) and the site

means (yhat) and the response.

## Fit a GAM

Now we fit a GAM, with smoothers on covariates x1 and x2 and a random

effect (`bs = "re")`

for the sites.

library(visreg) library(mgcv) dat$sitef <- factor(dat$site) m1 <- gam(b ~ s(x1) + s(x2) + s(sitef, bs = 're'), data = dat) visreg(m1)

The GAM detects some non-linearity with x1 and a linear relationship

with x2.

Check the summary:

summary(m1) ## ## Family: gaussian ## Link function: identity ## ## Formula: ## b ~ s(x1) + s(x2) + s(sitef, bs = "re") ## ## Parametric coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -1.2716 0.1322 -9.619 4.83e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Approximate significance of smooth terms: ## edf Ref.df F p-value ## s(x1) 3.424 3.494 7.982 5.76e-05 *** ## s(x2) 1.000 1.000 49.915 1.01e-10 *** ## s(sitef) 40.718 47.000 9.500 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## R-sq.(adj) = 0.901 Deviance explained = 93.1% ## GCV = 0.32768 Scale est. = 0.22688 n = 150

The edf on s(x2) is exactly 1, confirming the GAM has just fit a linear

relationship (correctly) for that covariate. The edf on s(x1) is 3.4,

indicating it has probably overfit that covariate a little (for a 2nd

order polynomial, we’d expect an edf of 2).

Don’t get too excited about the high deviance explained, because that

calculation includes the deviance explained by the random effect

smoother.

We can see how much variation ended up in the random effect with

gam.vcomp (which counter to its name gives variance components as

standard deviations):

gam.vcomp(m1) ## s(x1) s(x2) s(sitef) ## 1.335173e+01 1.685585e-04 8.934146e-01

So the SD for sites was about 0.89, which is close to the true value of

1.

## Predict everywhere

We can use our GAM to predict across the entire raster. We need to stack

the rasters together and name each layer of the stack so that it

corresponds to the covariate names in the call to `gam()`

.

One trick here is that we want to predict to the ‘average’ site, not to

individual sites, so we need to take out the random site effect. We can

do this by creating a raster of zeros and calling it ‘sitef’. Then the

predictions will return values for the average site (given of course x1

and x2 values).

rzeros <- rbase rzeros[] <- 0 rstack <- stack(rx1, rx2, rzeros) names(rstack) <- c("x1", "x2", "sitef") rpred <- predict(rstack, m1) ## Warning in predict.gam(model, blockvals, ...): factor levels 0 not in original ## fit

This returns a warning, but don’t worry, it was designed to work this

way!

We used a raster stack to form predictions, so the predictions will come

out as a raster, so we can just plot them as rasters:

par(mfrow = c(2,2)) plot(rpred, col = RColorBrewer::brewer.pal(11, "RdBu"), main = "Predictions") plot(rtrue, col = RColorBrewer::brewer.pal(11, "RdBu"), main = "True values") plot(rpred - rtrue, col = RColorBrewer::brewer.pal(11, "RdBu"), main = "Prediction - true")

Overall our model has captured the main spatial gradients, but is

tending to under-predict the region in the middle (in red) and southern

portion of the space and slightly over-predict in the top corners.

## Posterior predictions

This is just bonus content, because I find it interesting that you can

get Bayesian posterior distributions from a fitted GAM.

This example is straight out of `?predict.gam. These will be

conditional (not marginalized) on the site random effect.

newdat <- with(dat, data.frame(x1 = seq(min(x1), max(x1), by = 0.01), x2 = mean(x2), sitef = 0)) #again set sitef to zero # to ignore site deviations m1p <- predict(m1, newdata = newdat, type = "lpmatrix") ## Warning in predict.gam(m1, newdata = newdat, type = "lpmatrix"): factor levels 0 ## not in original fit

Now we have a predictions matrix, sample from it:

rmvn <- function(n,mu,sig) { ## MVN random deviates L <- mroot(sig);m <- ncol(L); t(mu + L%*%matrix(rnorm(m*n),m,n)) } br <- rmvn(1000,coef(m1),m1$Vp) m1post <- matrix(NA, nrow = nrow(newdat), ncol = 1000) for (i in 1:1000){ m1post[,i] <- m1p %*% br[i,] } CIdat <- t(apply(m1post, 1, quantile, probs = c(0.025, 0.5, 0.975))) %>% data.frame(., newdat) g1 <- ggplot(CIdat) + aes(x = x1, y = X50.) + geom_point() + geom_ribbon(aes(ymin = X2.5., ymax = X97.5.), color = NA, alpha = 0.5) + ylab("Predictions") + ggtitle("Predcitions with 95% C.I.") g2 <- ggplot(data.frame(b = m1post[1,])) + aes(x = b) + geom_density(fill = "Purple") + ggtitle("Posterior density at x = 0.01") g1 / g2

**leave a comment**for the author, please follow the link and comment on their blog:

**Bluecology blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.