# Spatial autocorrelation of errors in JAGS

February 10, 2014
By

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't. In the core of kriging, Generalized-Least Squares (GLS) and geostatistics lies the multivariate normal (MVN) distribution – a generalization of normal distribution to two or more dimensions, with the option of having non-independent variances (i.e. autocorrelation). In this post I will show:

• (i) how to use exponential decay and the multivariate normal distribution to simulate spatially autocorrelated random surfaces (using the mvtnorm package)
• (ii) how to estimate (in JAGS) the parameters of the decay and the distribution, given that we have a raster-like surface structure.

Both procedures are computationally challenging as the total data size increases roughly above 2000 pixels (in the case of random data generation) or 200 pixels (in the case of the JAGS estimation).

These are the packages that I need:

  library(mvtnorm)   # to draw multivariate normal outcomes
library(raster)    # to plot stuff
library(rasterVis) # to plot fancy stuff
library(ggplot2)   # more fancy plots
library(ggmcmc)    # fancy MCMC diagnostics
library(R2jags)    # JAGS-R interface


Here are some auxiliary functions:

# function that makes distance matrix for a side*side 2D array
dist.matrix <- function(side)
{
row.coords <- rep(1:side, times=side)
col.coords <- rep(1:side, each=side)
row.col <- data.frame(row.coords, col.coords)
D <- dist(row.col, method="euclidean", diag=TRUE, upper=TRUE)
D <- as.matrix(D)
return(D)
}

# function that simulates the autocorrelated 2D array with a given side,
# and with exponential decay given by lambda
# (the mean mu is constant over the array, it equals to global.mu)
cor.surface <- function(side, global.mu, lambda)
{
D <- dist.matrix(side)
# scaling the distance matrix by the exponential decay
SIGMA <- exp(-lambda*D)
mu <- rep(global.mu, times=side*side)
# sampling from the multivariate normal distribution
M <- matrix(nrow=side, ncol=side)
M[] <- rmvnorm(1, mu, SIGMA)
return(M)
}

# function that converts a matrix to raster and scales its sides to max.ext
my.rast <- function(mat, max.ext)
{
rast <- raster(mat)
[email protected]@xmax <- max.ext
[email protected]@ymax <- max.ext
return(rast)
}


## The Model

I defined the model like this: The vector of data $y$ is drawn from a multivariate normal distribution

$y \sim MVN(\mu, \Sigma)$

where $\mu$ is a vector of $n$ pixel means and $\Sigma$ is the symmetrical $n \times n$ covariance matrix. The (Euclidean) distances between pixels (sites, locations) are stored in a symmetrical $n \times n$ distance matrix $D$ . The value of covariance between two locations is obtained by scaling the distance matrix by a negative exponential function:

$\Sigma_{ij} = exp(-\lambda D_{ij})$

where $i \in 1:n$ and $j \in 1:n$ . I chose the exponential decay as it well fits many empirical data, and it is simple, with just one parameter $\lambda$ .

## Simulating random normal surfaces with autocorrelated errors

First, I explored how tweaking of $\lambda$ affects the distance decay of covariance and the resulting spatial patterns. I examined $\lambda=0.01$ , $\lambda=0.1$ and $\lambda=1$ :

    Distance <- rep(seq(0,20, by=0.1), times=3)
Lambda <- rep(c(0.01, 0.1, 1), each=201)
Covariance <- exp(-Lambda*Distance)
xy <- data.frame(Distance, Covariance, Lambda=as.factor(Lambda))

ggplot(data=xy, aes(Distance, Covariance)) +
geom_line(aes(colour=Lambda)) Second, I simulated the surface for each of the $\lambda$ values. I also add one pattern with a completely uncorrelated white noise (all covariances are 0). I set the mean in each grid cell to 0.

  side=50     # side of the raster
global.mu=0 # mean of the process

M.01    <- cor.surface(side=side, lambda=0.01, global.mu=global.mu)
M.1     <- cor.surface(side=side, lambda=0.1, global.mu=global.mu)
M1      <- cor.surface(side=side, lambda=1, global.mu=global.mu)
M.white <-matrix(rnorm(side*side), nrow=side, ncol=side)

M.list <- list(my.rast(M.01, max.ext=side),
my.rast(M.1, max.ext=side),
my.rast(M1, max.ext=side),
my.rast(M.white, max.ext=side))
MM <- stack(M.list)
names(MM) <- c("Lambda_0.01", "Lambda_0.1", "Lambda_1", "White_noise")

levelplot(MM) # fancy plot from the rasterVis package ## Fitting the model and estimating $\lambda$ in JAGS

This is the little raster that I am going to use as the data:

# parameters (the truth) that I will want to recover by JAGS
side = 10
global.mu = 0
lambda = 0.3  # let's try something new

# simulating the main raster that I will analyze as data
M <- cor.surface(side = side, lambda = lambda, global.mu = global.mu)
levelplot(my.rast(M, max.ext = side), margin = FALSE) # preparing the data for JAGS
y <- as.vector(as.matrix(M))
my.data <- list(N = side * side, D = dist.matrix(side), y = y)


And here is the JAGS code. Note that in OpenBUGS you would use the spatial.exp distribution from GeoBUGS module, and your life would be much easier. Not available in JAGS, so here I have to do it manually:

cat("
model
{
# priors
lambda ~ dgamma(1, 0.1)
global.mu ~ dnorm(0, 0.01)
global.tau ~ dgamma(0.001, 0.001)
for(i in 1:N)
{
# vector of mvnorm means mu
mu[i] ~ dnorm(global.mu, global.tau)
}

# derived quantities
for(i in 1:N)
{
for(j in 1:N)
{
# turning the distance matrix to covariance matrix
D.covar[i,j] <- exp(-lambda*D[i,j])
}
}
# turning covariances into precisions (that's how I understand it)
D.tau[1:N,1:N] <- inverse(D.covar[1:N,1:N])

# likelihood
y[1:N] ~ dmnorm(mu[], D.tau[,])
}
", file="mvnormal.txt")


And let’s fit the model:

fit <-  jags(data=my.data,
parameters.to.save=c("lambda", "global.mu"),
model.file="mvnormal.txt",
n.iter=10000,
n.chains=3,
n.burnin=5000,
n.thin=5,
DIC=FALSE)

## module glm loaded

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 10216
##
## Initializing model


This is how the posteriors of $\lambda$ and global.mu look like:

ggs_traceplot(ggs(as.mcmc(fit))) ggs_density(ggs(as.mcmc(fit))) The results are not very satisfactory. It looks like $\lambda$ converges nicely around the true value of 0.3. The global.mu, which should converge aroun 0, is a totally wobbly. The whole thing seems to have the right direction, but for some reason it cannot get there fully.

I made this post during a flight from Brussels to Philadelphia on the 7th February 2014. There may be errors.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.