Spatial correlograms in R: a mini overview

[This article was first published on Are you cereal? » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Spatial correlograms are great to examine patterns of spatial autocorrelation in your data or model residuals. They show how correlated are pairs of spatial observations when you increase the distance (lag) between them – they are plots of some index of autocorrelation (Moran’s I or Geary’s c) against distance. Although correlograms are not as fundamental as variograms (a keystone concept of geostatistics), they are very useful as an exploratory and descriptive tool. For this purpose they actually provide richer information than variograms.

correlogr_spdep
Figure 1 | This is how a simple correlogram looks like (as provided by spdep package).

Here is a little overview of the available R packages and functions (I also list some related stuff that could be handy):

  • ade4 – This package has function gearymoran that calculates Moran’s I and Geary’s c. Does not plot correlograms.
  • ape – Moran’s I test (function Moran.I) for spatial and phylogenetic autocorrelation
    (based on normal approximation, not on randomizations = fast). Does not plot correlograms.
  • ncf – Provides functions correlog and spline.correlog. Plots correlograms. Does randomization tests.
  • pgirmess – Has function correlog that calculates the correlogram. It uses normal approximation to test significance.
  • raster – Simple function Moran. Works on rasters. You need to specify a simple neighborhood matrix. Does not plot correlograms.
  • spdepsp.correlogram, moran, moran.plot, moran.test, moran.mc. This is the most comprehensive package, and also the most difficult to work with. Does everything, has steep learning curve.
  • spatial – If I understand it correctly, this package first needs you to fit a trend surface (by kriging) and you can then calculate correlogram of this fitted surface. I haven’t gone deeper into it.
  • mpmcorrelogram – I include it as a curiosity. It calculates Multivariate Mantel Correlograms (Oden & Sokal 1986, Legendre & Legendre 2003).

Comparison of three methods

Next, I compare three packages that do empirical spatial correlograms using a single function. These are ncf, spdep and pgirmess. First, I generated an artificial and spatially autocorrelated data. I used Principal Coordinates analysis of Neighbourhood Matrix (PCNM) implemented in vegan package to generate the data.

# packages used for the data generation
library(raster)
library(colorRamps) # for some crispy colors
library(vegan) # will be used for PCNM

# empty matrix and spatial coordinates of its cells
side=30
my.mat <- matrix(NA, nrow=side, ncol=side)
x.coord <- rep(1:side, each=side)
y.coord <- rep(1:side, times=side)
xy <- data.frame(x.coord, y.coord)

# all paiwise euclidean distances between the cells
xy.dist <- dist(xy)

# PCNM axes of the dist. matrix (from 'vegan' package)
pcnm.axes <- pcnm(xy.dist)$vectors

# using 8th PCNM axis as my atificial z variable
z.value <- pcnm.axes[,8]*200 + rnorm(side*side, 0, 1)

# plotting the artificial spatial data
my.mat[] <- z.value
r <- raster(my.mat)
plot(r, axes=F, col=matlab.like(20))

sp_pattern
Figure 2 | Artificial spatial pattern used for comparison of the three packages.

And here is how the correlograms are done in each of the packages. So far I had the best experiences with correlog in ncf package. It is user-friendly and simple, you just need x- and y- coordinates of your data and a vector of z- values. You specify the increment of the distance classes, and you can test significance within each distance class by a randomization test. However, note that the randomizations can take some time, and with large data (megapixel rasters) it may be nearly impossible to calculate the correlogram.

library(ncf)
ncf.cor <- correlog(x.coord, y.coord, z.value,
                    increment=2, resamp=500)

Somewhat similar is function correlog in package pgirmess. Again, simple and user-friendly. The advantage is that it does not test the significance by a randomization test, but it uses a normal approximation. Hence, it is faster than ncf. The package actually relies on some functions from spdep, but it is much simpler than spdep. You just need to specify the x- and y- coordinates, the method and the number of distance classes:

library(pgirmess)
pgi.cor <- correlog(coords=xy, z=z.value, method="Moran", nbclass=21)

The third package is spdep - a real juggernaut. In comparison with the previous two packages, spdep offers great flexibility and more options. If you have complex neighborhood (connectivity) structures or some really weird data you may want to go for this. But there is a cost: spdep is not user friendly. If you are not familiar with its classes (e.g. the 'nb' neighborhood class) then you will probably spend hours going through the help (I did). Here is a solution that I came up with for my data:

# 'nb' - neighbourhood of each cell
r.nb <- dnearneigh(as.matrix(xy), d1=0.5, d2=1.5)
# 'nb' - an alternative way to specify the neighbourhood
# r.nb <- cell2nb(nrow=side, ncol=side, type="queen")
sp.cor <- sp.correlogram(r.nb, z.value, order=15,
                         method="I", randomisation=FALSE)

Unfortunately, I was unable to get Moran's I for longer lags than 15. It just complained that there is not enough observations at these distances. Well, the previous packages could cope with that. Also, note that I have set randomisations to FALSE. This means that spdep will use the normal approximation to calculate p-values, rather than the randomization test (to save your CPU).

And here is how it looks when I plot all of the correlograms in one plot:

correlograms
Figure 3 | Correlograms of the spatial pattern from Fig. 1 produced by ncf (function correlog), spdep (function sp.correlogram) and pgirmess (function correlog).

They are not completely identical. I guess that this is because the packages use different approaches to define the spatial lags, and also they may have different concepts of spatial connectivity. Personally, I am actually happy that they are similar at all. I was expecting higher magnitude of discrepancy.

There are alternatives to the three packages tested above (see the list above), but they will all require a bit of coding. Packages raster, ape and ade4 offer functions to calculate Moran's I. You just need to find a way how to run these over different spatial lags in order to produce the correlograms. I suspect that especially the Moran function in raster will run fast even over large datasets.

If you know about other packages, if you spot a crucial mistake here, or if you feel that I have forgotten to mention something really essential, please bombard me with comments; complaints. Thank you!

To leave a comment for the author, please follow the link and comment on their blog: Are you cereal? » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)