# Spatial clipping and aggregation in R

**Robin Lovelace - R**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A common task in GIS is comparing the spatial extent of one layer with another.

Say you have a load of points, some of which overlay a polygon layer. You are only interested

in the points that *intersect* with the points. What do you do? Also, how can you aggregated-up

the values contained in the points to correspond with the polygons.

These are complex computational problems. In this post

we will see how recent updates to R's `sp`

package make

the solution surprisingly intuitive and incredibly terse.

# Loading the data

All of the data (and more) for this can be downloaded from the

tutorial page on GitHub.

To make this tutorial reproducible on any computer, we will download each input dataset

from within R using `download.file`

.

library(sp) download.file("http://robinlovelace.net/data/lnd.RData", destfile = "lnd.RData") download.file("http://robinlovelace.net/data/stations.RData", destfile = "stations.RData") load("lnd.RData") load("stations.RData") plot(stations[sample(1:nrow(stations), 500), ]) plot(lnd, add = T, col = "red")

# Spatial subsetting (clipping)

As the plot demonstrates, the stations are far more exentsive than polygons of

central London. We must therefore clip them. Doing this manually would take

much time – we'd have to interrogate the coordinates of each point to see

whether or not it falls within one of the polygon boundaries.

Fortunately with have the `over`

function from the `sp`

package to make this

operation more terse:

sel <- over(stations, lnd) stations <- stations[!is.na(sel[, 1]), ]

As if that weren't enough, the developers of `sp`

have integrated

spatial subsetting into R's main index system with the square brackets.

Because this is a common procedure it is actually possible

to perform it with a single line of code:

stations <- stations[lnd, ] plot(stations)

As the figure shows, only stations within the London borroughs are now shown.

All that was needed was to place another spatial object in the row

index of the points (`[lnd, ]`

) and R automatically understood that a

subset based on location should be produced. This line of code is an example

of R's 'terseness' - only a single line of code is needed to perform what

is in fact quite a complex operation.

The *third* way to acheive the

same result uses the `rgeos`

package.

This is more complex and not included in this tutorial

(interested readers can see a vignette of this, to accomany the tutorial

on RPubs.com/Robinlovelace).

The next section demonstrates

spatial aggregation, a more advanced version of spatial subsetting.

# Spatial aggregation

As with R's very terse code for spatial subsetting, the base function

`aggregate`

(which provides summaries of variables based on some grouping variable)

also behaves differently when the inputs are spatial objects.

stations.c <- aggregate(stations, lnd, length) [email protected][, 1] ## [1] 48 22 43 18 12 13 25 24 12 46 18 20 28 32 38 19 30 25 31 7 10 38 12 ## [24] 16 28 17 16 28 4 6 14 26 5

The above code performs a number of steps in just one line:

`aggregate`

identifies which`lnd`

polygon (borrough) each`station`

is located in and groups them accordingly- it counts the number of stations in each borrough
- a new spatial object is created and assigned the name
`stations.c`

, the count of stations

As shown below, the spatial implementation of `aggregate`

can provide summary statistics of variables.

In this case we take the variable `NUMBER`

and find its mean value for the stations in each ward.

stations.m <- aggregate(stations[c("NUMBER")], by = lnd, FUN = mean)

For an optional advanced task, let us analyse and plot the result.

q <- cut(stations.m$NUMBER, breaks = c(quantile(stations.m$NUMBER)), include.lowest = T) summary(q) ## [1.82e+04,1.94e+04] (1.94e+04,1.99e+04] (1.99e+04,2.05e+04] ## 9 8 8 ## (2.05e+04,2.1e+04] ## 8 clr <- as.character(factor(q, labels = paste0("grey", seq(20, 80, 20)))) plot(stations.m, col = clr) legend(legend = paste0("q", 1:4), fill = paste0("grey", seq(20, 80, 20)), "topright")

areas <- sapply([email protected], function(x) [email protected])

This results in a simple choropleth map and a new vector containing the area of each

borrough. As an additional step, try comparing the mean area of each borrough with the

mean value of stations within it: `plot(stations.m$NUMBER, areas)`

.

If you'd like to learn more about R's rapidly improving spatial functionality,

you can download the complete tutorial, in .pdf or .Rmd form, from

github.com/Robinlovelace/Creating-maps-in-R/.

**leave a comment**for the author, please follow the link and comment on their blog:

**Robin Lovelace - R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.