Clipping polygons in R

Clipping polygons is a basic GIS task. It involves removing unneeded polygons outside of an area of interest. For example, you might want to study all local authorities (LADs) in the Yorkshire and the Humber region but can only obtain shapefiles that contain all the LADs in the UK. Removing the LADs outside of Yorkshire and the Humber could be achieved by ‘clipping’ the shapefile of LADs, using the extent of the larger region as a template. In R this takes a bit of work, but is quite possible with a few lines of code and has the advantage of being fully reproducible.

Getting started

Let’s start by obtaining administrative boundary shapefiles in England, loading necessary packages, and quickly checking the shapefiles are plotting correctly. We will need both regions and LADs. I’m using the freely-available boundaries from the UK Data Service:

# The default behaviour of this script is to create a folder called 'clip-example'
# and download and run everything from here.
# You can change this by modifying the next two lines
dir.create("clip-example")
setwd("clip-example")

# download and unzip UK administrative boundary shapefiles
download.file("https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_gor_2011.zip",
              destfile = "gor.zip")  # ~6.4MB
download.file("https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011.zip",
              destfile = "lad.zip")  # ~25MB
unzip("gor.zip", exdir = ".") # '.' just means current directory
unzip("lad.zip", exdir = ".")

# I recommend loading the shapefiles with readOGR() in the rgdal package
# install.packages("rgdal") # uncomment and run if not already installed
# install.packages("rgeos") # as above

# If you run in to trouble installing rgdal and rgeos on Linux (Ubuntu) see:
# https://philmikejones.wordpress.com/2014/07/14/installing-rgdal-in-r-on-linux/

require("rgdal")
require("rgeos")

regions <- readOGR(".", "england_gor_2011")
plot(regions) # worth a quick check everything looks like it should!

lads <- readOGR(".", "england_lad_2011")
plot(lads)

So now we have everything we need to begin: a shapefile of English regions, and a shapefile of English LADs.

Extract Yorkshire and the Humber

You’ve hopefully seen from your test plot that the shapefiles are plotting all boundaries in England, so first let’s remove all but Yorkshire and the Humber from the regions shapefile. We can do this by looking for the names of the regions in the shapefiles attributes file and using standard subsetting syntax. If you’ve used other GIS software before hopefully this idea will be familiar to you, but if not, don’t worry!

regions <- regions[regions@data$name == "Yorkshire and The Humber", ]
# Another plot confirms just the Yorkshire and The Humber region is left
plot(regions)

Clip the LADs

The regions object now just contains Yorkshire and The Humber, so we can use this as our bounds to extract LADs:

yh_lads <- gIntersection(regions, lads, byid = TRUE, drop_lower_td = TRUE)
plot(yh_lads)

Easy, but we’re not finished yet. The clipped polygons no longer contain a data frame because the gIntersection doesn’t (and can’t) know which data frame items to save in to the new object. This means we must add them back in manually, but even this is relatively straight-forward.

Recreate the data frame

The row names for the clipped polygons are a concatenation of the regions row name and the LAD row names. These are in the form x yyyy where x is the region row name (in our case always 5, because Yorkshire and The Humber was row 5) and yyyy is the LAD row name. This can be confirmed with:

row.names(yh_lads)
## [1] "5 2"   "5 8"   "5 9"   "5 37"  "5 39"  "5 43"  "5 105" "5 124" "5 162" "5 165"
## [11] "5 175" "5 204" "5 207" "5 214" "5 225" "5 245" "5 248" "5 251" "5 254" "5 297"
## [21] "5 320"

So that we can create a data frame based on the LAD data frame, we must first ensure our row names match the LAD row names so they can be merged across. I do this by removing the “5 ” with gsub():

row.names(yh_lads) <- gsub("5 ", "", row.names(yh_lads))
# Confirm it worked
row.names(yh_lads)
##  [1] "2"   "8"   "9"   "37"  "39"  "43"  "105" "124" "162" "165" "175" "204" "207"
## [14] "214" "225" "245" "248" "251" "254" "297" "320"

So far, so good. Now we create a variable – “keep” – that contains the row names that we want to keep from the LADs data frame.

keep <- row.names(yh_lads)

For the sake of simplicity I also change the polygon IDs so they are consistent with their respective row names. This step is probably optional, but I prefer to ensure they match so I have a consistent set of IDs to work with later on. If you choose not to do this I don’t think you’ll run into any problems but YMMV.

yh_lads <- spChFIDs(yh_lads, keep)

Now we create a copy of the LAD dataframe and filter it so that it only contains the rows we want to keep, using row names to perform the match:

yh_data <- as.data.frame(lads@data[keep, ])

Finally we create a SpatialPolygonsDataFrame object with our clipped polygons and our subsetted data frame:

yh_lads <- SpatialPolygonsDataFrame(yh_lads, yh_data)

All being well, your final plot should look like this:

plot(yh_lads)
LADs in Yorks and the Humber
LADs in the Yorkshire and The Humber region

And your yh_lads object should contain attribute data, which can be viewed with:

yh_lads@data
##           CODE                        NAME ALTNAME
##  2   E07000164                   Hambleton    
##  8   E07000169                       Selby    
##  9   E08000017                   Doncaster    
##  37  E07000168                 Scarborough    
##  39  E08000032                    Bradford    
##  43  E07000165                   Harrogate    
##  105 E07000167                     Ryedale    
##  124 E07000166               Richmondshire    
##  162 E06000012     North East Lincolnshire    
##  165 E08000035                       Leeds    
##  175 E07000163                      Craven    
##  204 E08000033                  Calderdale    
##  207 E06000013          North Lincolnshire    
##  214 E08000019                   Sheffield    
##  225 E06000010 Kingston upon Hull, City of    
##  245 E06000011    East Riding of Yorkshire    
##  248 E06000014                        York    
##  251 E08000036                   Wakefield    
##  254 E08000034                    Kirklees    
##  297 E08000016                    Barnsley    
##  320 E08000018                   Rotherham    

# If further confirmation is required:
class(yh_lads)
##  [1] "SpatialPolygonsDataFrame"
##  attr(,"package")
##  [1] "sp"

Congratulations, a fully reproducible polygon clip!

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)