By Jian Li – Consultant, Shanghai.
Geographical data is becoming increasingly important in understanding how we can predict and influence our impact on the world. Data on the location of individuals (be they migrating animals, humans stuck in traffic, or a plane as it travels around the world) is constantly being recorded, and with sadly topical relevance. Systems that are designed to make sense of all of this data are termed Geographic Information Systems (GIS). The data is that is collected is enormous, complex and reliant on dependent on statistical algorithms developed by an academic user-base. As you may have guessed, these are reasons that R is one of the most perfect tools for the job. CRAN, the R-language’s foremost code repository, has a dedicated set of tools for the analysis of the data1. At the time of writing there are more than 100 relevant packages, and an active forum dedicated to helping R users to make the most of these complex datasets2.
One of the best packages for showing the power of R as a GIS tool is ‘sp’3. This packages provides classes and methods for dealing with spatial data, so influential is it that the definitions of data formats have become a de-facto standard in many of the associated packages.
Let’s start with a simple example:
> spdata <- as.data.frame(meuse[, 1:8])
cadmium copper lead zinc elev dist om ffreq x y
1 11.7 85 299 1022 7.909 0.00135803 13.6 1 181072 333611
2 8.6 81 277 1141 6.983 0.01222430 14.0 1 181025 333558
3 6.5 68 199 640 7.800 0.10302900 13.0 1 181165 333537
4 2.6 81 116 257 7.655 0.19009400 8.0 1 181298 333484
5 2.8 48 117 269 7.480 0.27709000 8.7 1 181307 333330
6 3.0 61 137 281 7.791 0.36406700 7.8 1 181390 333260
The meuse data are a set of data that describe geographically-distributed data for the river running through Northern-France and the lowlands. The data is stored in a standard R data frame, but we can consider it as ‘spatial data’ because it includes latitude (y) and longitude (x) information. The standard spatial data object is ‘Spatial’. It can be constructed from the meuse data by defining the bounding box and a valid term to define the projection (in this case sp.crs):
> sp.crs <- CRS(“+init=epsg:28992″)
> sp.mat <- cbind(spdata$x, spdata$y)
> sp.bbox <- bbox(sp.mat)
> obj.Spatial <- Spatial(bbox = sp.bbox, proj4string = sp.crs)
An object of class “Spatial”
x 178605 181390
y 329714 333611
CRS arguments: +init=epsg:28992
PROJ.4 is an external library for performing conversions between cartographic projections. R uses the string to determine the right projection method so it can draw the map correctly.
‘Spatial’ is an abstract class and that simply defines a set of data, the subclasses that derive from it can be used in plotting and statistical processes. There are four-deriving data-types of spatial data: points, lines, polygons and grids. In this example we shall focus on just the points. Where the object only represents spatial information, we can generate a ‘SpatialPoints’ object:
> obj.sp <- SpatialPoints(sp.mat, proj4string = sp.crs)
If we need to include additional information (such as observations) in the object, we can use ‘SpatialPointsDataFrame’:
> obj.spdf <- SpatialPointsDataFrame(sp.mat, data = spdata, proj4string = sp.crs)
Of course, this is a useful class, so there is a quick way to create a ‘SpatialPointsDataFrame’ object from a data frame:
> obj.spdf2 <- spdata
> coordinates(obj.spdf2) <- ~x+y
> proj4string(obj.spdf2) <- sp.crs
Similarly, we can create SpatialLinesDataFrame, SpatialPolygonsDataFrame and SpatialGridDataFrame to represent lines, polygons and grids. We can use the associated ‘plot’ functions can represent the objects graphically, these can be included in many different graphical systems and elaborated in many different ways:
As is common amongst other GIS systems, R can share data by delivering data in a standard file format. One such example is the shapefile format that is developed and regulated by the Environmental Systems Research Institute (ESRI) as a mostly open specification for data interoperability. As an example the ‘maptools’ package provides functions to read and write shaplefile generated by the sp package:
> shp0 <- readShapePoly(system.file(“shapes/sids.shp”, package=”maptools”),
+ IDvar=”FIPSNO”, proj4string=CRS(“+proj=longlat +ellps=clrk66″))
> shp1 <- shp0[shp0$NAME == “Alamance”, ]
> writePolyShape(shp1, “test”)
In this example, ‘shp0’ is the ‘SpatialPolygonsDataFrame’ object we generated with sp. With the maptools package loaded we can use such an object as if it were a data.frame object. For this example, we can create ‘shp1’ by sub-setting it. Then we can use the ‘writePolyShape’ function to port this R object to a shapefile where it can use it with other GIS tools.