Geocomputation with R: workshop at eRum

(This article was first published on r-spatial, and kindly contributed to R-bloggers)

This is a guest post by Robin Lovelace, Jakub Nowosad and Jannes
Muenchow. Together we’re writing an open source book called
Geocomputation with R. The project aims to introducing people to R’s
rapidly evolving geographic data capabilities and provide a foundation
for developing scripts, functions and applications for geographic data
science.

We recently presented some contents of the in-progress book at the eRum
conference, where Jannes ran a
workshop on the topic. In this
article we share teaching materials from eRum for the benefit of those
who couldn’t be there in person and provide a ‘heads-up’ to the
R-Spatial community about plans for the book. We’ll start with an
overview of ‘geocomputation’ (and define what we mean by the term) and
finish by describing how R can be used as a bridge to access dedicated
GIS software.

Gecomp ‘base’ ics

The first thing many people will be asking is “what is geocomputation
anyway”? As Jannes mentioned in his talk, the choice of name was
influenced by the fact that the term seems to have originated in Leeds,
where one of the authors (Robin) is based. The first conference on the
subject was in Leeds in 1996, with associated extended abstracts
including old-school computer graphics still available
here; and there was a 21 year
home-coming anniversary in
2017
where Robin and Jakub presented. A more practical reason is that the
term is unambiguous: it’s about using computing techniques to do new
things with geographic data, as indicated in Section
1.1
of the book. Our approach differs in one way from the early conception
of geocomputation, however:

Unlike early works in the field all the work presented in this book is
reproducible using code and example data supplied alongside the book
[using R, an great language for reproducible research].

Like many open source projects R is evolving. Although ‘base R’ is
conservative (as demonstrated in Roger Bivand’s keynote, in which he did
a live demo using a version of R from 1997 that still runs!), the
‘ecosystem’ of packages that extend its capabilities changes fast (video
here, slides at
rsbivand/eRum18).

To clarify what we mean by ‘base R’, we can identify base packages with
the following code (source:
stackoverflow):

x = installed.packages()
row.names(x)[!is.na(x[ ,"Priority"])]

##  [1] "boot"       "foreign"    "Matrix"     "mgcv"       "base"      
##  [6] "boot"       "class"      "cluster"    "codetools"  "compiler"  
## [11] "datasets"   "foreign"    "graphics"   "grDevices"  "grid"      
## [16] "KernSmooth" "lattice"    "MASS"       "Matrix"     "methods"   
## [21] "mgcv"       "nlme"       "nnet"       "parallel"   "rpart"     
## [26] "spatial"    "splines"    "stats"      "stats4"     "survival"  
## [31] "tcltk"      "tools"      "utils"

The output shows there are 28 packages that are currently part of the
base distribution (R Core makes “base R” as Martin Maechler put
it

during another
keynote).
These can be relied on to change very little in terms of their API,
although bug fixes and performance improvements happen continuously.

The same cannot be said of contributed packages. Packages are created,
die (or are
abandoned)
and change, sometimes
dramatically. And this
applies as much (or more) to r-spatial as to any other part of R’s
ecosystem, as can be seen by looking at any one of R’s task
views
. At the time of writing
the Spatial task
view
alone listed
177 packages, many of them recently contributed and in-development.

In this context it helps to have an understanding of the history
(Bivand, Pebesma, and Gómez-Rubio 2013).
Like in politics, knowing the past can help navigate the future. More
specifically, knowing which packages are mature or up-and-coming can
help decide which one to use!

For this reason, after a
slide
on set-up (which is described in detail in chapter
2
of the book), the workshop spent a decent amount of time talking about
the history of spatial data in R, as illustrated in slide
20. A
more detailed account of the history of R-spatial is provided in section
1.5
of the book.


The slides outlining the basics of Geocomputation with R (which is
roughly a synonym for r-spatial) can be found here:
https://geocompr.github.io/workshops/erum2018/01_basics.html

Vector data

Spatial vector data are best used for objects that represent discrete
borders such as bus stops (points), streets (lines) and houses
(polygons). For instance, we can represent ‘Budapest’ (the city where
eRum 2018 was held) as a spatial point with the help of the sf package
(Pebesma 2018) as follows:

budapest_df = data.frame(
  name = "Budapest",
  x = 19.0,
  y = 47.5
  )
class(budapest_df)

## [1] "data.frame"

budapest_sf = sf::st_as_sf(budapest_df, coords = c("x", "y"))
class(budapest_sf)

## [1] "sf"         "data.frame"

Why bother creating a new class if both objects contain the same
essential data? It’s what you can do with an object that’s important.
The reason for using the sf class can be understood as follows: it
gives the budapest_sf spatial superpowers. We can, for example, now
identify what country the point is using a spatial function such as a
spatial
join

implemented in the function st_join() (spatial subsetting would also
do the trick, as covered in section
4.2.1).
First, we need to load the ‘world’ dataset and set the ‘CRS’ of the
object:

# set-up:
library(spData)
sf::st_crs(budapest_sf) = 4326

# spatial join:
sf::st_join(budapest_sf, world)

## Simple feature collection with 1 feature and 11 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 19 ymin: 47.5 xmax: 19 ymax: 47.5
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##       name iso_a2 name_long continent region_un      subregion
## 1 Budapest     HU   Hungary    Europe    Europe Eastern Europe
##                type area_km2     pop  lifeExp gdpPercap        geometry
## 1 Sovereign country 92476.46 9866468 75.87317   24016.3 POINT (19 47.5)

The slides describing vector data in R can be found
here:

Raster data
===========

On the other hand, a raster data represents continuous surfaces in form
of a regular grid. You can think about a raster as a matrix object
containing information about its spatial location. It has rows and
columns, each cell has a value (it could be NA) and its spatial
properties are described by the cell resolution (res), outer borders
(bounding box – xmn, xmx, ymn, ymx), and coordinate reference
system (crs). In R the raster package supports the spatial raster
format (Hijmans 2017).

library(raster)
elev = raster(nrow = 6, ncol = 6, 
              vals = 1:36,
              res = 0.5,
              xmn = -1.5, xmx = 1.5,
              ymn = -1.5, ymx = 1.5,
              crs = "+proj=longlat")

The data structure makes raster processing much more efficient and
faster than vector data processing.

elev2 = elev^2
elev3 = elev / elev2
elev4 = (elev2 - elev3) * log(elev)

elev_stack = stack(elev, elev2, elev3, elev4)
plot(elev_stack)

Raster objects can be subsetted (by index, coordinates, or other
spatial
objects)
,
transformed using
local,
focal,
zonal
and
global
operations, and
summarized.
Importantly, there are many tools allowing for interactions between
raster and vector data models, and transformation between
them
.

The slides associated with the raster data part of the workshop can be
found
here:

Visualizing spatial data
========================

The spatial powers mentioned previously have numerous advantages. One of
the most attractive is that geographic data in an appropriate class can
be visualized on a map, the topic of Chapter
9
of the book.

The workshop was an opportunity to expand on the contents of that
chapter and ask: what’s the purpose of maps in the first place? To
answer that question we used an early data visualization / infographic
created by Alexander von Humboldt, illustrated below. The point of this
is that it’s not always the accuracy of a map that’s most important
(although that is important): the meaning that you wish to convey and
the target audience should be central to the design of the map (in
Humboldt’s case the unity of nature to an audience of Enlightenment book
readers!):

In the context of geographic data in R, it is easier than ever to create
attractive maps to tell a story. The previously created point
representing Budapest, for example, can be visualized using the tmap
package as follows:

library(tmap)
budapest_df = data.frame(name = "Budapest", x = 19, y = 47.5)
class(budapest_df)
#> [1] "data.frame"
budapest_sf = sf::st_as_sf(budapest_df, coords = c("x", "y"))
class(budapest_sf)
#> [1] "sf"         "data.frame"
tmap_mode("view")
#> tmap mode set to interactive viewing
m = tm_shape(budapest_sf) + tm_dots() + tm_view(basemaps = "OpenStreetMap", 
  set.view = 9)
tmap_leaflet(m)

A range of mapping techniques were covered in the workshop including the
plot() method from the sf package that generates multiple maps from
a single object by default, such as this one representing the nz
(short for New Zealand) object from the spData package:

More advanced maps were demonstrated, including this animated map of the
United States (for information on how to make animated maps with R, see
section
9.3) of
Geocomputation with R.

The slides forming the basis of the visualization part of the tutorial
can be found
here:

Last but not least was a section on GIS bridges

Defining a Geographic Information System as a system for the
analysis, manipulation and visualization of geographic data (Longley et
al. 2015), we can safely claim that R
already has become a GIS. However, R has also its shortcomings when it
comes to spatial data analysis. To name but a few, R is not particularly
good at handling large geographic data, it is not a geodatabase and it
is missing literally hundreds of geoalgorithms readily available in GIS
software packages and spatial libraries. Fortunately, R has been
designed from the beginning as an interactive interface to other
languages and software packages (Chambers
2016). Hence, as long as we can access
the functionality of GIS software from within R, we can easily overcome
R’s spatial data analysis shortcomings. For instance, when attaching,
the sf package to the global environment, it automatically links to
GEOS, GDAL and
proj.4, this means, the sf package gives the R
user automatically access to the functionality of these spatial
libraries. Equally, there are a number of packages that provides access
to the geoalgorithms of major open source GIS Desktop software packages:

  1. rgrass7 provides access to
    GRASS7
  2. RSAGA provides access to SAGA
    GIS
  3. RQGIS provides access to QGIS. For
    much more details and background information, please check out the
    corresponding R Journal
    publication
    .

Note that you must have installed the GIS software on your machine
before you can use it through R.[1]

In the workshop we shortly presented how to use RQGIS. The
corresponding slides can be found
here:

In the book we additionally demonstrate how to use RSAGA and
rgrass7 in Chapter
10.

Background on the book

Geocomputation with R is a collaborative project. We joined forces
because each of us has been been teaching and contributing to R’s
spatial ecosystem for years and we all had a similar vision of a book to
disseminate R’s impressive geographic capabilities more widely.

As described in a previous
article
by Jakub, we’re making good progress towards finishing the book by the
end of summer 2018, meaning Geocomputation with R will be published
before the end of the year. The target audience is broad but we think it
will be especially useful to post and under-graduate students, R users
wanting to work with spatial data, and GIS users wanting to get to grips
with command-line statistical modeling software. A reason for publishing
the article here is that we have around 3 months (until the end of
August) to gather as much feedback on the book as possible before it’s
published. We plan to keep the code and prose up-to-date after that but
now is the ideal time to get involved. We welcome comments and
suggestions on the issue
tracker
, especially
from people with experience in the R-Spatial world in relation to:

  • Bugs: issues with lines of prose or code that could be improved.
  • Future-proofing: will the code and advice given stand the test of
    time? If you think some parts will go out of date quick, please let
    us know!
  • Anything else: ideas for other topics to cover, for example.

We would like to thanks the anonymous peer reviewers who have provided
feedback so far. We’re still working on changes to respond to their
excellent comments. If you’re interested in getting involved in this
project, please see the project’s GitHub repo at
github.com/Robinlovelace/geocompr
and check-out the in-progress chapters at
geocompr.robinlovelace.net.

References

Bivand, Roger S., Edzer Pebesma, and Virgilio Gómez-Rubio. 2013.
Applied Spatial Data Analysis with R. 2nd ed. 2013 edition. New York:
Springer.

Chambers, John M. 2016. Extending R. CRC Press.

Hijmans, Robert J. 2017. Raster: Geographic Data Analysis and
Modeling
. https://CRAN.R-project.org/package=raster.

Longley, Paul, Michael Goodchild, David Maguire, and David Rhind. 2015.
Geographic Information Science & Systems. Fourth edition. Hoboken, NJ:
Wiley.

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for
Spatial Vector Data.” The R Journal.
https://journal.r-project.org/archive/2018/RJ-2018-009/index.html.

[1] Note also that RPyGeo provides access to
ArcMap which is a commercial
Desktop GIS software.

To leave a comment for the author, please follow the link and comment on their blog: r-spatial.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)