Spatial visualization with R – Tutorial

December 10, 2014
By

(This article was first published on eoda, R und Datenanalyse » eoda english R news, and kindly contributed to R-bloggers)

The visualization of spatial data is one of the most popular applications when using R. This tutorial is an introduction to the visualization of spatial data and map making in R.

Spatial Visualisation with R

Spatial Visualisation with R

The following issues will be addressed:

    • Import anduse of shapefiles in R (shapefile is a file format designed for spatial data).
    • Data management and respectively restructuring of data
    • Map Making
    • Visualization by combining the spatial information of static maps from OpenStreetMap

As an example, a map of Kassel boroughs will be created, which illustrates the number of households by each zip code area. Such a map type is also known as Choropleth map.

In a further step, the stations of the Kassel bicycle sharing system “Konrad” will be added.

Import of shapefiles

The information with the map outlines are saved in a shapefile. Such a shapefile consists of at least three files:

  • A file with the extension “shp” which contains all geometric information.
  • A file with the extension “dbf” which contains the attributes of geometric figures.
  • In our example, it contains the zip codes and the associated city names.
  • An index file with the file extension “shx” – in order to distinguish geometric data according to any attributes. In our example, we use a dataset of all German postal code areas but require only those from Kassel city.

In R, a shapefile can be imported e.g. with the command readShapeSpatial by using package “maptools”.

We will receive an s4 object of the class Large SpatialPolygonsDataFrame. In comparison to lists, S4 classes provide a formal object structure and enable an object oriented workflow. These objects usually can be further indexed just like a data frame.


#loading shapefile####


shapes <- readShapeSpatial(“plz/post_pl.shp”)

#subsetting only Kassel shapes

ks_shapes <- shapes[shapes$PLZORT99 == “Kassel”,]

The command plot(ks_shapes) visualizes the outlines in R.

Darstellung von Kartenumrissen in R

Now wie Combine the map with further Information.

Data Management

The dataset is from datamarket.com and handed to the department of Statistics. The data is given for neighborhoods, while the map is based on the postal code. Therefore, another data set is necessary to assign the appropriate neighborhoods to the postal codes. Such an assignment can be found on Wikipedia, for example (http://de.wikipedia.org/wiki/Kategorie:Stadtteil_von_Kassel).

We read the data into R with read.csv2. They look like follow:

districts_ks <- read.csv2(“data/mapping_postalcodes_districts_kassel.csv”,

stringsAsFactors = FALSE)

districts_ks

districts kassel

districts kassel

districts_data <- read.csv2(“data/districts_information.csv”, stringsAsFactors = FALSE)

districts_data

District Households private.used.cars.per.1000.residents

districts data

districts data

To combine two different datasets with each other, it is necessary to unify the district variables. For example, in one data set we have got the description “Philippinenhof-Warteberg“, in the other one we have got the description „Philippinenhof/Warteberg“. In these small data sets, we change the names simply by indexing and assignment. (For larger datasets, a systematically data cleaning should be done with regular expressions, for example, to search for patterns in a given text.)

#correction of district names in districts_data

districts_data$District[districts_data$District == “Nord (Holland)”] <- “Nord-Holland”

districts_data$District[districts_data$District == “Philippinenhof/Warteberg”] <- “Philippinenhof-Warteberg”

districts_data$District[districts_data$District == “Süsterfeld/Helleböhn”] <- “Süsterfeld”

districts_data$District[districts_data$District == “Wolfsanger/Hasenhecke”] <- “Wolfsanger-Hasenhecke”

The two datasets are merged.

#merging datasets

districts_data <- merge(districts_data, districts_ks, all.x = TRUE)

The dataset looks like:

Auszug aus dem Datensatz über die Haushaltsverteilung in Kasseler Statdteilen

It causes duplicate entries for district and postal code. As a next step, we will summarize the data so we get unique entries for the variable postal code. We sum the respective values of Households, and write the corresponding Districts into a string. For aggregation, we use the data management package dplyr which has its own grammar of data management. An important component is the Pipe Operator %>%. Originally from the magrittr package, function calls can be chained together with %>%, without calling the dataset manually every time.
#aggregation by postalcode
data.agg %

group_by(Postalcode)%>%

summarise(District = paste(unique(District),collapse = ” & “) , Households = sum(Households),

cars = sum(private.used.cars.per.1000.residents))

The resulting dataset looks like:

Datensatzauszug

An inaccuracy caused by this method is that we do not know exactly how the population is distributed in the districts across the postal code areas. For example, how many of the 5.040 households in the district center accounts to the zip codes 34121 and on how many to 34117?

Thus, the data can be finally visualized on the map, it still needs to be divided into classes.
With the cut function, we create a factor that contains the appropriate classes. We assign names to the classes with the next command.

classes <- cut(data.agg$Households[position], c(0,6000,10000,15000,20000,25000))

levels(classes) <- c(“under 6000″, “6000 – 10000″, “10000 – 15000″, “15000 – 20000″,”20000 – 25000″)

We create a color vector that contains 5 different shades of green.
colours<- c(“#73AC97″, “#489074″, “#267356″, “#0E563B”, “#003924″)

Create Plot

To display the map, we use the plot function in base R. First, we set global parameters for the output device.
par(mar = c(0,0,1,9),oma = c(3,0,2,1),
bg = “lightgrey”,xpd = TRUE)

We define the inner margins of the plot with mar = c(0,0,1,9). We don’t want an inner margin below and left, above an inner of a row and on the right side, the legend should have with nine lines a slightly wider margin. With oma = c(3,0,2,1) we define the size of the outer edges. bg = „lightgrey“ specifies that our graphics gets a light gray background. By setting the Argument xpd = TRUE we ensure, that the legend is displayed within the inner frame.

To visualize the map, we can use the plot function, then we surround the postcode areas dark gray and pass our color vector as color, sorted by classes (Number of households in classes).

plot(ks_shapes,border = “darkgrey”, col = colours[classes])

The function text(coordinates(ks_shapes), labels =ks_shapes$PLZ99_N, col = “white”) adds labels to the areas in the form of postal codes.
A legend can be added with the function called legend() hinzugefügt werden.
legend(x = par(“usr”)[1],y = par(“usr”)[3], pch = 15, col = colours, legend = levels(classes),ncol = 1, bty = “n”, title = “Households”,
xjust = -3.4, yjust = -1.8)

Overview: Arguments of legend

Overview: Arguments of R-Function legend

With title(“Households in Kassel”), we can add a title of the plot

Kartenvisualisierung mit R - Haushaltsverteilung in Kassel

Integration of information from openstreetmap.de

The information about the coordinates of the stations of bike rental system Konrad come from a osm file which relates to the urban area of Kassel. The file itself is constructed in a xml structure.

To extract the Konrad coordinates from this file, there are two different ways:
osm parsen with the osmar package
The first way is to use the osmar package, which handles openstreetmap data. The commands osmsource_file and get_osm load the data and store the information into an osmar object.

src <- osmsource_file(“C:/Users/msc.EODA/workspace/geodaten_visualisierung/Kassel.osm”)

OBJ <- get_osm(complete_file(), source = src)

It should be noted that this is done for all stored information, not just for the desired Konrad stations. Accordingly, the process requires quite a lot of computing power and the resulting object is almost 40 MB large.

If you want to receive the corresponding nodes of the Konrad stations, search with find(OBJ, node(tags(v %agrep% “Konrad”))), the matching node IDs of the Konrad nodes and build an appropriate subset of the osmar object with subset(OBJ, node_ids = konrad_id) that only contains the Konrad node.

Osm parsen mit xpath

A more efficient method for our application is an xpath query. Xpath is a query language to get information from an xml object. The xml package provides a function environment in R.

With xmlparse (“Kassel.osm”) the document is read into R, next step is to send a query for the latitudes and longitudes and write the result directly into a data.frame.

stations <- data.frame(lat = xpathSApply(osm, path = “//node[tag[@v = ‘Konrad’]]/@lat”),

lon = xpathSApply(osm, path = “//node[tag[@v = ‘Konrad’]]/@lon”),

stringsAsFactors = FALSE)

We pass an xpath command as argument to path in order to search which nodes there are with a tag node which contains an attribute v, of which the value is Konrad and obtain an output value of the attribute lat from this node.

Ausschnitt aus dem XML-Dokument

XML-Document

We add the Konrad stations as White dots with points(stations$lon, stations$lat, col = adjustcolor(“white”,0.4),pch = 19)
to the already existing plot and adjust the transparency of the dots with adjustcolor(“white”,0.4).

Finally, we have to add a reference.

legend(par(“usr”)[1],par(“usr”)[3], pch = 19, col = adjustcolor(“white”,0.4), legend = “KonradnStations”, bty = “n”,

xjust = -5.1, yjust = -4)

Finally, we have to add a reference.

mtext(“Sources:ndatamarket.comnhttp://arnulf.us/PLZnopenstreetmap.de”, side = 1, line = 1, outer = TRUE, cex = 0.7, col = “darkgrey”)

Use the mtext function to write text into the margins of a plot side = 1
describes the bottom side of the plot, line = 1
means the second line (the function begins counting at 0) outer = TRUE
writes the text to the outer margin, finally cex and col define the font size and font color.

The code and the corresponding data are available at GitHub.

Spatial Visualisation with R

Spatial Visualisation with R

The visualization of analytical results as well as the creation of high quality and interactive graphics with R is one of the subjects of the training program of the eoda R Academy.

To leave a comment for the author, please follow the link and comment on their blog: eoda, R und Datenanalyse » eoda english R news.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)