Convex hulls with dplyr and ggplot2

(This article was first published on R/Notes, and kindly contributed to R-bloggers)

This note shows a quick way to draw convex hulls, using dplyr and ggplot2.

Our example data is a dataset of European parliamentary constituencies, some of which have been successfully geocoded with the help of the ggmap package. The package taps into Google Maps to find approximate coordinates for addresses, which worked well for most constituencies after some light tweaking of their names.

You can get the data by running this script.

Assuming that you have loaded ggplot2, the data can be represented as a set of (sometimes duplicated) coordinates within each country. Some of the scatterplots below should be familiar to European readers, especially those for France and Italy:
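A minimal sketch of such per-country scatterplots follows. It runs on hypothetical toy data rather than the real constituency file, and the column names country, lon and lat are assumptions, since the actual names in the dataset are not shown here.

```r
library(ggplot2)

# Hypothetical toy data standing in for the geocoded constituencies:
# one row per constituency, with assumed columns country, lon and lat.
set.seed(1)
geo <- data.frame(
  country = rep(c("France", "Italy"), each = 20),
  lon = c(runif(20, -4, 8), runif(20, 7, 18)),
  lat = c(runif(20, 42, 51), runif(20, 37, 47))
)

# One scatterplot of coordinates per country, each with its own axis ranges
p <- ggplot(geo, aes(x = lon, y = lat)) +
  geom_point() +
  facet_wrap(~ country, scales = "free")

print(p)
```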

Let’s now draw lines around the points of each country, i.e. convex hulls. R comes with a convex hull function, chull(), which returns the row numbers of the points that lie on the convex hull, in clockwise order around it; the coordinates located on these rows are the vertices of the hull.
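To see what chull() returns, here is a small sketch on five made-up points: the corners of a unit square plus its centre. Only the four corners should come back, as row numbers in clockwise order. Per its documentation, chull() keeps only the first of any duplicated points, which matters here because some constituencies share coordinates.

```r
# chull() is in grDevices, which R attaches by default.
# Toy points: the four corners of a unit square, then its centre.
x <- c(0, 1, 1, 0, 0.5)
y <- c(0, 0, 1, 1, 0.5)

# Row numbers of the points on the hull, in clockwise order;
# the interior point (row 5) is not among them.
hull_rows <- chull(x, y)
hull_rows
```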

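For every country, let’s number the rows from 1 to n, the number of rows for that country. Let’s then encode these numbers as a factor, while setting the levels of that factor to the row numbers returned by the convex hull function, so that rows that are not on the hull become missing values. Last, let’s order the data based on this new variable, which puts the hull rows first, in hull order.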
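The factor trick can be sketched in base R on a single set of points before bringing in dplyr. The toy points are again made up; with one group, the factor can be used directly.

```r
x <- c(0, 1, 1, 0, 0.5)
y <- c(0, 0, 1, 1, 0.5)

# Number the rows 1..n, then encode them as a factor whose levels are the
# hull rows in hull order; the row not on the hull becomes NA.
row_id <- seq_along(x)
hull <- factor(row_id, levels = chull(x, y))

# order() on a factor sorts by level position, i.e. by hull order,
# and puts NA (non-hull) rows last by default.
ord <- order(hull)
data.frame(x, y, hull)[ord, ]
```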

The dplyr package offers a simple way to perform all these operations:
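Here is a minimal dplyr sketch of those operations, run on hypothetical toy data with assumed columns country, lon and lat. One deliberate deviation from the plain factor: it is converted to its integer codes inside the grouped mutate(), so that each row keeps its hull position within its own country instead of depending on how per-country factors with different levels get combined.

```r
library(dplyr)

# Hypothetical toy data (assumed columns: country, lon, lat)
set.seed(1)
geo <- data.frame(
  country = rep(c("France", "Italy"), each = 20),
  lon = c(runif(20, -4, 8), runif(20, 7, 18)),
  lat = c(runif(20, 42, 51), runif(20, 37, 47))
)

geo <- geo %>%
  group_by(country) %>%
  mutate(
    # number the rows of each country from 1 to n
    row_id = seq_len(n()),
    # levels = hull rows in hull order, so non-hull rows become NA;
    # as.integer() keeps each row's position within its country's hull
    hull = as.integer(factor(row_id, levels = chull(lon, lat)))
  ) %>%
  # hull rows first, in hull order, within each country; NA rows go last
  arrange(country, hull) %>%
  ungroup()
```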

The hull variable now contains either missing values on rows that are not in the convex hull, or numbers corresponding to the position of the row in the convex hull. Let’s send those specific rows to a polygon geometry, which will draw the convex hulls of each country, and overlay the full set of coordinates:
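A sketch of that plot follows, repeating the toy data and hull preparation so that it runs on its own. The column names are still assumptions, and the fill colour per country is a cosmetic choice, not necessarily what the original figure used.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (assumed columns: country, lon, lat)
set.seed(1)
geo <- data.frame(
  country = rep(c("France", "Italy"), each = 20),
  lon = c(runif(20, -4, 8), runif(20, 7, 18)),
  lat = c(runif(20, 42, 51), runif(20, 37, 47))
)

# Hull position per country, as in the dplyr step
geo <- geo %>%
  group_by(country) %>%
  mutate(hull = as.integer(factor(seq_len(n()), levels = chull(lon, lat)))) %>%
  arrange(country, hull) %>%
  ungroup()

# geom_polygon() joins rows in the order they appear, so only the
# ordered hull rows are sent to it; all points are drawn on top.
p <- ggplot(geo, aes(x = lon, y = lat)) +
  geom_polygon(data = filter(geo, !is.na(hull)), aes(fill = country), alpha = 0.3) +
  geom_point() +
  facet_wrap(~ country, scales = "free")

print(p)
```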

The result is correct only because we took the precaution of ordering the data according to the row numbers returned by the convex hull function. Try plotting the unordered data, and you will get a messy set of polygons that will not reflect the correct boundaries of the convex hulls.
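Because chull() already returns its row numbers in hull order, a shorter variant (not the approach used above) is to keep just those rows with dplyr's slice(), which indexes within each group. This sketch uses the same hypothetical toy data and assumed column names; the resulting rows come out already ordered and can go straight to geom_polygon().

```r
library(dplyr)

# Hypothetical toy data (assumed columns: country, lon, lat)
set.seed(1)
geo <- data.frame(
  country = rep(c("France", "Italy"), each = 20),
  lon = c(runif(20, -4, 8), runif(20, 7, 18)),
  lat = c(runif(20, 42, 51), runif(20, 37, 47))
)

# Within each country, keep only the hull rows, in the order chull() gives
hulls <- geo %>%
  group_by(country) %>%
  slice(chull(lon, lat)) %>%
  ungroup()
```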

The hulls of some countries, such as France, Italy, or Portugal, include constituencies that are located overseas. In the case of Portugal, those constituencies are the Autonomous Regions of the Azores and Madeira.


The code for this note appears in this Gist, along with the data, which might still contain some mistakes. Please leave a comment on the Gist if you find an error in the constituency geocodes.
