Visualising Brexit Votes with Leaflet and cartogram

March 26, 2017
By

(This article was first published on Xinye Fiddles, and kindly contributed to R-bloggers)





On the eve of triggering Article 50, I think it’s semi-fitting to revisit the Brexit results.

What’s the topic?

The 2016 Brexit catastrophy (a.k.a. United Kingdom European Union membership referendum, 2016), specifically the visualisation of the results on a map, with each area (Local Authority in this case) proportioned by the number of votes, in the hope that it could capture the variety of sentiment on the topic across the country. The use of Cartogram should be an obvious choice in distorting the map areas, and while there have been plenty visualisation of such kind for General Elections, it has not been done for the Brexit.

Is there anything new I’m getting?

After so much has been done on visualising the voting result, there doesn’t seem to be a map that represent the split based on the number of votes for each region. I thought I’d combine the following to create just that with R:

  • Shapefile – a ubiquitous geospatial vector data format. I shall demonstrate the ease of manipulating the file with R.
  • Cartogram – a mapping distortion technique for representing geographic data in a striking way.
  • Hexagon – a type of mapping distortion focused on representing geographic units with a uniform area size.
  • Leaflet – a modern and popular Javascript package for visualsation in the browser.

What are the benefits, if any?

  • Hopefully I can bring the UK Hexagon Shapefiles to more people’s attention. After some searching I have only found a couple of sources for UK hexagon maps (Parliamentary Constituencies and Local Authorities along with a tool to create them) published by Esri (while there are plenty for the US, e.g. the tilegramsR package gives you several beautiful options). The votes data is aggregated to the Local Authority level, so the corresponding shapefile shall be used.

  • I will also demonstrate one way of manipulating the shapefile, i.e. creating new features (polygons) and reconstruct the file. Hopefully this can give you a better idea of the file format, especially if you are new to it.

  • Hexagon maps are great for simplifying complicated geo-shapes while preserving all relevant information. Cartogram is a great way to emphasise data features on a map. However in the case of the UK, there are close to 400 local authorities, plotting them all on the with their distorted irregular shapes would not be so visually pleasant or informative. When the starting point is a hexagon map – a specific type of cartogram, however, the features are laid out in a regular grid, and after distortion it is much easier on the eyes.

  • Leaflet is great for investigating data in detail. Faced with a map of a crowded bunch of colour shapes, one’s first instinct (especially if you are familiar with the UK geography) would be to work out which shape respresents which Local Authority, and the only way is of course an interactive map that can show you the name and the vote counts of a shape.

Get down to the business

The votes data

The data source is The Electoral Commssion and the CSV can be downloaded directly from here.

csv_url <- 'http://www.electoralcommission.org.uk/__data/assets/file/0014/212135/EU-referendum-result-data.csv'

# For reproducibility the CSV is also downloaded in the data/ folder
# download.file(url = csv_url, destfile = paste0('./data/', regmatches(csv_url, regexpr('[^/]+$', csv_url))))

raw_votes <- read.csv(csv_url, stringsAsFactors = F)

# Create names_votes to match against the shape file later
names_votes <- raw_votes$Area

So there are 382 Local Authorities. Let’s get the appropriate shapefile, and check out what data is already attached:

The hexagon shapefile

pacman::p_load(rgdal, dplyr)
la_shp <- readOGR(dsn = './data/GB_Hex_Cartogram_LAs', layer = "GB_Hex_Cartogram_LAs")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data/GB_Hex_Cartogram_LAs", layer: "GB_Hex_Cartogram_LAs"
## with 380 features
## It has 5 fields
## Integer64 fields read as strings:  OBJECTID
names_shp <- as.character(la_shp$LAD12NM) # for matching the CSV file later
head([email protected])
##   OBJECTID   LAD12CD         LAD12NM Shape_Leng Shape_Area
## 0        1 E07000205 Suffolk Coastal   127685.1 1176601978
## 1        2 E07000202         Ipswich   127686.2 1176622756
## 2        3 E07000076        Tendring   127685.7 1176613311
## 3        4 E07000145  Great Yarmouth   127685.8 1176614926
## 4        5 E07000206         Waveney   127685.4 1176607876
## 5        6 E07000203     Mid Suffolk   127686.5 1176628446

Let’s plot it against an actual map to get a sense of it. To do that let’s project it to WGS84 (EPSG:4326) measured in degrees of latitude / longitude, which also allows to set up the bouding box easier.

Note: using ggplot to overlay polygons on a map requires turning the shapefile into a data frame by using broom::tidy(). After running the function if you encounter this error Error: isTRUE(gpclibPermitStatus()) is not TRUE on Windows, make sure Rtools is installed, then install package gpclib using install.packages('gpclib', type = 'source'). To verify everything is fine, run gpclibPermitStatus() and gpclibPermit(), if both return True then you are all set.

pacman::p_load(broom, ggplot2, ggmap)

# Re-project the map into the WGS (World Geodetic System, i.e. longitudes and latitudes)
la_84 <- spTransform(la_shp, CRS('+init=epsg:4326'))
la_84_df <- la_84 %>% tidy(region = "LAD12NM")
b <- bbox(la_84)
basemap <- ggmap(
  get_map(location = b,
          source = "stamen",
          maptype = "watercolor",
          crop = T))
basemap + geom_polygon(
  data = la_84_df,
  aes(x = long, y = lat, group = group),
  fill = 'darkorchid',
  alpha = 0.7) + xlab('Longitude') + ylab('Latitude')

So the hexagon map is roughly based on the actual coordicates of the features.

But wait, the Local Authority shapefile has a flaw – it is missing Northern Ireland and Gibraltar, confirmed by comparing the names from the CSV and the shapefile (The Vale of Glamorgan also has different names in the two files, let’s deal with that later)

setdiff(names_votes, names_shp)
## [1] "Northern Ireland"  "Gibraltar"         "Vale of Glamorgan"
setdiff(names_shp, names_votes)
## [1] "The Vale of Glamorgan"

Let’s create them!

Create features (polygons) in a shapefile

Manipulating the shapefile can be a daunting task, as it has multiple layers of nested elements like this:

str(la_84, max.level = 2)
## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
##   [email protected] data       :'data.frame':  380 obs. of  5 variables:
##   [email protected] polygons   :List of 380
##   .. .. [list output truncated]
##   [email protected] plotOrder  : int [1:380] 185 332 375 192 176 84 14 71 250 102 ...
##   [email protected] bbox       : num [1:2, 1:2] -6.46 50.26 2.05 57.32
##   .. ..- attr(*, "dimnames")=List of 2
##   [email protected] proj4string:Formal class 'CRS' [package "sp"] with 1 slot

And inside the @ polygons element are all the features (points, lines and polygons), for each of them there is this:

str([email protected][[1]], max.level = 2)
## Formal class 'Polygons' [package "sp"] with 5 slots
##   [email protected] Polygons :List of 1
##   [email protected] plotOrder: int 1
##   [email protected] labpt    : num [1:2] 1.86 53.2
##   [email protected] ID       : chr "0"
##   [email protected] area     : num 0.0569

Now that the structure is clearly laid out, this is what we need to do

  1. create a new polygon by copying from one of the existing ones
  2. work out the coordinates for the new polygon
  3. change the coordinates of the polygon
  4. append the polygon with the existing ones
  5. reconstruct the @ polygons list
  6. recreate the shapefiles with the polygons list and other values from the original shapefile

The plan is to create NI and GI features by copying one of the polygons near by, shift it to an appropriate location, and update the attributes and data to reflect NI and GI respectively. So I decided using the British National Grid projection (EPSG:27700), as measured in meters, would be a better choice, as the shift can be explicit and more understandable.

Barrow-in-Furness is a good choice as a neighbouring polygon. NI is esitmated to be 100,000 meters to the west, and GI is 600,000 meters to the south of NI. (the step numbers in the code correspond to the above five steps)

la_bng <- spTransform(la_shp, CRS('+init=epsg:27700'))


# 1.
ind_ref <- which(la_bng$LAD12NM == 'Barrow-in-Furness')
ni <- [email protected][[ind_ref]]
ni_coors <- [email protected][[1]]@coords


# 2.
# NI seems around 150,000 meters to the west
# GI 500,000 meters to the south
ni_shiftx <- -100000
gi_shifty <- -600000

# Shape coordinates
ni_coors[, 1] <- ni_coors[, 1] + ni_shiftx
gi_coors <- ni_coors
gi_coors[, 2] <- ni_coors[, 2] + gi_shifty

# Labpt coordinates
ni_labpt_x <- [email protected][1] - ni_shiftx
ni_labpt_y <- [email protected][2]
ni_labpt <- c(ni_labpt_x, ni_labpt_y)

gi_labpt_x <- [email protected][1] + ni_shiftx
gi_labpt_y <- [email protected][2] + gi_shifty
gi_labpt <- c(gi_labpt_x, gi_labpt_y)

# 3.
# Replace coordiates and ID in ni and gi
ni_poly <- ni
gi_poly <- ni
[email protected][[1]]@coords <- ni_coors
[email protected][[1]]@coords <- gi_coors
[email protected][[1]]@labpt <- ni_labpt
[email protected][[1]]@labpt <- gi_labpt
[email protected] <- ni_labpt
[email protected] <- gi_labpt
new_poly_ids <- c('380', '381')
[email protected] <- new_poly_ids[1]
[email protected] <- new_poly_ids[2]


# 4.
# Construct the new SpatialPolygons
la_bng_polygons <- c([email protected], ni_poly, gi_poly)


# 5.
# Recreate the polygon list
la_bng_spgs <- SpatialPolygons(la_bng_polygons)


# 6.
# Construct the new SpatialPolygonsDataFrame
proj4string(la_bng_spgs) <- proj4string(la_bng)

# Create the two extra data points for the dataframe
ni_lad12cn <- raw_votes[raw_votes$Area == 'Northern Ireland', 'Area_Code']
gi_lad12cn <- raw_votes[raw_votes$Area == 'Gibraltar', 'Area_Code']
new_data <- data.frame(
  OBJECTID = c('380', '381'),
  LAD12CD = c(ni_lad12cn, gi_lad12cn),
  LAD12NM = c('Northern Ireland', 'Gibraltar'),
  Shape_Leng = mean(la_bng$Shape_Leng),
  Shape_Area = mean(la_bng$Shape_Area))
la_new_data <- rbind([email protected], new_data)

# Make sure the row IDs of the new rows coincide the external dataframe we will join later
rownames(la_new_data)[(nrow(la_new_data) - 1):nrow(la_new_data)] <- new_poly_ids
la_bng_new <- SpatialPolygonsDataFrame(la_bng_spgs, la_new_data)

Let’s check that the polygons have indeed been created:

la_84_new <- spTransform(la_bng_new, CRS('+init=epsg:4326'))
la_84_new_df <- la_84_new %>% tidy(region = "LAD12NM")
ggmap(
  get_map(location = bbox(la_84_new),
          source = "stamen",
          maptype = "watercolor",
          crop = T)) +
  geom_polygon(
    data = la_84_new_df,
    aes(x = long, y = lat, group = group),
    fill = 'darkorchid',
    alpha = 0.7) + xlab('Longitude') + ylab('Latitude')

Much better. Even though they do not exactly fall inside the geospatial boundaries of the actual areas, it’s good enough for the hexagon map.

Before moving on, let’s fix the little issue with The Vale of Glamorgan:

raw_votes$Area[grepl('glamorgan', raw_votes$Area, ignore.case = T)] <- 'The Vale of Glamorgan'

Voting results on a hexagon map

Prior to this point we have been working with the voting data and the shapefile separately, now it’s time to combine the voting data and the map:

la_bng_new <- la_bng_new %>% merge(raw_votes, by.x = 'LAD12NM', by.y = 'Area')
la_bng_new$Result <- ifelse(la_bng_new$Remain > la_bng_new$Leave, 'Remain', 'Leave') %>% as.factor

Then we can plot the hexagon:

cols <- c('#ff6666', '#66ccff')
plot(la_bng_new, col = cols[la_bng_new$Result], main = 'EU Referendum 2016 Results')
legend('topright', legend = c('Leave', 'Remain'), bty = 'n', fill = cols)

As one would expect, the voting results are clear (direction-wise) for each Local Authority, and we are indeed seeing London the Remain capital and the surrounding areas being more prominently represented, but the map is still mis-representing the relative valid votes of each area, i.e. regardless of the number of valid votes, all areas are of the same size.

Let’s try rescaling the map using the Valid_Votes inside the votes data.

There are various options for creating a cartogram map, notably Rcartogram and getcartr packages have been used together (e.g. Michael Höhle’s nice post). The downside of those is that the installation process can be complicated (see this Stack Overflow answer for a complete process). for a simple solution the cartogram package on CRAN is a good alternative:

# Plot cartogram
pacman::p_load(cartogram)
carto <- cartogram(la_bng_new, 'Valid_Votes')
carto_df <- carto %>% tidy(region = 'LAD12NM')
carto_df <- carto_df %>%
  left_join(
    data.frame(id = as.character(la_bng_new$LAD12NM),
               Result = as.factor(la_bng_new$Result),
               stringsAsFactors = F))

ggplot() + theme_bw() + theme_nothing(legend = TRUE) + coord_fixed() +
  geom_polygon(data = carto_df, colour = 'gray40',
               aes(x = long, y = lat, group = id,
                   fill = Result)) +
  ggtitle('EU Referendum 2016 Results Cartogram')

This makes slightly more sense, but there are still a couple issues:

The proportion of Leave and Remain varies across regions wildly:

pacman::p_load(ggplot2)
ggplot(data = raw_votes,
       aes(x = Pct_Remain, fill = Region, col = Region)) +
  geom_density(alpha = 0.5) + theme_bw() +
  xlim(0, 100) + ylim(0, 0.09) +
  geom_vline(aes(xintercept = 50)) +
  facet_wrap(~Region) + theme(legend.position="none") +
  xlab('Percentage of Remain votes') +
  ylab('Distribution density') +
  ggtitle('Remain vote distribution by Region')

Note, for Northern Ireland there is only one data point (@ 55.8%), hence the invisible verticle line in the density graph.

Therefore we must not lose the information on the map.

Further more, aren’t you just dying to work out which area is which?

Let’s get straight down to leaflet to address all of the above.

Plotting cartogram with leaflet

Before plotting the map we need to take care of a few things:

More dramatic scaling

The above cartogram looks fine, it probably represent the valide votes fairly accurately. But for aesthetic purposes, I think a more dramatic scaling would be more interesting, this means the difference of polygons are exaggerated and some areas are over shaddowing others in size, but that would make the visualisation more telling, and easier to grab attention, which is a major point of data visualisation. So instead of using Valid_Votes we use the square of it.

Information tag

Taking advantage of Javascript we can include much information dynamically, e.g. showing the votes when the mouse cursor hovers on the polygons. These need to be created as separate HTML tags.

Colouring for votes percentage

There is a wide range of votes percentage (% of Remain from 24% to 96%), This should be differentiated by colours. The last thing we need to take care off is a small tweak on the scaler for Gibraltar. Gibraltar being a British overseas territory (with a population of just over 30,000, located at the bottom of Spain) naturally voted overwhelmingly Remain (yes, that one with 96%). Allowing the outlier to be part of the colour palette would push everything close together hence showing less disguishable colours, and for this reason, we are going to make the outlier value the same as the second highest (Lamberth at 79%, and by now you probably know where I stand on the McCandless vs. Tufte debate).

pacman::p_load(leaflet)
cols1 <- c("#ad0037", "#004080")
# Transforming the scaling factor Valid_Votes
la_bng_new$Valid_Votes_sq <- la_bng_new$Valid_Votes ^ 2
carto_x <- cartogram(la_bng_new, 'Valid_Votes_sq')

# Create vote percentage colour scale
carto_x$Pct_Remain_2 <- carto_x$Pct_Remain
carto_x$Pct_Remain_2[carto_x$LAD12NM == 'Gibraltar'] = carto_x$Pct_Remain_2[carto_x$LAD12NM == 'Lambeth']
carto_x$fill_op <- abs(carto_x$Pct_Remain_2 - 50) / max(abs(carto_x$Pct_Remain_2 - 50))

# Create the vote result HTML tag
labels2 <- sprintf(
  '%s
Remain: %s (%.0f%%)
Leave: %s (%.0f%%)', carto$LAD12NM, prettyNum(carto$Remain, big.mark = ','), 100 * carto$Remain / carto$Valid_Votes, prettyNum(carto$Leave, big.mark = ','), 100 * carto$Leave / carto$Valid_Votes ) %>% lapply(htmltools::HTML) # Leaflet CRS object for WGS84 projection epsg4326 <- leafletCRS( crsClass = 'L.Proj.CRS', code = 'EPSG:4326', proj4def = '+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0', resolutions = 2^(16:7)) pal <- colorFactor( palette = cols1, domain = carto_x$Result ) # Leaflet map leaflet(carto_x, options = leafletOptions(crs = epsg4326)) %>% addPolygons( color = "#444444", weight = 1, smoothFactor = 0.5, label = labels2, opacity = 1.0, fillOpacity = carto_x$fill_op, fillColor = cols1[as.factor(carto_x$Result)], highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)) %>% addLegend("topright", pal = pal, values = ~Result, title = "Voting result", opacity = 1)

That is not too bad. This makes it easy for us to dig into some insight:

  1. Most of the London LAs voted overwhelmingly Remain (as we have learned previously)
  2. The largest contributing Leave LAs (in total votes size and proportion) are around Wakefield, Doncaster or Yorkshire, and Dudley, Sandwell and Walsall of the West Midlands.
  3. Oxbridge clearly stands out with a clear opinion about Brexit.
  4. Northern Ireland and Gibraltar both have distinctive characteristics (NI having one of the highest vote counts and GI features the highest Remain votes), so they are well worth the effort of creating the polygons.
  5. Among the largest LAs in valid votes (>= 140,000, ~top quartile), the biggest swinging (\(Remain <= 35\%\) or \(>= 65\%\)) LAs can been spotted easily on the map, but also summarised as follows:
pacman::p_load(knitr)
top_las <- data.frame(
  LA = [email protected]$LAD12NM,
  Valid_Votes = [email protected]$Valid_Votes,
  Pct_Remain = [email protected]$Pct_Remain,
  Result = [email protected]$Result) %>%
  filter(Valid_Votes >= 140000) %>%
  arrange(desc(Pct_Remain)) %>%
  filter(abs(Pct_Remain - 50) >= 15)
top_las$Valid_Votes <- formatC(top_las$Valid_Votes, format = 'd', big.mark = ',')
top_las$Pct_Remain <- formatC(top_las$Pct_Remain, format = 'f', digits = 1)
print(kable(top_las, format = 'html',
      col.names = c('Local Authority', 'Valid Votes', '% of Remain', 'Result'),
      align = 'lccc',
      pad = 1))
Local Authority Valid Votes % of Remain Result
Lambeth 141,924 78.6 Remain
Wandsworth 157,884 75.0 Remain
City of Edinburgh 252,294 74.4 Remain
Brighton and Hove 146,675 68.6 Remain
Glasgow City 252,809 66.6 Remain
Wakefield 175,042 33.6 Leave
Sandwell 147,254 33.3 Leave
Dudley 175,226 32.4 Leave
Doncaster 151,182 31.0 Leave

Why don’t we plot a portable ggplot for the road

# Before converting shapefile to a data frame, let's tweak some of the variables
# Rescaling Pct_Remain
[email protected]$Pct_Remain_2 <- [email protected]$Pct_Remain
[email protected]$Pct_Remain_2[[email protected]$LAD12NM == 'Gibraltar'] <- [email protected]$Pct_Remain_2[[email protected]$LAD12NM == 'Lambeth']

# Create the banding of Remain percentage
col_vals <- [email protected]$Pct_Remain_2
min_pct <- 50 - max(abs(col_vals - 50))
max_pct <- 50 + max(abs(col_vals - 50))
[email protected]$pct_banded <- cut(col_vals, seq(min_pct, max_pct, (max_pct - min_pct) / 11))

# Make the legend labels prettier
pct_labs <- as.character([email protected]$pct_banded)
# Include Gibraltar's figure back to the legend label, otherwise it would be misleading
pct_labs[pct_labs == '(73.4,78.6]'] <- '(73.4,95.9]'
pct_labs <- substr(pct_labs, 2, nchar(pct_labs) - 1)
tmp <- sapply(pct_labs, function(x) strsplit(x, split = ','))
pct_labs <- sapply(tmp,
  function(x) paste0(
    formatC(as.numeric(x[1]), digits = 1, format = 'f'), '% - ',
    formatC(as.numeric(x[2]), digits = 1, format = 'f'), '%'))
[email protected]$pct_banded <- as.factor(pct_labs)

# Merge the votes data back to the map data frame
carto_x_df <- carto_x %>% tidy(region = 'LAD12NM')
carto_x_df <- carto_x_df %>%
  left_join(
    data.frame(id = as.character([email protected]$LAD12NM),
               Result = as.factor([email protected]$Result),
               Pct_Remain = [email protected]$Pct_Remain_2,
               Pct_band = [email protected]$pct_banded,
               stringsAsFactors = F))

# Re-order the data frame so that legend on the map follow logical order
carto_x_df <- carto_x_df[order(carto_x_df$Pct_Remain), ]

# Plot
ggplot(data = carto_x_df, aes(x = long, y = lat, group = id)) +
  theme_bw() + coord_fixed() + 
  theme_nothing(legend = TRUE) +
  ggtitle('EU Referendum 2016 Results Hexagon Cartogram') +
  theme(plot.title = element_text(hjust = 0.5, size = rel(2))) +
  geom_polygon(aes(fill = Pct_band), colour = 'gray40', size = 0.2) +
  scale_fill_brewer(
    name = 'Remain votes',
    type = 'div', 
    palette = 'RdBu',
    breaks = carto_x_df$Pct_band,
    labels = carto_x_df$Pct_band)

There we have it. All the source code is in my repo. Feel free to comment / pick bones.

To leave a comment for the author, please follow the link and comment on their blog: Xinye Fiddles.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)