On the eve of triggering Article 50, I think it’s semi-fitting to revisit the Brexit results.
What’s the topic?
The 2016 Brexit catastrophy (a.k.a. United Kingdom European Union membership referendum, 2016), specifically the visualisation of the results on a map, with each area (Local Authority in this case) proportioned by the number of votes, in the hope that it could capture the variety of sentiment on the topic across the country. The use of Cartogram should be an obvious choice in distorting the map areas, and while there have been plenty visualisation of such kind for General Elections, it has not been done for the Brexit.
Is there anything new I’m getting?
After so much has been done on visualising the voting result, there doesn’t seem to be a map that represent the split based on the number of votes for each region. I thought I’d combine the following to create just that with R:
- Shapefile – a ubiquitous geospatial vector data format. I shall demonstrate the ease of manipulating the file with R.
- Cartogram – a mapping distortion technique for representing geographic data in a striking way.
- Hexagon – a type of mapping distortion focused on representing geographic units with a uniform area size.
What are the benefits, if any?
Hopefully I can bring the UK Hexagon Shapefiles to more people’s attention. After some searching I have only found a couple of sources for UK hexagon maps (Parliamentary Constituencies and Local Authorities along with a tool to create them) published by Esri (while there are plenty for the US, e.g. the
tilegramsRpackage gives you several beautiful options). The votes data is aggregated to the Local Authority level, so the corresponding shapefile shall be used.
I will also demonstrate one way of manipulating the shapefile, i.e. creating new features (polygons) and reconstruct the file. Hopefully this can give you a better idea of the file format, especially if you are new to it.
Hexagon maps are great for simplifying complicated geo-shapes while preserving all relevant information. Cartogram is a great way to emphasise data features on a map. However in the case of the UK, there are close to 400 local authorities, plotting them all on the with their distorted irregular shapes would not be so visually pleasant or informative. When the starting point is a hexagon map – a specific type of cartogram, however, the features are laid out in a regular grid, and after distortion it is much easier on the eyes.
Leaflet is great for investigating data in detail. Faced with a map of a crowded bunch of colour shapes, one’s first instinct (especially if you are familiar with the UK geography) would be to work out which shape respresents which Local Authority, and the only way is of course an interactive map that can show you the name and the vote counts of a shape.
Get down to the business
The votes data
csv_url <- 'http://www.electoralcommission.org.uk/__data/assets/file/0014/212135/EU-referendum-result-data.csv' # For reproducibility the CSV is also downloaded in the data/ folder # download.file(url = csv_url, destfile = paste0('./data/', regmatches(csv_url, regexpr('[^/]+$', csv_url)))) raw_votes <- read.csv(csv_url, stringsAsFactors = F) # Create names_votes to match against the shape file later names_votes <- raw_votes$Area
So there are 382 Local Authorities. Let’s get the appropriate shapefile, and check out what data is already attached:
The hexagon shapefile
pacman::p_load(rgdal, dplyr) la_shp <- readOGR(dsn = './data/GB_Hex_Cartogram_LAs', layer = "GB_Hex_Cartogram_LAs")
## OGR data source with driver: ESRI Shapefile ## Source: "./data/GB_Hex_Cartogram_LAs", layer: "GB_Hex_Cartogram_LAs" ## with 380 features ## It has 5 fields ## Integer64 fields read as strings: OBJECTID
names_shp <- as.character(la_shp$LAD12NM) # for matching the CSV file later head([email protected])
## OBJECTID LAD12CD LAD12NM Shape_Leng Shape_Area ## 0 1 E07000205 Suffolk Coastal 127685.1 1176601978 ## 1 2 E07000202 Ipswich 127686.2 1176622756 ## 2 3 E07000076 Tendring 127685.7 1176613311 ## 3 4 E07000145 Great Yarmouth 127685.8 1176614926 ## 4 5 E07000206 Waveney 127685.4 1176607876 ## 5 6 E07000203 Mid Suffolk 127686.5 1176628446
Let’s plot it against an actual map to get a sense of it. To do that let’s project it to WGS84 (
EPSG:4326) measured in degrees of latitude / longitude, which also allows to set up the bouding box easier.
Note: using ggplot to overlay polygons on a map requires turning the shapefile into a data frame by using
broom::tidy(). After running the function if you encounter this error
Error: isTRUE(gpclibPermitStatus()) is not TRUE on Windows, make sure Rtools is installed, then install package
install.packages('gpclib', type = 'source'). To verify everything is fine, run
gpclibPermit(), if both return
True then you are all set.
pacman::p_load(broom, ggplot2, ggmap) # Re-project the map into the WGS (World Geodetic System, i.e. longitudes and latitudes) la_84 <- spTransform(la_shp, CRS('+init=epsg:4326')) la_84_df <- la_84 %>% tidy(region = "LAD12NM") b <- bbox(la_84) basemap <- ggmap( get_map(location = b, source = "stamen", maptype = "watercolor", crop = T)) basemap + geom_polygon( data = la_84_df, aes(x = long, y = lat, group = group), fill = 'darkorchid', alpha = 0.7) + xlab('Longitude') + ylab('Latitude')
So the hexagon map is roughly based on the actual coordicates of the features.
But wait, the Local Authority shapefile has a flaw – it is missing Northern Ireland and Gibraltar, confirmed by comparing the names from the CSV and the shapefile (The Vale of Glamorgan also has different names in the two files, let’s deal with that later)
##  "Northern Ireland" "Gibraltar" "Vale of Glamorgan"
##  "The Vale of Glamorgan"
Let’s create them!
Create features (polygons) in a shapefile
Manipulating the shapefile can be a daunting task, as it has multiple layers of nested elements like this:
str(la_84, max.level = 2)
## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots ## [email protected] data :'data.frame': 380 obs. of 5 variables: ## [email protected] polygons :List of 380 ## .. .. [list output truncated] ## [email protected] plotOrder : int [1:380] 185 332 375 192 176 84 14 71 250 102 ... ## [email protected] bbox : num [1:2, 1:2] -6.46 50.26 2.05 57.32 ## .. ..- attr(*, "dimnames")=List of 2 ## [email protected] proj4string:Formal class 'CRS' [package "sp"] with 1 slot
And inside the
@ polygons element are all the features (points, lines and polygons), for each of them there is this:
str([email protected][], max.level = 2)
## Formal class 'Polygons' [package "sp"] with 5 slots ## [email protected] Polygons :List of 1 ## [email protected] plotOrder: int 1 ## [email protected] labpt : num [1:2] 1.86 53.2 ## [email protected] ID : chr "0" ## [email protected] area : num 0.0569
Now that the structure is clearly laid out, this is what we need to do
- create a new polygon by copying from one of the existing ones
- work out the coordinates for the new polygon
- change the coordinates of the polygon
- append the polygon with the existing ones
- reconstruct the
- recreate the shapefiles with the polygons list and other values from the original shapefile
The plan is to create NI and GI features by copying one of the polygons near by, shift it to an appropriate location, and update the attributes and data to reflect NI and GI respectively. So I decided using the British National Grid projection (
EPSG:27700), as measured in meters, would be a better choice, as the shift can be explicit and more understandable.
Barrow-in-Furness is a good choice as a neighbouring polygon. NI is esitmated to be 100,000 meters to the west, and GI is 600,000 meters to the south of NI. (the step numbers in the code correspond to the above five steps)
la_bng <- spTransform(la_shp, CRS('+init=epsg:27700')) # 1. ind_ref <- which(la_bng$LAD12NM == 'Barrow-in-Furness') ni <- [email protected][[ind_ref]] ni_coors <- [email protected][]@coords # 2. # NI seems around 150,000 meters to the west # GI 500,000 meters to the south ni_shiftx <- -100000 gi_shifty <- -600000 # Shape coordinates ni_coors[, 1] <- ni_coors[, 1] + ni_shiftx gi_coors <- ni_coors gi_coors[, 2] <- ni_coors[, 2] + gi_shifty # Labpt coordinates ni_labpt_x <- [email protected] - ni_shiftx ni_labpt_y <- [email protected] ni_labpt <- c(ni_labpt_x, ni_labpt_y) gi_labpt_x <- [email protected] + ni_shiftx gi_labpt_y <- [email protected] + gi_shifty gi_labpt <- c(gi_labpt_x, gi_labpt_y) # 3. # Replace coordiates and ID in ni and gi ni_poly <- ni gi_poly <- ni [email protected][]@coords <- ni_coors [email protected][]@coords <- gi_coors [email protected][]@labpt <- ni_labpt [email protected][]@labpt <- gi_labpt [email protected] <- ni_labpt [email protected] <- gi_labpt new_poly_ids <- c('380', '381') [email protected] <- new_poly_ids [email protected] <- new_poly_ids # 4. # Construct the new SpatialPolygons la_bng_polygons <- c([email protected], ni_poly, gi_poly) # 5. # Recreate the polygon list la_bng_spgs <- SpatialPolygons(la_bng_polygons) # 6. # Construct the new SpatialPolygonsDataFrame proj4string(la_bng_spgs) <- proj4string(la_bng) # Create the two extra data points for the dataframe ni_lad12cn <- raw_votes[raw_votes$Area == 'Northern Ireland', 'Area_Code'] gi_lad12cn <- raw_votes[raw_votes$Area == 'Gibraltar', 'Area_Code'] new_data <- data.frame( OBJECTID = c('380', '381'), LAD12CD = c(ni_lad12cn, gi_lad12cn), LAD12NM = c('Northern Ireland', 'Gibraltar'), Shape_Leng = mean(la_bng$Shape_Leng), Shape_Area = mean(la_bng$Shape_Area)) la_new_data <- rbind([email protected], new_data) # Make sure the row IDs of the new rows coincide the external dataframe we will join later rownames(la_new_data)[(nrow(la_new_data) - 1):nrow(la_new_data)] <- new_poly_ids la_bng_new <- SpatialPolygonsDataFrame(la_bng_spgs, la_new_data)
Let’s check that the polygons have indeed been created:
la_84_new <- spTransform(la_bng_new, CRS('+init=epsg:4326')) la_84_new_df <- la_84_new %>% tidy(region = "LAD12NM") ggmap( get_map(location = bbox(la_84_new), source = "stamen", maptype = "watercolor", crop = T)) + geom_polygon( data = la_84_new_df, aes(x = long, y = lat, group = group), fill = 'darkorchid', alpha = 0.7) + xlab('Longitude') + ylab('Latitude')
Much better. Even though they do not exactly fall inside the geospatial boundaries of the actual areas, it’s good enough for the hexagon map.
Before moving on, let’s fix the little issue with The Vale of Glamorgan:
raw_votes$Area[grepl('glamorgan', raw_votes$Area, ignore.case = T)] <- 'The Vale of Glamorgan'
Voting results on a hexagon map
Prior to this point we have been working with the voting data and the shapefile separately, now it’s time to combine the voting data and the map:
la_bng_new <- la_bng_new %>% merge(raw_votes, by.x = 'LAD12NM', by.y = 'Area') la_bng_new$Result <- ifelse(la_bng_new$Remain > la_bng_new$Leave, 'Remain', 'Leave') %>% as.factor
Then we can plot the hexagon:
cols <- c('#ff6666', '#66ccff') plot(la_bng_new, col = cols[la_bng_new$Result], main = 'EU Referendum 2016 Results') legend('topright', legend = c('Leave', 'Remain'), bty = 'n', fill = cols)
As one would expect, the voting results are clear (direction-wise) for each Local Authority, and we are indeed seeing London the Remain capital and the surrounding areas being more prominently represented, but the map is still mis-representing the relative valid votes of each area, i.e. regardless of the number of valid votes, all areas are of the same size.
Let’s try rescaling the map using the
Valid_Votes inside the votes data.
There are various options for creating a cartogram map, notably
getcartr packages have been used together (e.g. Michael Höhle’s nice post). The downside of those is that the installation process can be complicated (see this Stack Overflow answer for a complete process). for a simple solution the
cartogram package on CRAN is a good alternative:
# Plot cartogram pacman::p_load(cartogram) carto <- cartogram(la_bng_new, 'Valid_Votes') carto_df <- carto %>% tidy(region = 'LAD12NM') carto_df <- carto_df %>% left_join( data.frame(id = as.character(la_bng_new$LAD12NM), Result = as.factor(la_bng_new$Result), stringsAsFactors = F)) ggplot() + theme_bw() + theme_nothing(legend = TRUE) + coord_fixed() + geom_polygon(data = carto_df, colour = 'gray40', aes(x = long, y = lat, group = id, fill = Result)) + ggtitle('EU Referendum 2016 Results Cartogram')
This makes slightly more sense, but there are still a couple issues:
The proportion of Leave and Remain varies across regions wildly:
pacman::p_load(ggplot2) ggplot(data = raw_votes, aes(x = Pct_Remain, fill = Region, col = Region)) + geom_density(alpha = 0.5) + theme_bw() + xlim(0, 100) + ylim(0, 0.09) + geom_vline(aes(xintercept = 50)) + facet_wrap(~Region) + theme(legend.position="none") + xlab('Percentage of Remain votes') + ylab('Distribution density') + ggtitle('Remain vote distribution by Region')
Note, for Northern Ireland there is only one data point (@ 55.8%), hence the invisible verticle line in the density graph.
Therefore we must not lose the information on the map.
Further more, aren’t you just dying to work out which area is which?
Let’s get straight down to
leaflet to address all of the above.
Plotting cartogram with
Before plotting the map we need to take care of a few things:
More dramatic scaling
The above cartogram looks fine, it probably represent the valide votes fairly accurately. But for aesthetic purposes, I think a more dramatic scaling would be more interesting, this means the difference of polygons are exaggerated and some areas are over shaddowing others in size, but that would make the visualisation more telling, and easier to grab attention, which is a major point of data visualisation. So instead of using
Valid_Votes we use the square of it.
Colouring for votes percentage
There is a wide range of votes percentage (% of Remain from 24% to 96%), This should be differentiated by colours. The last thing we need to take care off is a small tweak on the scaler for Gibraltar. Gibraltar being a British overseas territory (with a population of just over 30,000, located at the bottom of Spain) naturally voted overwhelmingly Remain (yes, that one with 96%). Allowing the outlier to be part of the colour palette would push everything close together hence showing less disguishable colours, and for this reason, we are going to make the outlier value the same as the second highest (Lamberth at 79%, and by now you probably know where I stand on the McCandless vs. Tufte debate).
pacman::p_load(leaflet) cols1 <- c("#ad0037", "#004080") # Transforming the scaling factor Valid_Votes la_bng_new$Valid_Votes_sq <- la_bng_new$Valid_Votes ^ 2 carto_x <- cartogram(la_bng_new, 'Valid_Votes_sq') # Create vote percentage colour scale carto_x$Pct_Remain_2 <- carto_x$Pct_Remain carto_x$Pct_Remain_2[carto_x$LAD12NM == 'Gibraltar'] = carto_x$Pct_Remain_2[carto_x$LAD12NM == 'Lambeth'] carto_x$fill_op <- abs(carto_x$Pct_Remain_2 - 50) / max(abs(carto_x$Pct_Remain_2 - 50)) # Create the vote result HTML tag labels2 <- sprintf( '<strong>%s</strong><br/>Remain: %s (%.0f%%)<br/>Leave: %s (%.0f%%)', carto$LAD12NM, prettyNum(carto$Remain, big.mark = ','), 100 * carto$Remain / carto$Valid_Votes, prettyNum(carto$Leave, big.mark = ','), 100 * carto$Leave / carto$Valid_Votes ) %>% lapply(htmltools::HTML) # Leaflet CRS object for WGS84 projection epsg4326 <- leafletCRS( crsClass = 'L.Proj.CRS', code = 'EPSG:4326', proj4def = '+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0', resolutions = 2^(16:7)) pal <- colorFactor( palette = cols1, domain = carto_x$Result ) # Leaflet map leaflet(carto_x, options = leafletOptions(crs = epsg4326)) %>% addPolygons( color = "#444444", weight = 1, smoothFactor = 0.5, label = labels2, opacity = 1.0, fillOpacity = carto_x$fill_op, fillColor = cols1[as.factor(carto_x$Result)], highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)) %>% addLegend("topright", pal = pal, values = ~Result, title = "Voting result", opacity = 1)