I and @awpiii were trading news about the power outages in Maine & New Hampshire last night and he tweeted the link to the @PSNH Outage Map. As if the Bing Maps tiles weren’t bad enough, the use of a categorical color scale instead of a sequential one caused sufficient angst that I whipped up an alternate version in R between making pies and bread for Thanksgiving (even with power being out for us).
PSNH provides a text version of outages (by town) that ends up being a pretty clean HTML table, and a quick Google search led me to a fairly efficient town-level shapefile for New Hampshire. With these data files at the ready, it was time to make a better map.
Step 0 – Environment Setup
So, I lied. There are six steps. “5” just works way better in attention-grabbing list headlines. The first one is setting up the project and loading all the libraries we’ll need. I use RStudio for most of my R coding and their IDE has the concept of a “project” which has it’s own working directory, workspace, history, and source documents separate from any other RStudio windows you have open. They are a great way to organize your analyses and experiments. I have my own “new project” script that sets up additional directory structures, configures the
Rproj file with my preferences and initializes a git repository for the project.
I also use the setup step to load up a ggplot2 map theme I keep in a gist.
library(sp) library(rgdal) library(dplyr) library(rvest) library(stringi) library(scales) library(RColorBrewer) library(ggplot2) # for theme_map devtools::source_gist("https://gist.github.com/hrbrmstr/33baa3a79c5cfef0f6df")
Step 1 – Read in the map
This is literally a one-liner:
nh <- readOGR("data/nhtowns/NHTOWNS_POLY.shp", "NHTOWNS_POLY")
My projects all have a
data directory and thats where I normally store shapefiles. I used
ogrinfo -al NHTOWNS_POLY.shp at the command line to determine the layer name.
Step 2 – Read in the outage data
rvest package is nothing short of amazing. It makes very quick work of web scraping and—despite some newlines in the mix—this qualifies as a one-liner in my book:
outage <- html("http://www.psnh.com/outagelist/") %>% html_nodes("table") %>% html_table() %>% .[]
That bit of code grabs the whole page, extracts all the HTML tables (there is just one in this example), turns it into a list of data frames and then returns the first one.
Step 3 – Data wrangling
While this step is definitely not as succinct as the two previous ones, it’s pretty straightforward:
outage <- outage[complete.cases(outage),] colnames(outage) <- c("id", "total_customers", "without_power", "percentage_out") outage$id <- stri_trans_totitle(outage$id) outage$out <- cut(outage$without_power, breaks=c(0, 25, 100, 500, 1000, 5000, 10000, 20000, 40000), labels=c("1 - 25", "26 - 100", "101 - 500", "501 - 1,000", "1,001 - 5,000", "5,001- 10,000", "10,001 - 20,000", "20,001 - 40,000"))
We filter out the
NA‘s (this expunges the “total” row), rename the columns, convert the town name to the same case used in the shapefile (NOTE: I could have just
touppered all the town names, but I really like this function from the
stringi package) and then use
cut to make an 8-level factor out of the customer outage count (to match the PSNH map legend).
Step 4 – Preparing the map for plotting with
This is another one-liner:
nh_map <- fortify(nh, region="NAME")
and makes it possible to use the town names when specifying the polygon regions we want to fill with our spiffy color scheme.
Step 5 – Plotting the map
It is totally possible to do this in one line, but many kittens will lose their lives if you do. I like this way of structuring the creation of a
ggplot graphic since it makes it very easy to comment out or add various layers or customizations without worrying about stray
gg <- ggplot(data=nh_map, aes(map_id=id)) gg <- gg + geom_map(map=nh_map, aes(x=long, y=lat), color="#0e0e0e", fill="white", size=0.2) gg <- gg + geom_map(data=outage, map=nh_map, aes(fill=out), color="#0e0e0e", size=0.2) gg <- gg + scale_fill_brewer(type="seq", palette="RdPu", name="Number ofncustomer outagesnin each town") gg <- gg + coord_equal() gg <- gg + labs(title=sprintf("%s Total PSNH Customers Without Power", comma(sum(outage$without_power)))) gg <- gg + theme_map() gg <- gg + theme(legend.position="right") gg
That sequence starts the base
ggplot object creation, sets up the base map colors and then overlays the town outage colors. We use the
RdPu Color Brewer sequential palette and give the legend the same title as the PSNH counterpart.
The shapefile is already projected (Lambert Conformal Conic—take a look at it with
ogrinfo -al), so we can get away with using
coord_equal vs re-projecting it, and we do a tally of outages to stick in the title. My base
theme_map is designed for Maine, hence the extra
theme call to move the legend.
The Finished Product
Crisp SVG polygons, no cluttered Bing Maps tiles and a proper color palette.
All the code is up on github.