Power Outage Impact Choropleths In 5 Steps in R (featuring rvest & RStudio “Projects”)

[This article was first published on rud.is » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I and @awpiii were trading news about the power outages in Maine & New Hampshire last night and he tweeted the link to the @PSNH Outage Map. As if the Bing Maps tiles weren’t bad enough, the use of a categorical color scale instead of a sequential one[1] caused sufficient angst that I whipped up an alternate version in R between making pies and bread for Thanksgiving (even with power being out for us).

PSNH provides a text version of outages (by town) that ends up being a pretty clean HTML table, and a quick Google search led me to a fairly efficient town-level shapefile for New Hampshire. With these data files at the ready, it was time to make a better map.

Step 0 – Environment Setup

So, I lied. There are six steps. “5” just works way better in attention-grabbing list headlines. The first one is setting up the project and loading all the libraries we’ll need. I use RStudio for most of my R coding and their IDE has the concept of a “project” which has it’s own working directory, workspace, history, and source documents separate from any other RStudio windows you have open. They are a great way to organize your analyses and experiments. I have my own “new project” script that sets up additional directory structures, configures the Rproj file with my preferences and initializes a git repository for the project.

I also use the setup step to load up a ggplot2 map theme I keep in a gist.

# for theme_map

Step 1 – Read in the map

This is literally a one-liner:

nh <- readOGR("data/nhtowns/NHTOWNS_POLY.shp", "NHTOWNS_POLY")

My projects all have a data directory and thats where I normally store shapefiles. I used ogrinfo -al NHTOWNS_POLY.shp at the command line to determine the layer name.

Step 2 – Read in the outage data

The rvest package is nothing short of amazing. It makes very quick work of web scraping and—despite some newlines in the mix—this qualifies as a one-liner in my book:

outage <- html("http://www.psnh.com/outagelist/") %>%
  html_nodes("table") %>%
  html_table() %>%

That bit of code grabs the whole page, extracts all the HTML tables (there is just one in this example), turns it into a list of data frames and then returns the first one.

Step 3 – Data wrangling

While this step is definitely not as succinct as the two previous ones, it’s pretty straightforward:

outage <- outage[complete.cases(outage),]
colnames(outage) <- c("id", "total_customers", "without_power", "percentage_out")
outage$id <- stri_trans_totitle(outage$id)
outage$out <- cut(outage$without_power,
    breaks=c(0, 25, 100, 500, 1000, 5000, 10000, 20000, 40000),
    labels=c("1 - 25", "26 - 100", "101 - 500", "501 - 1,000",
             "1,001 - 5,000", "5,001- 10,000", "10,001 - 20,000",
             "20,001 - 40,000"))

We filter out the NA‘s (this expunges the “total” row), rename the columns, convert the town name to the same case used in the shapefile (NOTE: I could have just touppered all the town names, but I really like this function from the stringi package) and then use cut to make an 8-level factor out of the customer outage count (to match the PSNH map legend).

Step 4 – Preparing the map for plotting with ggplot

This is another one-liner:

nh_map <- fortify(nh, region="NAME")

and makes it possible to use the town names when specifying the polygon regions we want to fill with our spiffy color scheme.

Step 5 – Plotting the map

It is totally possible to do this in one line, but many kittens will lose their lives if you do. I like this way of structuring the creation of a ggplot graphic since it makes it very easy to comment out or add various layers or customizations without worrying about stray + signs.

gg <- ggplot(data=nh_map, aes(map_id=id))
gg <- gg + geom_map(map=nh_map, aes(x=long, y=lat),
                    color="#0e0e0e", fill="white", size=0.2)
gg <- gg + geom_map(data=outage, map=nh_map, aes(fill=out),
                    color="#0e0e0e", size=0.2)
gg <- gg + scale_fill_brewer(type="seq", palette="RdPu",
                             name="Number ofncustomer outagesnin each town")
gg <- gg + coord_equal()
gg <- gg + labs(title=sprintf("%s Total PSNH Customers Without Power",
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")

That sequence starts the base ggplot object creation, sets up the base map colors and then overlays the town outage colors. We use the RdPu Color Brewer sequential palette and give the legend the same title as the PSNH counterpart.

The shapefile is already projected (Lambert Conformal Conic—take a look at it with ogrinfo -al), so we can get away with using coord_equal vs re-projecting it, and we do a tally of outages to stick in the title. My base theme_map is designed for Maine, hence the extra theme call to move the legend.

The Finished Product

Crisp SVG polygons, no cluttered Bing Maps tiles and a proper color palette.


All the code is up on github.

To leave a comment for the author, please follow the link and comment on their blog: rud.is » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)