Visualizing geospatial data in R—Part 3: Making interactive maps with leaflet
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is part 3 of a 4-part series on how to build maps using R.
How to load geospatial data into your workspace and prepare it for visualization
How to make interactive maps (pan, zoom, click) using leaflet
How to add interactive maps to a Shiny dashboard
In this post, we will learn how to make interactive maps using the
leaflet is not a good choice for pasting images in papers and presentations, or for setting a snazzy new desktop background. Go back to Maps, Part 2 for that.
leaflet package is clean and highly reminiscent of
ggplot2. These three reasons–the power of the underlying JS library, the comprehensive R documentation, and the familiar R framework–make
leaflet an obvious choice for the R data analyst.
One important note: This post does not embed actual Leaflet maps. Getting R and Squarespace to work together is not easy (might actually be impossible?). Instead, you will see video demos of the maps in action. Head over to the Leaflet website if you would like to experiment with Leaflet maps yourself. Or, download the R code used in this post and run it yourself!
Review: Load data
Below is code to load two datasets for visual analysis in the rest of the post. The first dataset is a .geojson file containing geospatial descriptions of Philadelphia’s neighborhoods, courtesy of OpenDataPhilly. This dataset is polygon data and will form our basemap for layering on additional, more interesting, features. (We used this data in our last post, too.)
The second dataset is geospatial point data, also provided by OpenDataPhilly, that contains information from the police department on shooting victims. This is a sobering dataset that allows city residents to see location information for shootings, basic demographic information about shooting victims, and trend the city’s gun violence over time.
This dataset is accessed through an call to an API. For those unfamiliar with this type of API, here is a brief introduction.
To use the API, we need to ping to a particular URL. The API in response, will send us a file to download. This is similar to you clicking a website link that opens a tab which downloads a file
The API knows what information to put in the file based on the URL we decided to ping
The URL for all API calls to the Philly cartographic API start with the same base form (https://phl.carto.com/api/v2/sql) and then will append a brief bit of sql. You do not need vast sql knowledge here. Our query for this project is as simple as “select * from shootings where year > 2018.” The table of data we wish to query is called “shootings,” we want to filter based on a column in that table called “year,” and we want all data (the asterisk is shorthand for “all columns”)
We also need to append information to our base URL that tells the API we would like a .geojson file
The end result is the following URL: https://phl.carto.com/api/v2/sql?q=%0A%20%20select%20%2A%0A%20%20from%20shootings%0A%20%20where%20year%20%3E%202018%0A&format=GeoJSON. Try clicking it here, and you will see your browser download a file for you
In R, we use `httr::modify_url()` to create our URL and `sf::read_sf()` to download the .geojson file and load it into R as a simple features object
# SETUP #### #### #### #### #### #### library(dplyr) library(httr) library(sf) # LOAD DATA #### #### #### #### #### #### ## ## Simple features data: Philadelphia neighborhoods # Source: OpenDataPhilly. https://www.opendataphilly.org/dataset/philadelphia-neighborhoods neighborhoods_geojson <- "https://raw.githubusercontent.com/azavea/geo-data/master/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.geojson" neighborhoods_raw <- sf::read_sf(neighborhoods_geojson) head(neighborhoods_raw) #> Simple feature collection with 6 features and 8 fields #> geometry type: MULTIPOLYGON #> dimension: XY #> bbox: xmin: -75.28027 ymin: 39.96271 xmax: -75.01684 ymax: 40.09464 #> geographic CRS: WGS 84 #> # A tibble: 6 x 9 #> name listname mapname shape_leng shape_area cartodb_id created_at #> <chr> <chr> <chr> <dbl> <dbl> <int> <dttm> #> 1 PENN~ Pennypa~ Pennyp~ 87084. 60140756. 9 2013-03-19 13:41:50 #> 2 OVER~ Overbro~ Overbr~ 57005. 76924995. 138 2013-03-19 13:41:50 #> 3 GERM~ Germant~ Southw~ 14881. 14418666. 59 2013-03-19 13:41:50 #> 4 EAST~ East Pa~ East P~ 10886. 4231000. 129 2013-03-19 13:41:50 #> 5 GERM~ Germany~ German~ 13042. 6949968. 49 2013-03-19 13:41:50 #> 6 MOUN~ Mount A~ East M~ 28846. 43152470. 6 2013-03-19 13:41:50 #> # ... with 2 more variables: updated_at <dttm>, geometry <MULTIPOLYGON [°]> ## ## Simple features data: Philadelphia shootings # Source: OpenDataPhilly. https://www.opendataphilly.org/dataset/shooting-victims base_url <- "https://phl.carto.com/api/v2/sql" q <- " select * from shootings where year > 2018 " shootings_geoJSON <- httr::modify_url( url = base_url, query = list(q = q, format = "GeoJSON") ) shootings_raw <- sf::read_sf(shootings_geoJSON) head(shootings_raw) #> Simple feature collection with 6 features and 22 fields #> geometry type: POINT #> dimension: XY #> bbox: xmin: -75.19185 ymin: 39.90638 xmax: -75.05941 ymax: 40.0475 #> geographic CRS: WGS 84 #> # A tibble: 6 x 23 #> cartodb_id objectid year dc_key code date_ time race sex #> <int> <int> <int> <chr> <chr> <dttm> <chr> <chr> <chr> #> 1 3164 5114 2019 20190~ 0411 2019-02-19 19:00:00 17:4~ B M #> 2 3168 5118 2019 20190~ 0111 2019-03-29 20:00:00 21:5~ W M #> 3 3169 5119 2019 20190~ 0111 2019-04-09 20:00:00 11:5~ B M #> 4 3183 5133 2019 20190~ 0111 2019-09-27 20:00:00 16:5~ B M #> 5 3185 5135 2019 20190~ 0411 2019-05-31 20:00:00 12:1~ B M #> 6 3188 5138 2019 20190~ 0411 2019-02-10 19:00:00 20:3~ W M #> # ... with 14 more variables: age <chr>, wound <chr>, officer_involved <chr>, #> # offender_injured <chr>, offender_deceased <chr>, location <chr>, #> # latino <int>, point_x <dbl>, point_y <dbl>, dist <chr>, inside <int>, #> # outside <int>, fatal <int>, geometry <POINT [°]>
Review: Clean data
Next, we will do some basic data cleaning. For the
neighborhoods dataset, we will drop and rename columns. In the
shootings dataset, we will remove points that have latitude and longitude in Florida. More cleanup to this dataset will come later once we have started making our maps.
Note from above that both of the datasets are already in the WGS 84 CRS.
leaflet requires that data be in WGS 84, so we would need to convert to WGS 84 (EPSG code: 4326) using
sf::st_transform(shootings, crs = 4326) if it weren’t provided to us with that CRS.
If we want to run non-geospatial analysis on our shootings data, such as plotting shootings over time, calculating totals by demographic, and so on, we can drop the geospatial information and work with a standard tibble using
# CLEAN DATA #### #### #### #### #### #### neighborhoods <- neighborhoods_raw %>% dplyr::select(label = mapname) shootings <- shootings_raw %>% dplyr::filter(point_x > -80 & point_y > 25) # points in FL
Geospatial layers in leaflet
The concepts of loading and mapping various layers of data in
leaflet are similar to what we had seen in
ggplot2. You have the option of loading data either as the
data = ... argument in
leaflet::leaflet() or waiting until subsequent layers to provide the data. As in our last post, we will add the data in each layer, since we are working with two distinct datasets.
The examples below will walk you through making maps in
leaflet, starting with the most basic map and building the complexity from there. Below each piece of code you will find a static image of the map. To interact with the map (as it was intended!), run the code chunks or download the R code in its entirety.
Your first map
# ANALYZE #### #### #### #### #### #### library(leaflet) leaflet::leaflet() %>% leaflet::addPolygons(data = neighborhoods) %>% leaflet::addCircles(data = shootings)
This should look very similar to what we would have written for
ggplot2! The only difference in code is that we have to specify if we want to add Polygons or Circles, and we replace the
library(ggplot2) ggplot2::ggplot() + ggplot2::geom_sf(data = neighborhoods) + ggplot2::geom_sf(data = shootings)
Add a basemap
leaflet package makes it easy to add map tiles, or “basemaps” to the layperson. You can either choose to call
addTiles() with no arguments to get the default basemap from OpenStreetMap or choose to call
addProviderTiles() to get one of the various third-party options. Our favorite is CartoDB.Voyager, but you can explore the entire set of options and pick your favorite.
leaflet::leaflet() %>% leaflet::addProviderTiles(providers$CartoDB.Voyager) %>% leaflet::addPolygons(data = neighborhoods) %>% leaflet::addCircles(data = shootings)
Simple formatting adjustments
The basemap helps with the visual aesthetic, but we still have a long way to go. Let’s make some basic formatting adjustments to the polygons layer: line color, line weight, line opacity, and fill opacity (0 = no fill). We’ll also add a label, which will appear upon hover. Notice that we need to add
htmltools::htmlEscape(). Leaflet recommends escaping HTML text for security reasons in situations where labels and popups might contain unwanted HTML content. We zoomed in on the map before taking a screenshot so that you can see the points in more detail.
library(htmltools) leaflet::leaflet() %>% leaflet::addProviderTiles(providers$CartoDB.Voyager) %>% leaflet::addPolygons( color = "#222", weight = 2, opacity = 1, fillOpacity = 0, label = ~lapply(label, htmltools::htmlEscape), labelOptions = leaflet::labelOptions(direction = "top"), data = neighborhoods ) %>% leaflet::addCircles(data = shootings)
Jitter points so that we can see them more clearly
shootings datast is only as precise as the block on which the event happened. Multiple shootings on the same block result in overlapping points. You may have seen that some of the points in the plot above were darker than others. This is because we had overlapped multiple translucent circles. To avoid this issue, we will “jitter” our points, adding a small amount of random displacement in the x- and y-directions. To make this jitter consistent each time you render the plot, remember to set the seed value for the random jitter using
set.seed(1776) shootings <- sf::st_jitter(shootings, factor = 0.0004) leaflet::leaflet() %>% leaflet::addProviderTiles(providers$CartoDB.Voyager) %>% leaflet::addPolygons( color = "#222", weight = 2, opacity = 1, fillOpacity = 0, label = ~lapply(label, htmltools::htmlEscape), labelOptions = leaflet::labelOptions(direction = "top"), data = neighborhoods ) %>% leaflet::addCircles(data = shootings)
Add labels for clearer communication
Our final set of aesthetic changes will be to our point layer. We add two new variables to our
shootings dataset: a “color” variable that encodes encodes the “fatal” variable into red and grey, and a “popup” variable that summarizes key information about each shooting. This popup variable will appear in our map when we click on a point. In
leaflet, labels appear upon hover, and popups appear upon click. Our popup variable contains html. Here is a quick translation for those unfamiliar with html:
</b> mean to start and end a section of bold text,
</i> mean to start and end a section of italics, and
<br/> means to add a line break. With some creative combinations of these html tools, we can create a simple and effective popup box.
Adding the color and popup variables to our plot is as simple as specifing
color = ~color and
popup = ~popup inside our
addCircles() function call.
shootings <- shootings %>% dplyr::mutate( color = dplyr::if_else(fatal == 1, "#900", "#222"), popup = paste0( "<b>", location, "</b>", "<br/><i>", date_, "</i>", "<br/><b>Race:</b> ", dplyr::case_when( race == "B" ~ "Black", race == "W" ~ "White", TRUE ~ "NA" ), "<br/><b>Sex:</b> ", dplyr::case_when( sex == "M" ~ "Male", sex == "F" ~ "Female", TRUE ~ "NA" ), "<br/><b>Age:</b> ", age, "<br/><b>Wound:</b> ", wound, "<br/><b>Fatal?:</b> ", dplyr::case_when( fatal == 1 ~ "Yes", fatal == 0 ~ "No", TRUE ~ "NA" ) ) ) %>% dplyr::select(color, popup) leaflet::leaflet() %>% leaflet::addProviderTiles(providers$CartoDB.Voyager) %>% leaflet::addPolygons( color = "#222", weight = 2, opacity = 1, fillOpacity = 0, label = ~lapply(label, htmltools::htmlEscape), labelOptions = leaflet::labelOptions(direction = "top"), data = neighborhoods ) %>% leaflet::addCircles( color = ~color, popup = ~popup, data = shootings )
Choropleths in leaflet
Choropleths–maps in which each region is colored according to a summary statistic–are a powerful way to visualize data. In this example, let us suppost that we would like to show the total number of shootings in each neighborhood. This is a six-step process:
- Join polygon data (
neighborhoods) with point data (
- Group by the ID of the polygon data (
neighborhoods$label) and summarize our metric of interest (
n(), but other situations could use count per square mile, maximum, minimum, average, etc.)
- (Optional) Filter based on the summary statistics
- (Optional) Create labels / popups
- Save as a new simple features object
- Create a palette function to instruct
leafleton how to color the regions
The final step–the creation of a palette function–is somewhat unique to
leaflet. Pick a color palette from a
viridis, or build your own. Then use one of
leaflet‘s options such as
colorQuantile() to appropriately assignin the color palette to your range of data. In the example below, we picked a palette ranging from Yellow to Red and assigned it to the “total_shootings” variable.
shootings_count <- sf::st_join(neighborhoods, shootings) %>% dplyr::group_by(label) %>% dplyr::summarise(total_shootings = n(), .groups = "drop") %>% dplyr::mutate( label = paste0("<b>", label, ":</b> ", total_shootings) ) %>% dplyr::select(label, total_shootings) pal <- leaflet::colorNumeric( "YlOrRd", domain = shootings_count$total_shootings ) leaflet::leaflet(shootings_count) %>% leaflet::addPolygons( color = "#222", weight = 2, opacity = 1, fillColor = ~pal(total_shootings), fillOpacity = 0.7 )
In this final map, we add back our provider tiles, our label, and our highlight options, with no changes here from what had been done earlier in this post. We have also added a legend (and assigned the palette function to it), which describes the color range. Notice that we can change the opacity and location of the legend so that it is as unobtrusive as possible.
leaflet::leaflet(shootings_count) %>% leaflet::addProviderTiles(providers$CartoDB.Voyager) %>% leaflet::addPolygons( color = "#222", weight = 2, opacity = 1, fillColor = ~pal(total_shootings), fillOpacity = 0.7, label = ~lapply(label, htmltools::HTML), labelOptions = leaflet::labelOptions(direction = "top"), highlight = leaflet::highlightOptions( color = "#FFF", bringToFront = TRUE ) ) %>% leaflet::addLegend( pal = pal, values = ~total_shootings, opacity = 0.7, title = "# shootings", position = "topleft" )
Simple, right? The
leaflet approach to plotting graphs can be somewhat tricky at first, as there are a lot of parameters that you can choose to adjust. In the end though, it’s relatively similar to
ggplot2 in that we create a base object, add layers to it, and adjust each layer’s parameters as we add it. The options available to us are a little bit different. We would struggle to recreate and exact copy of
ggplot2‘s maps in
leaflet. But, that is to be expected. These two packages create two different types of maps–static and interactive–for different analytical purposes. What
leaflet might lose in creating annotations and allowing for extremely precise aesthetic changes, it gains by allowing for paning, zooming, hovers, and popups.
Think of these two packages as complimentary tools in your analytics arsenal. Think carefully about when to use each one so that you can display data clearly, insightfully, and intuitively.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.