How to map geospatial data: USA rivers

[This article was first published on r-bloggers – SHARP SIGHT LABS, and kindly contributed to R-bloggers.]



R Code

Here’s the R code to produce the map:

#===============
# LOAD PACKAGES
#===============
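library(tidyverse)
# maptools was retired from CRAN in 2023 and none of its functions are called here;
# sp provides the 'Spatial*' classes that the river data uses
library(sp)
# note: map_data() needs the 'maps' package and coord_map() needs 'mapproj' installed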


#===============
# GET RIVER DATA
#===============

#==========
# LOAD DATA
#==========

#DEFINE URL
# - this is the location of the file
url.river_data <- url("http://sharpsightlabs.com/wp-content/datasets/usa_rivers.RData")


# LOAD DATA
# - this will retrieve the data from the URL
load(url.river_data)


# INSPECT
summary(lines.rivers)
lines.rivers@data %>% glimpse()


levels(lines.rivers$FEATURE)
table(lines.rivers$FEATURE)

#==============================================
# REMOVE MISC FEATURES
# - there are some features in the data that we
#   want to remove
#==============================================
lines.rivers <- subset(lines.rivers, !(FEATURE %in% c("Shoreline"
                                                      ,"Shoreline Intermittent"
                                                      ,"Null"
                                                      ,"Closure Line"
                                                      ,"Apparent Limit"
                                                      )))

# RE-INSPECT
table(lines.rivers$FEATURE)

#==============
# REMOVE STATES
#==============

#-------------------------------
# IDENTIFY STATES
# - we need to find out
#   which states are in the data
#-------------------------------
table(lines.rivers$STATE)


#---------------------------------------------------------
# REMOVE STATES
# - remove Alaska, Hawaii, Puerto Rico, and Virgin Islands
# - these are hard to plot in a confined window, so 
#   we'll remove them for convenience
#---------------------------------------------------------

lines.rivers <- subset(lines.rivers, !(STATE %in% c('AK','HI','PR','VI')))

# RE-INSPECT
table(lines.rivers$STATE)


#============================================
# FORTIFY
# - fortify will convert the 
#   'SpatialLinesDataFrame' to a proper
#    data frame that we can use with ggplot2
#============================================

df.usa_rivers <- fortify(lines.rivers)


#============
# GET USA MAP
#============
map.usa_country <- map_data("usa")
map.usa_states <- map_data("state")


#=======
# PLOT
#=======

ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848") +
  geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group), color = "#8ca7c0", size = .08) +
  coord_map(projection = "albers", lat0 = 30, lat1 = 40, xlim = c(-121,-73), ylim = c(25,51)) +
  labs(title = "Rivers and waterways of the United States") +
  annotate("text", label = "sharpsightlabs.com", family = "Gill Sans", color = "#A1A1A1"
           , x = -89, y = 26.5, size = 5) +
  theme(panel.background = element_rect(fill = "#292929")
        ,plot.background = element_rect(fill = "#292929")
        ,panel.grid = element_blank()
        ,axis.title = element_blank()
        ,axis.text = element_blank()
        ,axis.ticks = element_blank()
        ,text = element_text(family = "Gill Sans", color = "#A1A1A1")
        ,plot.title = element_text(size = 34)
        ) 
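
To make the fortify step less of a black box, here is a minimal sketch of what it does conceptually. It builds a tiny, made-up SpatialLinesDataFrame with sp and flattens it by hand into rows of long, lat, and group. The toy coordinates, the FEATURE values, and the flatten_lines() helper are all assumptions for illustration; this is not how fortify() is implemented, and its real output has a few more columns.

```r
library(sp)

# Two tiny made-up line features (columns are longitude, latitude)
line_a <- Lines(list(Line(cbind(c(-90, -89, -88), c(30, 31, 32)))), ID = "a")
line_b <- Lines(list(Line(cbind(c(-100, -99), c(40, 41)))), ID = "b")

# Attach one row of attributes per feature, matched on the feature IDs
toy_lines <- SpatialLinesDataFrame(
  SpatialLines(list(line_a, line_b)),
  data = data.frame(FEATURE = c("Stream", "Shoreline"), row.names = c("a", "b"))
)

# Hypothetical helper: turn every Line into rows of long/lat plus a group id,
# which is roughly the shape of data frame that geom_path() expects
flatten_lines <- function(x) {
  pieces <- lapply(x@lines, function(feature) {
    do.call(rbind, lapply(seq_along(feature@Lines), function(i) {
      xy <- feature@Lines[[i]]@coords
      data.frame(long = xy[, 1], lat = xy[, 2],
                 id = feature@ID,
                 group = paste(feature@ID, i, sep = "."),
                 stringsAsFactors = FALSE)
    }))
  })
  do.call(rbind, pieces)
}

df.toy <- flatten_lines(toy_lines)
print(df.toy)
```

The group column is what keeps separate line segments from being joined together when they are drawn with geom_path().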

Use this as practice

If you’ve learned the basics of data visualization in R (namely, ggplot2) and you’re interested in geospatial visualization, use this as a small, narrowly defined exercise to practice some intermediate skills.

There are at least three things that you can learn and practice with this visualization:

  1. Learn about color: Part of what makes this visualization compelling is the color palette. Notice that for the background surrounding the US, we’re not using pure black, but a dark grey. For the title, we’re not using white, but a medium grey. Also, notice that for the rivers, we’re not using “blue” but a very specific hexadecimal color. These are all deliberate choices. As an exercise, I highly recommend modifying the colors. Play around a bit and see how changing the colors changes the “feel” of the visualization.
  2. Learn to build visualizations in layers: I’ve emphasized this several times recently, but layering is an important principle of data visualization. Notice that we’re layering the river data over the USA country map. As an exercise, you could also layer in the state boundaries between the country map and the rivers. The code above already retrieves them into map.usa_states with map_data("state").
  3. Learn about ‘Spatial’ data: R has several classes for dealing with geospatial data, such as ‘SpatialLines’, ‘SpatialPoints’, and others (they come from the sp package). Spatial data is a whole different animal, so you’ll have to learn its structure. This example will give you a little experience dealing with it.
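
For the layering exercise in item 2, one possible sketch is below. It draws the state outlines as an unfilled polygon layer on top of the country fill. The outline color "#5e5e5e" is an arbitrary choice of mine, not something from the original map, and the rivers layer is left as a comment so the snippet runs without the river data.

```r
library(ggplot2)  # map_data() also needs the 'maps' package installed

map.usa_country <- map_data("usa")
map.usa_states  <- map_data("state")

p.layers <- ggplot() +
  # bottom layer: the country as a filled shape
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group),
               fill = "#484848") +
  # middle layer: state boundaries, outline only (assumed color)
  geom_polygon(data = map.usa_states, aes(x = long, y = lat, group = group),
               fill = NA, color = "#5e5e5e")

# top layer: add the rivers last so they sit above the boundaries, e.g.
# p.layers + geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group), color = "#8ca7c0")

print(p.layers)
```

Layers are drawn in the order they are added, so anything you want on top has to come later in the chain.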

Iterate to get the details right

What really makes this visualization work are the fine little details, in particular the size of the lines and the colors.

The reality is that creating good-looking visualizations requires attention to the little details.

To get the details right for a plot like this, I recommend that you build the visualization iteratively.

Start with a simple version of just the map of the US.

ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848")

Next, layer on the rivers:

ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848") +
  geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group)) 

Make no mistake: this doesn’t look good. But, in the early stages, that’s not the goal. You just want to make sure that the data are structurally right. You want something simple that you can build on.
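
One way to check that the data are structurally right before worrying about looks is a few quick sanity checks on the data frame. The sketch below uses a hypothetical helper, check_path_data(), and a toy data frame standing in for df.usa_rivers; the specific checks are my own suggestion.

```r
# Hypothetical helper: basic structural checks on path data before plotting
check_path_data <- function(df) {
  required <- c("long", "lat", "group")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  list(
    n_rows   = nrow(df),
    n_groups = length(unique(df$group)),
    # coordinates should be plausible longitudes and latitudes
    long_ok  = all(df$long >= -180 & df$long <= 180),
    lat_ok   = all(df$lat >= -90 & df$lat <= 90)
  )
}

# Toy stand-in for df.usa_rivers (made-up coordinates)
df.example <- data.frame(
  long  = c(-90, -89, -88, -100, -99),
  lat   = c(30, 31, 32, 40, 41),
  group = c("a.1", "a.1", "a.1", "b.1", "b.1")
)

check.result <- check_path_data(df.example)
str(check.result)
```

If the checks pass and the rough plot shows lines in the right places, the structure is probably fine and you can move on to styling.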

Ok, next, play with the river colors.

Start with a simple ‘blue’:

ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848") +
  geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group), color = "blue") 

Let’s be honest. This still does not look good.

But it’s closer.

From here, you can play with the colors some more. Select a new color (I recommend using a color picker), and change the color = argument of geom_path().

ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848") +
  geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group), color = "#99ccff") 

Not perfect, but better still.
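
If you would rather generate candidate colors than pick each one by hand, here is a small sketch using base R's colorRampPalette(). It interpolates between the light blue tried above and the blue used in the final map. The choice of five steps is arbitrary.

```r
# Interpolate five candidate river colors between two hex endpoints
candidate.colors <- colorRampPalette(c("#99ccff", "#8ca7c0"))(5)
print(candidate.colors)
```

You can then pass each value to color = in geom_path(), one at a time, and compare the results.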

From here, you can continue to iterate, add more details, and get them all “perfect”:

  • The exact color (this takes lots of trial-and-error, and a bit of good taste)
  • The line size for geom_path()
  • The title and text annotations
  • The projection (change it to the “albers” projection with coord_map())
  • The other theme() details, like the background color and removing extraneous elements (such as the axis labels)

Once again: getting this just right takes lots of iteration. Try it yourself and build this visualization from the bottom up.
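
To make the trial-and-error on line size a little less tedious, you could build one plot per candidate value and flip through them. The sketch below is only an illustration: it uses state outlines from map_data("state") as stand-in line data (swap in df.usa_rivers), the candidate widths are arbitrary, and it assumes ggplot2 3.4.0 or later, where line thickness is set with linewidth (older versions use size, as in the code above).

```r
library(ggplot2)  # map_data() also needs the 'maps' package installed

# Stand-in line data (assumption): state outlines instead of the rivers
df.lines <- map_data("state")

# Arbitrary candidate values to compare
candidate.widths <- c(0.05, 0.08, 0.3)

# Build one plot per candidate width
plots.by_width <- lapply(candidate.widths, function(w) {
  ggplot() +
    geom_path(data = df.lines, aes(x = long, y = lat, group = group),
              color = "#8ca7c0", linewidth = w) +
    labs(title = paste("linewidth =", w))
})
names(plots.by_width) <- as.character(candidate.widths)

# Look at one of them
print(plots.by_width[["0.08"]])
```

The same pattern works for any single setting you want to compare, such as the river color.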

Learn ggplot2 (because ggplot2 makes this easy)

In this post, we’ve used ggplot2 to create this particular visualization. While I would classify this visualization at an “intermediate” level, ggplot2 still makes it relatively easy.

That said, if you’re interested in data science and data visualization, learn ggplot2.

Longtime readers at Sharp Sight will know my thoughts on this, but if you’re a new reader, this is important.

ggplot2 is, almost without question, the best data visualization tool available. Of course, different people will have different needs, but speaking generally, ggplot2 is flexible, powerful, and it allows you to create beautiful data visualizations with relative ease.

Not interested in visualization per se?

Do you want to focus on machine learning instead?

Fair enough.

If you want to learn machine learning, you still need to be able to analyze and explore your data.

Once again, the best tool for exploring and analyzing your data is ggplot2. This is particularly true when you combine it with dplyr, tidyr, stringr, and other tools from the tidyverse.

Sign up to master data visualization

Do you want to get a job as a data scientist?

You need to master data visualization.

We’ll show you how.

Sign up now, and we’ll show you step-by-step how to learn (and master) data visualization in R.


The post How to map geospatial data: USA rivers appeared first on SHARP SIGHT LABS.
