Mapping Seattle Crime

[This article was first published on SHARP SIGHT LABS » r-bloggers, and kindly contributed to R-bloggers.]

[Figure: Seattle crime map, 2010-2014, made with ggplot2]

Last week I published a data visualization of San Francisco crime.

This week, I’m mapping Seattle crime data.

The map above is moderately complicated to create, so I’ll start this tutorial with a simpler case: the dot distribution map.

Seattle crime map, simplified version

First, we’ll start by loading the data.

Note that I already “cleaned” this dataset (mostly by removing extraneous variables, data from before 2010, etc.).

library(ggmap)
library(dplyr)
library(ggplot2)

#########################
# GET SEATTLE CRIME DATA
#########################

download.file("http://www.sharpsightlabs.com/wp-content/uploads/2015/01/seattle_crime_2010_to_2014_REDUCED.txt.zip", destfile="seattle_crime_2010_to_2014_REDUCED.txt.zip")

#------------------------------
# Unzip the Seattle crime data file
#------------------------------
unzip("seattle_crime_2010_to_2014_REDUCED.txt.zip")

#------------------------------------
# Read crime data into an R dataframe
#------------------------------------
df.seattle_crime <- read.csv("seattle_crime_2010_to_2014_REDUCED.txt")
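
Before mapping, it can help to confirm that the coordinate columns look sane. The following is a minimal sketch, not part of the original workflow: it uses a tiny made-up data frame with the same Longitude and Latitude column names used later in this post, and check_coords() is a hypothetical helper with an assumed bounding box.

```r
# Toy stand-in for df.seattle_crime (made-up rows, for illustration only).
df.toy <- data.frame(
  Longitude = c(-122.33, -122.31, NA, 0),
  Latitude  = c(47.61, 47.66, 47.60, 0)
)

# Hypothetical helper: TRUE for rows whose coordinates are present and fall
# inside a rough bounding box around Seattle (box limits are assumptions).
check_coords <- function(df,
                         lon_range = c(-122.5, -122.2),
                         lat_range = c(47.4, 47.8)) {
  !is.na(df$Longitude) & !is.na(df$Latitude) &
    df$Longitude >= lon_range[1] & df$Longitude <= lon_range[2] &
    df$Latitude  >= lat_range[1] & df$Latitude  <= lat_range[2]
}

valid <- check_coords(df.toy)
table(valid)                      # how many rows pass the check
df.toy.clean <- df.toy[valid, ]   # keep only the plausible rows
```

With the real data, the same call would be check_coords(df.seattle_crime).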

 

Get map of Seattle using ggmap package

Next, we’ll get a map of Seattle using qmap().

qmap() is a function from the ggmap package (a convenience wrapper around get_map() and ggmap()). Basically, it pings Google Maps and creates a map that you can use as a geospatial context layer. (It can also retrieve related maps made by Stamen, CloudMade, or OpenStreetMap.)


################
# SEATTLE GGMAP
################

map.seattle_city <- qmap("seattle", zoom = 11, source="stamen", maptype="toner",darken = c(.3,"#BBBBBB"))
map.seattle_city

 
Here, we’re using qmap() as follows:

We’re calling it with “seattle” as the first argument. That does exactly what you think it does. It tells qmap() that we want a map of Seattle. The qmap() function understands city names, so you can ask for “chicago,” “san francisco,” etc. Play with it a little!

We’re also setting the “zoom” parameter, currently to 11. Higher values zoom in on the specified location and lower values zoom out; play with that number and see what happens. In this case, the map is centered on Seattle, so if we zoom in too much, we’ll cut off parts of the city. For our purposes, a zoom of 11 is ideal.
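
To get a feel for what the zoom number means, here is a rough sketch of the arithmetic. It assumes the standard web-map tile scheme, in which the world is 2^zoom tiles wide, so each zoom step halves the area a tile covers. The helper name is hypothetical and not part of ggmap.

```r
# Assumption: standard web-map tiling, where the world is 2^zoom tiles wide.
# tile_width_degrees() is a hypothetical helper, not a ggmap function.
tile_width_degrees <- function(zoom) {
  360 / 2^zoom   # degrees of longitude covered by one tile
}

zooms <- 9:13
data.frame(zoom = zooms, degrees_per_tile = tile_width_degrees(zooms))
```

Each step up in zoom halves the longitude span per tile, which is why a value only slightly above 11 can already crop the edges of the city.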

The maptype= parameter has been set to “toner”. The “toner” maptype is basically a black and white map. (Note that there are other maptypes, such as “satellite” and “hybrid,” which are available when the source is Google Maps. Try those out and see what happens.)

On top of that, you’ll note that I’m using a parameter called “darken.” Effectively, I’m using darken to overlay a color on the map: the hexadecimal color “#BBBBBB”, at a strength of .3. I’ve done this to subtly change the map color from pure black and white to shades of grey.
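
One way to picture the darken setting is as a semi-transparent color layer sitting over the base map. The sketch below only illustrates that blending arithmetic under that assumption; it is not ggmap's internal code, and blend_over() is a hypothetical helper.

```r
# Hypothetical helper: blend an overlay color onto a base color with a given
# opacity (simple "layer on top" arithmetic, assumed for illustration).
blend_over <- function(base, overlay, opacity) {
  base_rgb    <- col2rgb(base)[, 1]
  overlay_rgb <- col2rgb(overlay)[, 1]
  mixed <- unname(round((1 - opacity) * base_rgb + opacity * overlay_rgb))
  rgb(mixed[1], mixed[2], mixed[3], maxColorValue = 255)
}

blend_over("white", "#BBBBBB", 0.3)  # white areas become a light grey
blend_over("black", "#BBBBBB", 0.3)  # black areas become a dark grey
```

Under this assumption, neither pure white nor pure black survives the overlay, which matches the softer grey look of the map.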

Next, we’ll plot.

Make basic dot distribution map


##########################
# CREATE BASIC MAP
#  - dot distribution map
##########################
map.seattle_city +
  geom_point(data=df.seattle_crime, aes(x=Longitude, y=Latitude))

[Figure: basic dot distribution map of Seattle crime, 2010-2014]
 
This map is a little ugly, but it’s instructive to examine what we’re doing in the code.

Notice that the syntax is almost the same as the syntax for the basic scatterplot. In some sense, this is a scatterplot.

As proof, let’s create a scatterplot using the same dataset. Simply replace the map.seattle_city code with ggplot().

#####################
# CREATE SCATTERPLOT
#####################
ggplot() +
  geom_point(data=df.seattle_crime, aes(x=Longitude, y=Latitude))

[Figure: basic scatterplot of Seattle crime, 2010-2014]
 
This is the exact same data and the same variable mapping. We’ve just removed the map.seattle_city context layer. Now, it’s just a basic scatterplot.

That’s part of the reason I wanted to write up this tutorial. I’ve emphasized before that you should master the basic charts like the scatterplot. One reason I emphasize the basics is that the basic charts serve as foundations for more complicated charts.

In this case, the scatterplot is the foundation for the dot distribution map.

Ok. Now, let’s go back to our map. You might have noticed that the data is really “dense.” Many of the points are on top of each other. We call this “overplotting.” We’re going to modify our point geoms to deal with this overplotting.

Adjust point transparency to deal with overplotting


#############################
# ADD TRANSPARENCY and COLOR
#############################

map.seattle_city +
  geom_point(data=df.seattle_crime, aes(x=Longitude, y=Latitude), color="dark green", alpha=.03, size=1.1)

[Figure: dot distribution map of Seattle crime with transparent green points, 2010-2014]
 
Notice that we made some modifications within geom_point().

We added color to make it a little more interesting.

But more importantly, we modified two parameters: alpha= and size=.

The size= parameter obviously modifies the size of the point.

The alpha= parameter modifies the transparency, from 0 (fully transparent) to 1 (fully opaque). In this case, we’re making the points highly transparent so we can better see areas of Seattle with high levels of crime. We’re manipulating alpha levels to deal with overplotting.
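
To see why such a low alpha works, here is a small worked calculation. It assumes overlapping points of the same color combine by standard alpha compositing (actual rendering can differ slightly by graphics device), and stacked_opacity() is a hypothetical helper.

```r
# Assumption: overlapping points of the same color combine with standard
# alpha compositing, so n stacked points have opacity 1 - (1 - alpha)^n.
stacked_opacity <- function(n, alpha = 0.03) {
  1 - (1 - alpha)^n
}

n_points <- c(1, 10, 50, 100)
data.frame(points  = n_points,
           opacity = round(stacked_opacity(n_points), 2))
```

A single point is nearly invisible, while a spot with on the order of a hundred overlapping incidents approaches a solid color. That contrast is what makes the high-crime areas stand out.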

To be clear, there are other solutions for dealing with overplotting. This isn’t necessarily the best solution, but when you’re early in learning data science, it will be one of the simplest to implement.
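
One such alternative is to count points in grid cells instead of drawing every point. Here is a minimal base-R sketch on simulated coordinates (the data and the 20 x 20 grid are made up for illustration). In ggplot2, geom_bin2d() draws a similar binned summary directly.

```r
set.seed(42)
# Simulated points clustered around a made-up center (not real crime data).
lon <- rnorm(5000, mean = -122.33, sd = 0.03)
lat <- rnorm(5000, mean = 47.61, sd = 0.03)

# Cut each axis into 20 equal-width bins and count points per cell.
lon_bin <- cut(lon, breaks = 20)
lat_bin <- cut(lat, breaks = 20)
cell_counts <- table(lon_bin, lat_bin)

dim(cell_counts)   # grid dimensions
max(cell_counts)   # count in the busiest cell
```

Binning trades individual points for counts, so dense areas stay readable no matter how many rows the dataset has.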

Wrapping up

The above tutorial shows you how to make a basic dot distribution map using R’s ggplot2 and ggmap.

Note a few things:

  1. We’re building on foundational techniques. In this case, we’ve made a dot distribution map, which is just a modified scatterplot.
  2. We built this plot iteratively. We started with the base map, then added points, and then modified those points.
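
The iterative approach can be sketched by storing each step in a variable. This example uses simulated coordinates and a plain ggplot() base so that it runs without downloading a map; with the real data you would start from map.seattle_city instead. The variable names and the alpha value are illustrative choices.

```r
library(ggplot2)

set.seed(1)
# Simulated coordinates standing in for df.seattle_crime (illustration only).
df.sim <- data.frame(
  Longitude = rnorm(2000, mean = -122.33, sd = 0.03),
  Latitude  = rnorm(2000, mean = 47.61, sd = 0.03)
)

# Step 1: the base layer (a blank ggplot here; a ggmap object in the post).
p_base <- ggplot()

# Step 2: add points.
p_points <- p_base +
  geom_point(data = df.sim, aes(x = Longitude, y = Latitude))

# Step 3: the same points, styled to reduce overplotting.
p_styled <- p_base +
  geom_point(data = df.sim, aes(x = Longitude, y = Latitude),
             color = "dark green", alpha = 0.2, size = 1.1)

p_styled
```

Keeping each stage in its own object makes it easy to print and compare the intermediate plots as you go.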

It bears repeating that you should master the basics like the scatterplot, line chart, histogram, and bar chart. Also practice designing data visualizations iteratively. When you can do these things, you’ll be able to progress to more sophisticated visualization techniques.

Finally, if you want to replicate the map at the beginning of the post, here’s the code:


#################################
# TILED version 
#  tile border mapped to density
#################################
map.seattle_city +
  stat_density2d(data=df.seattle_crime, aes(x=Longitude
                                            , y=Latitude
                                            ,color=..density..
                                            ,size=ifelse(..density..<=1,0,..density..)
                                            ,alpha=..density..)
                 ,geom="tile",contour=FALSE) +
  scale_color_continuous(low="orange", high="red", guide = "none") +
  scale_size_continuous(range = c(0, 3), guide = "none") +
  scale_alpha(range = c(0,.5), guide="none") +
  ggtitle("Seattle Crime") +
  theme(plot.title = element_text(family="Trebuchet MS", size=36, face="bold", hjust=0, color="#777777")) 

[Figure: Seattle crime map, 2010-2014, made with ggplot2]
 
If you look carefully, you’ll notice that the code has quite a few similarities to the basic dot distribution map. (Again: master the basics, and you’ll start to understand what’s going on here.)
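
To see what the density-based aesthetics are working with, here is a rough sketch that computes a 2D kernel density estimate directly with MASS::kde2d(), the function that ggplot2's stat_density2d() relies on, and then applies the same ifelse() threshold used for size in the code above. The points are simulated, and the 25 x 25 grid is an arbitrary choice for illustration.

```r
set.seed(7)
# Simulated coordinates (not real crime data).
lon <- rnorm(3000, mean = -122.33, sd = 0.03)
lat <- rnorm(3000, mean = 47.61, sd = 0.03)

# Estimate density on a 25 x 25 grid; MASS ships with standard R installs.
dens <- MASS::kde2d(lon, lat, n = 25)

# Same rule as in the map code: cells with density <= 1 get size 0,
# so only the denser tiles draw a visible border.
tile_size <- ifelse(dens$z <= 1, 0, dens$z)

mean(tile_size > 0)  # share of grid cells that would get a visible border
```

Calling kde2d() through MASS:: rather than library(MASS) avoids masking dplyr's select(), which matters if dplyr is loaded as in this post.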
