Crime Analysis – Denver-Part 1

[This article was first published on R-Projects – Stoltzmaniac, and kindly contributed to R-bloggers.]

Project Background

As we all know, Colorado is considered one of the scariest places on earth. Denver, CO has had an enormous influx of people over the last decade and it is still ramping up.

So why did I pick Denver?

That’s simple: I have lived in Colorado for the majority of my life and want to know more about my capital city.

Exploration of Data
Data provided by https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-crime

What we’ll do in this post

  • Import the crime.csv data set
  • Format the data
  • Plot the total number of incidents reported by year

Let’s dive in!

Import the necessary libraries

library(dplyr)  
library(lubridate)  
library(ggplot2)  
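#Note: setting this option globally is deprecated in newer R releases;
#read.csv(..., stringsAsFactors = TRUE) is the per-call alternative
options("stringsAsFactors" = TRUE)  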

Load the crime data set provided
It is possible to load the data straight from the URL; however, the file is over 80 MB, so I simply downloaded it. It is updated regularly, so this post may need to be refreshed from time to time.

####
# Data from: http://data.denvergov.org/dataset/city-and-county-of-denver-crime
# File name: crime.csv
CWD = getwd()  
#file.path builds the same path with the right separator for the platform
data = read.csv(file.path(CWD, 'data', 'crime.csv'))  
####
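
If you would rather not download the file by hand, a small helper can fetch it once and reuse the local copy afterwards. This is only a sketch: `load_crime_csv` and its `url` argument are hypothetical names used for illustration, and you would have to supply the actual CSV link from the data portal yourself.

```r
# Hypothetical helper for illustration: read a local copy of crime.csv,
# downloading it first only if the file is missing.
load_crime_csv = function(path, url = NULL) {
  if (!file.exists(path)) {
    if (is.null(url)) {
      stop("File not found and no download URL supplied: ", path)
    }
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    # mode = 'wb' writes the bytes unchanged, which matters on Windows
    download.file(url, destfile = path, mode = 'wb')
  }
  read.csv(path)
}

# Usage with a tiny stand-in file instead of the real download
demo_path = file.path(tempdir(), 'crime_demo.csv')
write.csv(data.frame(IS_CRIME = c(1, 0), IS_TRAFFIC = c(0, 1)),
          demo_path, row.names = FALSE)
demo = load_crime_csv(demo_path)
print(demo)
```

Because the download only happens when the file is absent, deleting the local copy is enough to force a refresh when the city updates the data.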

Format the data
I added columns for year, month, day and hour to the data frame in order to simplify life. It takes a bit more time and RAM upfront, but I prefer to see it that way.

#Format FIRST_OCCURRENCE_DATE as.Date and use as crime date (for now)
data$date = as.Date(data$FIRST_OCCURRENCE_DATE)

#Create new columns for grouping
data$year = year(data$date)  
data$month = month(data$date)  
data$day = day(data$date)  
data$hour = hour(data$FIRST_OCCURRENCE_DATE)

print(colnames(data))  
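
To see what these columns look like, here is a minimal sketch on a three-row toy data frame, using base R only. It assumes the timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text; if your copy of crime.csv uses a different layout, the format string needs to change. The `toy` data frame and its values are invented for illustration.

```r
# Toy stand-in for the real data; the timestamps are made up
toy = data.frame(
  FIRST_OCCURRENCE_DATE = c('2015-03-14 23:05:00',
                            '2016-07-01 08:30:00',
                            '2016-12-31 00:15:00'),
  stringsAsFactors = FALSE
)

# Parse the text once into a date-time (assumed format, UTC to avoid
# daylight-saving surprises), then derive every grouping column from it
stamp = as.POSIXct(toy$FIRST_OCCURRENCE_DATE,
                   format = '%Y-%m-%d %H:%M:%S', tz = 'UTC')

toy$date  = as.Date(stamp)
toy$year  = as.integer(format(stamp, '%Y'))
toy$month = as.integer(format(stamp, '%m'))
toy$day   = as.integer(format(stamp, '%d'))
toy$hour  = as.integer(format(stamp, '%H'))

print(toy[, c('date', 'year', 'month', 'day', 'hour')])
```

Parsing once and deriving all columns from the same value keeps the date and the hour from drifting apart if the text format ever changes.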

Basic Plotting
ggplot2 will provide a decent chart to show us the number of incidents each year.

#Sum up all incidents IS_CRIME AND IS_TRAFFIC
#The latest year is incomplete, so keep only the months before its most recent month
maxYear = max(data$year)  
maxMonthYTD = max(data$month[data$year==maxYear])

df = data %>%  
  filter(month < maxMonthYTD) %>%
  group_by(year,month) %>%
  summarise(incidents = sum(IS_CRIME) + sum(IS_TRAFFIC)) %>%
  arrange(month)

p = ggplot(df)  
p + geom_bar(aes(x = factor(year), weight = incidents)) +
  ggtitle('Incidents Reported by Year') +
  xlab('Year') + ylab('Incidents') +
  theme(plot.title = element_text(hjust = 0.5))

[Figure: bar chart of total incidents reported by year]
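
The filter on maxMonthYTD is what keeps the years comparable: the latest year is incomplete, so only the months earlier than its most recent month are counted in every year. A small sketch with invented counts shows the effect. It uses base R's aggregate rather than dplyr, and the numbers are made up.

```r
# Invented monthly counts: two full years plus a partial latest year
toy = data.frame(
  year      = c(rep(2015, 12), rep(2016, 12), rep(2017, 4)),
  month     = c(1:12, 1:12, 1:4),
  incidents = 10
)

maxYear     = max(toy$year)
maxMonthYTD = max(toy$month[toy$year == maxYear])  # latest month, possibly partial

# Keep only months strictly before the latest month, in every year
ytd = toy[toy$month < maxMonthYTD, ]

unfiltered = aggregate(incidents ~ year, data = toy, FUN = sum)
filtered   = aggregate(incidents ~ year, data = ytd, FUN = sum)

print(unfiltered)  # the latest year looks artificially small
print(filtered)    # every year now covers the same months
```

One caveat of this approach: if the latest year contains only January, the strict comparison removes every row, so the filter needs at least two months of data in the latest year.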

Adding Some Color
This is the same plot, with a separate color for each month of the year.

#Stack bars in colors to view individual months
p = ggplot(df,aes(x=factor(year),y=incidents,fill=factor(month)))  
p + geom_bar(stat='identity') +
  ggtitle('Incidents Reported by Year') +
  xlab('Year') + ylab('Incidents') +
  theme(plot.title = element_text(hjust = 0.5)) +
  guides(fill = guide_legend(title='Month'))  

[Figure: stacked bar chart of incidents by year, colored by month]

How much is labeled as TRAFFIC or CRIME
This is the same plot again, but with IS_TRAFFIC separated from IS_CRIME.

#Stack bars in colors to view incident type (crime vs. traffic)
#Rows with IS_CRIME == 0 are labeled as traffic
tmp= data  
tmp$crimeType = factor(ifelse(tmp$IS_CRIME == 1, 'Crime', 'Traffic'))  

df = tmp %>%  
  filter(month < maxMonthYTD) %>%
  group_by(year,crimeType) %>%
  summarise(crimeIncidents = sum(IS_CRIME) + sum(IS_TRAFFIC)) %>%
  arrange(year)

p = ggplot(df,aes(x=factor(year),y=crimeIncidents,fill=crimeType))  
p + geom_bar(stat='identity') +
  ggtitle('Incidents Reported by Year') +
  xlab('Year') + ylab('Incidents') +
  theme(plot.title = element_text(hjust = 0.5)) +
  guides(fill = guide_legend(title='Incident Type'))  

[Figure: stacked bar chart of incidents by year, colored by incident type]
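
The labeling used for this plot treats every row with IS_CRIME equal to 0 as traffic, which only holds if each incident carries exactly one of the two flags. Whether that is true for the real file is an assumption worth checking, and a quick cross-tabulation does the job. The sketch below runs on an invented data frame; with the real data you would use `data` in place of `toy`.

```r
# Invented flags for illustration only
toy = data.frame(
  IS_CRIME   = c(1, 1, 0, 0, 1),
  IS_TRAFFIC = c(0, 0, 1, 1, 0)
)

# Rows = IS_CRIME, columns = IS_TRAFFIC; if the flags are exclusive,
# all counts sit in the off-diagonal cells
flags = table(IS_CRIME = toy$IS_CRIME, IS_TRAFFIC = toy$IS_TRAFFIC)
print(flags)

# TRUE only if every row has exactly one flag set
exclusive = all(toy$IS_CRIME + toy$IS_TRAFFIC == 1)
print(exclusive)
```

If the check comes back FALSE on the real data, rows with both flags set would be counted twice in the summed totals and rows with neither would be mislabeled as traffic.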

Initial Impressions
Having isolated only the months that have occurred in every year, we’ve seen volume increase in most years. The most rapid growth seemed to occur between 2012 and 2014. Traffic violations appear to be roughly flat, while crimes have grown much faster. I’ll have to dig into those years to see if there’s evidence of a real change in crime rate or if something else is hiding in the data.
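
One way to put numbers on that impression is the year-over-year percent change in incidents. The sketch below uses invented yearly totals, not the Denver figures, and assumes one row per year sorted in ascending order.

```r
# Invented totals for illustration only
yearly = data.frame(
  year      = 2012:2015,
  incidents = c(100, 120, 150, 150)
)

# Percent change versus the previous year; the first year has no baseline
prev = c(NA, head(yearly$incidents, -1))
yearly$pctChange = 100 * (yearly$incidents - prev) / prev

print(yearly)
```

Running the same calculation separately on the crime and traffic totals would show whether the growth really comes from one incident type.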

What I’ll do in the next crime posts

  • Determine year-over-year differences in crime
  • Dig into the apparent crime rate growth from 2012 to 2014
  • Look for patterns by location
  • Answer the question: What types of crimes have grown the most in the last 5 years?

Code used in this post is on my GitHub
