Crime Analysis – Denver-Part 1

November 16, 2016

(This article was first published on R-Projects - Stoltzmaniac, and kindly contributed to R-bloggers)

Project Background

As we all know, Colorado is considered one of the scariest places on earth. Denver, CO has had an enormous influx of people over the last decade and it is still ramping up.

So why did I pick Denver?

That’s simple: I have lived in Colorado for the majority of my life and want to know more about my capital city.

Exploration of Data
Data provided by

What we’ll do in this post

  • Import the crime.csv data set
  • Format the data
  • Plot the total number of incidents reported by year

Let’s dive in!

Import the necessary libraries

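library(dplyr)      # %>%, group_by(), filter(), summarise()
library(ggplot2)    # ggplot(), geom_bar()
library(lubridate)  # year(), month(), day(), hour()
options("stringsAsFactors" = TRUE)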

Load the crime data set provided
It is possible to load the data straight from the URL; however, the file is over 80MB, so I simply downloaded it. It is updated regularly, so this post may need to be refreshed from time to time.

# Data from:
# File name: crime.csv
CWD = getwd()
data = read.csv(file.path(CWD, 'data', 'crime.csv'))
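Since the file is large and refreshed regularly, one option is to download it only when no local copy exists. A minimal sketch, assuming a hypothetical helper load_crime_data() and a url argument that you supply yourself (the source URL is not reproduced here):

```r
# Hypothetical helper (not from the original post): read a cached CSV,
# downloading it first only if the local file is missing.
load_crime_data <- function(path, url = NULL) {
  if (!file.exists(path)) {
    if (is.null(url)) stop("No local file at '", path, "' and no url supplied")
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = path, mode = "wb")
  }
  read.csv(path)
}

# Usage with an existing local copy; no download is attempted:
# data = load_crime_data(file.path(getwd(), 'data', 'crime.csv'))
```

Deleting the local file forces a fresh download the next time the helper is called with a url.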

Format the data
I added columns for year, month, day and hour to the data frame in order to simplify life. It takes a bit more time and RAM upfront, but I prefer to see it that way.

#Format FIRST_OCCURRENCE_DATE as.Date and use as crime date (for now)
data$date = as.Date(data$FIRST_OCCURRENCE_DATE)

#Create new columns for grouping
data$year = year(data$date)  
data$month = month(data$date)  
data$day = day(data$date)  
data$hour = hour(data$FIRST_OCCURRENCE_DATE)
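Note that as.Date() keeps only the calendar date, which is why the hour is taken from the original timestamp column and not from data$date. A small sketch on made-up values, assuming timestamps in 'YYYY-MM-DD HH:MM:SS' form (the real file's format may differ, in which case the parsing call would need to change):

```r
library(lubridate)

# Made-up timestamps, assumed to be in 'YYYY-MM-DD HH:MM:SS' form
stamps <- c('2016-06-15 23:31:00', '2015-01-02 04:05:00')

parsed <- ymd_hms(stamps, tz = 'UTC')   # full date-time
dates  <- as.Date(parsed)               # calendar date only, time is dropped

parts <- data.frame(
  year  = year(dates),
  month = month(dates),
  day   = day(dates),
  hour  = hour(parsed)                  # must come from the full date-time
)
```

Inspecting a few parsed rows this way before grouping is a cheap check that the date format was read as intended.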


Basic Plotting
ggplot2 will provide a decent chart to show us the number of incidents each year.

#Sum up all incidents IS_CRIME AND IS_TRAFFIC
maxYear = max(data$year)  
maxMonthYTD = max(data$month[data$year==maxYear])

#Keep only months before the latest month on record so every year covers the same span
df = data %>%
  group_by(year,month) %>%
  filter(month < maxMonthYTD) %>%
  summarise(incidents = sum(IS_CRIME) + sum(IS_TRAFFIC))

p = ggplot(df)  
p + geom_bar(aes(x = factor(year), weight = incidents)) + ggtitle('Incidents Reported by Year') + xlab('Year') + ylab('Incidents') + theme(plot.title = element_text(hjust = 0.5))
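The month filter is what makes a partial current year comparable with earlier full years. A toy illustration with made-up data, where the latest year only has records through April:

```r
library(dplyr)

# Made-up data: two incidents per month; 2016 has records only through April
toy <- rbind(
  data.frame(year = 2015, month = rep(1:12, each = 2)),
  data.frame(year = 2016, month = rep(1:4, each = 2))
)
toy$IS_CRIME <- 1
toy$IS_TRAFFIC <- 0

maxYear <- max(toy$year)                            # latest year present
maxMonthYTD <- max(toy$month[toy$year == maxYear])  # latest month in that year

# Drop the latest (possibly incomplete) month and everything after it,
# then total the incidents per year
ytd <- toy %>%
  filter(month < maxMonthYTD) %>%
  group_by(year) %>%
  summarise(incidents = sum(IS_CRIME) + sum(IS_TRAFFIC))
```

Without the filter, 2015 would count twelve months against four for 2016, and the latest year would look like a sharp drop.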


Adding Some Color
This is the same plot, with each bar split into colors by month of the year.

#Stack bars in colors to view individual months
p = ggplot(df,aes(x=factor(year),y=incidents,fill=factor(month)))  
p + geom_bar(stat='identity') + ggtitle('Incidents Reported by Year') + xlab('Year') + ylab('Incidents') + theme(plot.title = element_text(hjust = 0.5)) + guides(fill = guide_legend(title='Month'))  
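The legend in this plot labels months with the numbers 1 to 12. If month names are preferred, one option is to build the factor with base R's month.abb constant. A small sketch on made-up values:

```r
# Made-up month numbers standing in for df$month
months <- c(1, 2, 12, 7)

# month.abb is a base R constant: "Jan", "Feb", ..., "Dec"
month_f <- factor(months, levels = 1:12, labels = month.abb)

# In the plot, fill = factor(month, levels = 1:12, labels = month.abb)
# would then list month names in calendar order in the legend.
```

Fixing the levels to 1:12 keeps the legend in calendar order even when some months are missing from the data.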


How much is labeled as TRAFFIC or CRIME
This is the same plot again, but separating IS_TRAFFIC from IS_CRIME.

#Label each incident as Crime or Traffic
tmp= data  
tmp$crimeType[tmp$IS_CRIME == 1] = 'Crime'  
tmp$crimeType[tmp$IS_CRIME == 0] = 'Traffic'  
tmp$crimeType = factor(tmp$crimeType) 

df = tmp %>%
  group_by(year,crimeType) %>%
  filter(month < maxMonthYTD) %>%
  summarise(crimeIncidents = sum(IS_CRIME) + sum(IS_TRAFFIC))

p = ggplot(df,aes(x=factor(year),y=crimeIncidents,fill=crimeType))  
p + geom_bar(stat='identity') + ggtitle('Incidents Reported by Year') + xlab('Year') + ylab('Incidents') + theme(plot.title = element_text(hjust = 0.5)) + guides(fill = guide_legend(title='Incident Type'))  
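The labeling above treats every row with IS_CRIME == 0 as traffic, which implicitly assumes the two flags never overlap. A quick sanity check on made-up rows (whether the real data satisfies this is an assumption worth verifying):

```r
# Made-up flag values standing in for the real columns
toy <- data.frame(
  IS_CRIME   = c(1, 1, 0, 0, 1),
  IS_TRAFFIC = c(0, 0, 1, 1, 0)
)

# Cross-tabulate the two flags to see every combination that occurs
flag_table <- table(crime = toy$IS_CRIME, traffic = toy$IS_TRAFFIC)

# Rows flagged as both would be counted twice by sum(IS_CRIME) + sum(IS_TRAFFIC)
both_flags <- sum(toy$IS_CRIME == 1 & toy$IS_TRAFFIC == 1)

# Equivalent one-step labeling with ifelse()
toy$crimeType <- factor(ifelse(toy$IS_CRIME == 1, 'Crime', 'Traffic'))
```

If both_flags comes back greater than zero on the real data, the stacked totals would overstate the 'Crime' bars.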


Initial Impressions
Because each year is limited to the same set of months, the totals are comparable, and volume has increased in most years. The most rapid growth appears to have occurred between 2012 and 2014. Traffic violations look roughly flat, while crimes grew much faster. I’ll have to dig into those years to see whether there’s evidence of a change in crime rate or whether something else is hiding in the data.
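One way to put numbers on that growth is a year-over-year change on the yearly totals. A minimal sketch with made-up counts (these are not the Denver figures):

```r
# Made-up yearly totals, for illustration only
yearly <- data.frame(
  year      = 2012:2015,
  incidents = c(100, 120, 150, 150)
)

# Absolute change versus the previous year (no prior year for the first row)
yearly$yoy_change <- c(NA, diff(yearly$incidents))

# Percent change versus the previous year
yearly$yoy_pct <- c(NA, 100 * diff(yearly$incidents) / head(yearly$incidents, -1))
```

The same calculation could be run per incident type to compare crime growth with traffic growth.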

What I’ll do in the next crime posts

  • Determine year-over-year differences in crime
  • Dig into the apparent crime rate growth from 2012 – 2014
  • Look for patterns by location
  • Answer the question: What types of crimes have grown the most in the last 5 years?

Code used in this post is on my GitHub
