NYC Motor Vehicle Collisions – Street-Level Heat Map

March 10, 2015

(This article was first published on Stable Markets » R, and kindly contributed to R-bloggers)

In this post I extend a previous analysis that created a borough-level heat map of NYC motor vehicle collisions. The data come from NYC Open Data. This time I go from borough-level to street-level collisions. The code is very similar to the previous analysis, with a few more functions that map streets to colors. Below, I load the ggmap package and the data, and keep only collisions with longitude and latitude information.

library(ggmap)

d=read.csv('.../NYPD_Motor_Vehicle_Collisions.csv')
# keep only collisions whose LOCATION contains a comma, i.e. has coordinates
d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]

#### 1. Clean Data ####
# get long and lat coordinates from the concatenated "(lat, long)" LOCATION var
comm=regexpr(',',d_clean$LOCATION)
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))

# create year variable (characters 7-10 of a MM/DD/YYYY date)
d_clean$year=substr(d_clean$DATE,7,10)
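The substr() arithmetic above assumes every LOCATION string looks like "(lat, long)", with the comma as the only separator. As a hedged alternative sketch, the same split can be done with base R regular expressions. The helper name parse_location() and the two sample strings are made up for illustration; they are not taken from the dataset.

```r
# Hypothetical helper: split "(lat, long)" strings into numeric columns.
# Assumes the same "(lat, long)" format that the substr() code relies on.
parse_location <- function(loc) {
  loc <- as.character(loc)
  # drop the surrounding parentheses
  inner <- gsub("^\\(|\\)$", "", loc)
  # split on the comma and any whitespace that follows it
  parts <- strsplit(inner, ",\\s*")
  data.frame(
    lat  = as.numeric(vapply(parts, `[`, character(1), 1)),
    long = as.numeric(vapply(parts, `[`, character(1), 2))
  )
}

# Two made-up strings in the expected format
coords <- parse_location(c("(40.7128, -74.0060)", "(40.6782, -73.9442)"))
```

A string without a comma yields NA for long rather than an error, so rows with malformed locations can still be dropped afterwards with is.na().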

I use the three functions below to process the data. The boro() function keeps collisions in a specified borough that have a street name, since some collisions with coordinates lack one, and then keeps only collisions from 2013. The accident_freq() function counts collisions per street, then merges these counts back onto the collision-level data. Keeping one row per collision matters because the map draws a path through each street's individual collision coordinates. The count is rescaled to freqPerc, the street's share of the borough's collisions per 1,000 (rounded, with 0 raised to 1 so it can serve as an index). The assign_col() function takes this collision-level data set for a particular borough and builds a palette running from white to a specified color (e.g. green, red, etc.) with one shade per freqPerc value; indexing the palette by freqPerc gives each street its shade. Streets with more collisions are darker.

# boro() subsets to 2013 accidents with a street name in the specified borough
boro=function(x){
 d_clean2=d_clean[which(d_clean$ON.STREET.NAME!='' & d_clean$BOROUGH==x),]
 d_2013_2=d_clean2[which(d_clean2$year=='2013'),c('long','lat','ON.STREET.NAME')]
return(d_2013_2)
}

# accident_freq() gets frequency of accidents per street for specified borough
# note: merge() returns the rows sorted by ON.STREET.NAME
accident_freq=function(x){
 tab=data.frame(table(x$ON.STREET.NAME))
 d_merge=merge(x=x,y=tab,by.x=c('ON.STREET.NAME'),by.y=c('Var1'))
 # street's share of the borough's collisions, per 1,000 collisions
 d_merge$freqPerc=round((d_merge$Freq/length(x$ON.STREET.NAME))*1000,digits=0)
 # bump 0 to 1 so freqPerc can index the color palette
 d_merge$freqPerc=ifelse(d_merge$freqPerc==0,1,d_merge$freqPerc)
return(d_merge)
}

# assign_col() returns max(freqPerc) shades from white to color c;
# index the result with freqPerc to get each street's shade
assign_col=function(x,c){
 pal=colorRampPalette(c('white',c))
 colors=pal(max(x$freqPerc))
 return(colors)
}

man=boro('MANHATTAN')
bronx=boro('BRONX')
brook=boro('BROOKLYN')
si=boro('STATEN ISLAND')
q=boro('QUEENS')

man_freq=accident_freq(man)
bronx_freq=accident_freq(bronx)
brook_freq=accident_freq(brook)
si_freq=accident_freq(si)
q_freq=accident_freq(q)

man_col=assign_col(man_freq,'dodgerblue')
bronx_col=assign_col(bronx_freq,'darkred')
brook_col=assign_col(brook_freq,'violet')
si_col=assign_col(si_freq,'darkgreen')
q_col=assign_col(q_freq,'darkgoldenrod4')
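To make the scaling concrete, here is a sketch on a hypothetical five-collision data frame (street names and coordinates are invented) that repeats the per-1,000 arithmetic of accident_freq() and the palette lookup of assign_col() inline, so it runs without the collision data.

```r
# Hypothetical toy data: five collisions on two streets (invented values)
toy <- data.frame(
  ON.STREET.NAME = c("BROADWAY", "BROADWAY", "BROADWAY", "WALL STREET", "WALL STREET"),
  long = c(-73.99, -73.98, -73.97, -74.01, -74.00),
  lat  = c(40.75, 40.76, 40.77, 40.70, 40.71),
  stringsAsFactors = FALSE
)

# Count collisions per street with table(), as accident_freq() does
tab <- data.frame(table(toy$ON.STREET.NAME))
toy_freq <- merge(x = toy, y = tab, by.x = "ON.STREET.NAME", by.y = "Var1")

# Street's share of all collisions per 1,000, rounded, with 0 bumped to 1
toy_freq$freqPerc <- round(toy_freq$Freq / nrow(toy) * 1000, digits = 0)
toy_freq$freqPerc <- ifelse(toy_freq$freqPerc == 0, 1, toy_freq$freqPerc)

# One shade per possible freqPerc value, indexed by freqPerc
pal <- colorRampPalette(c("white", "darkred"))
shades <- pal(max(toy_freq$freqPerc))
toy_freq$col <- shades[toy_freq$freqPerc]
```

Broadway has 3 of 5 collisions (freqPerc 600) and Wall Street has 2 of 5 (freqPerc 400), so Broadway takes the last, darkest shade of the 600-step palette while Wall Street lands about two thirds of the way from white.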

Finally, I use ggmap's get_map() function to get a toner-style map of NYC and add geom_path() layers, one per borough. geom_path() connects the longitude and latitude points that share a street name with a line, or "path," in the order the points appear in the data. In other words, the street name acts as the grouping factor for the coordinates, and all coordinates in a group are connected. Each line is then drawn in the shade that assign_col() produced for its street.

ny_plot=ggmap(get_map('New York, New York',zoom=11,maptype='toner'))

# merge() in accident_freq() sorts rows by street, so plot the *_freq tables
# and store each collision's shade on its own row to keep data and colors aligned
man_freq$col=man_col[man_freq$freqPerc]
bronx_freq$col=bronx_col[bronx_freq$freqPerc]
brook_freq$col=brook_col[brook_freq$freqPerc]
si_freq$col=si_col[si_freq$freqPerc]
q_freq$col=q_col[q_freq$freqPerc]
plot3=ny_plot+
 geom_path(data=man_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME,colour=col))+
 geom_path(data=bronx_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME,colour=col))+
 geom_path(data=brook_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME,colour=col))+
 geom_path(data=si_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME,colour=col))+
 geom_path(data=q_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME,colour=col))+
 scale_colour_identity()+
 ggtitle('Street-Level NYC Vehicle Accidents by Borough')+
 xlab(" ")+ylab(" ")
plot3
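The grouping behaviour of geom_path() can be checked without downloading a map tile. The sketch below uses a hypothetical five-point, two-street data frame with a precomputed col column (standing in for the shades from assign_col()) and plots it with scale_colour_identity(), which treats the col strings as literal colors. ggplot_build() then exposes the data ggplot2 actually draws, so one can confirm that each street became its own group.

```r
library(ggplot2)

# Hypothetical toy data: two streets, colors precomputed per row
toy <- data.frame(
  ON.STREET.NAME = c("A ST", "A ST", "A ST", "B AVE", "B AVE"),
  long = c(-73.99, -73.98, -73.97, -74.01, -74.00),
  lat  = c(40.75, 40.76, 40.77, 40.70, 40.71),
  col  = c("#FF0000", "#FF0000", "#FF0000", "#0000FF", "#0000FF"),
  stringsAsFactors = FALSE
)

# group= keeps each street's points in a separate path;
# scale_colour_identity() uses the col strings as the drawn colors
p <- ggplot(toy, aes(x = long, y = lat, group = ON.STREET.NAME, colour = col)) +
  geom_path() +
  scale_colour_identity()

# Layer data as ggplot2 will draw it
built <- ggplot_build(p)$data[[1]]
```

Mapping colour to a column, rather than passing a separate color vector, keeps each shade attached to its row even when the rows have been reordered, for example by merge().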

