Traffic tickets and where to find them.

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Dashboard: Link

Introduction & Data

Anyone who has ever experienced driving in New York City can attest to the fact that it may at times be a less-than-satisfactory experience. With pedestrians who rightfully assume ownership of all parts of this city, cyclists who do the same, and other drivers who, let’s face it, are no different. This mess of movement is regulated by a few city agencies who have the power to issue traffic tickets to those of us who violate the rules of the road (luckily for some it does not include pedestrians).

After coming across a dataset of all traffic tickets issued in New York State over a four year period from 2014 through end of 2017, I was intrigued as to what sort of dark secrets may be hiding in the over 14 million traffic citations handed and received over that time period. Boiling down the data from the state level to include only the 5 boroughs I was left with a little over 4 million observations to sift through – challenge accepted. Unfortunately, due to size limitations on the host server I ended up reducing the dataset to 3 years of observations, from 2015 to 2017, however, I retained 2014 for a time series analysis that we’ll touch on later.

R Dashboard & Findings

To visualize the data I decided to utilize Semantic Dashboard which is built on RStudio’s Shiny architecture. The first thing that struck me, as you can see in the following map, is that Manhattan is the borough with the highest total number of tickets issued year over year – us Brooklynites falling to second place by a slim margin.  This is perhaps unsurprising given the notoriety of Manhattan roads.

Darker shaded area corresponds to higher total ticket count.

Interestingly enough, on a per capita basis Manhattan is still number one for traffic citations, however, this time around Staten Island comes second even while being the least populated borough by a wide margin. This also makes sense once we consider that Staten Island is a much more car-oriented borough with few other transportation options.

Ticket count on a per-capita basis.

Taking a deeper dive, the dashboard allows us to sort the data by offense type; borough in which citation was issued; age, sex, and license-issuing state of citation recipient; as well as by citation issuing agency. Interestingly enough, sorting the data by sex shows that men far outweigh women in the number of traffic tickets received citywide over the three year period considered. This is all the more startling given that more than 50% of licensed motorists in the United States are in fact female.

Male v Female Citywide Traffic Ticket Count 2015-2017

It is also interesting to note that most traffic tickets over this three year period are issued on Wednesday, regardless of borough. The reason for this is not immediately apparent but may be attributed in equal parts to mid-week attention span drops and NYPD conspiracy theories. There is also unfortunately no way to tell if the data accounts for duplicate or dismissed tickets – this is an area for future dashboard improvement.

Wednesday is the day for most ticket issuance consistent through the 5 boroughs.

Finally, the question we’ve all been asking ourselves, is which violations are drivers in the city most keen on committing. The data is illuminating on this point. It seems that in Brooklyn and Manhattan, disobeying a traffic device is the most common violation while in Staten Island, Queens, and the Bronx speeding is the most popular way to get oneself into a pickle with the police.

Tickets issued by violation type sorted by borough.

The last part of the dashboard is a time series of total monthly ticket counts over a four year period from 2014 to 2017. We have the option to select a moving average smoother to get a clearer picture of any trends as well as the ability to fit an AR (Auto-Regressive) model for forecasting future ticket issuance. Along with the point forecast we also plot an 80% confidence bound. It looks like the confidence bound is consistent with the overall variability of ticket counts over the 4 years. If the forecast is accurate, however, remains to be seen as data for 2018 is not yet available.

Total monthly tickets with a Simple Moving Average smoother and AR model of order 10.

In conclusion, we have unearthed a few of New York City drivers’ secrets, however, many more remain and are up to the user to reveal via a full exploration of the dashboard. May the force be with you.

To leave a comment for the author, please follow the link and comment on their blog: R – NYC Data Science Academy Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)