I’ve been doing more analysis of the Philadelphia homicide data that the Philadelphia Inquirer has published, and presented some of it at the Philadelphia UseR group yesterday. My slides [pdf] and source [knitr .Rnw] are on GitHub.
I should be clear that I am not an expert on crime and murder. In fact, I’m not even fairly knowledgeable. If anyone out there with more expertise has strong criticism of my “analysis” (really, it’s just a rough exploration of the data), I’ll eat it, and I’ll look forward to your own analysis of the data (again, it’s right here). Here are some of the most striking patterns that I found.
Results

First, here is the total number of murders that occurred over the past 23 years, broken down by the day of the week. The weekends are worse than the weekdays.
Next, here is the total number of murders by hour of the day. The hour of the day was not included in the data until 2006, so this only represents murders between 2006 and 2011. The plot is centered on midnight, so the afternoon of Day 1 is on the left, and the morning of Day 2 is on the right.
Here is the most striking plot that I produced this time around. It plots, by month, the average frequency of murders. The y-axis represents 1 murder every X days.
I also did some meager statistical analysis, specifically Poisson regression with terms for the month (that is, January, February, etc., to look for a seasonal pattern), race of the victim, and weapon used. There was a significant month effect, but the coefficients didn’t have much of a pattern to them. I did use the number of days in the month as an offset in the regression, so differing month lengths don’t account for it. More importantly, there was an unsurprising main effect of race, but also a big interaction between race and weapon. Specifically, African American victims were way more likely to be killed by a gun.
Guns and knives are the two most common weapons used in murders in the data.
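To make the offset idea concrete, here is a minimal sketch of a Poisson regression of that general shape in base R. The data are simulated, the column names (`n`, `days`, `race`, `weapon`) and group labels are placeholders, and the formula is my guess at a reasonable specification rather than the exact model behind the slides.

```r
# Simulated monthly counts for illustration only; these are NOT the Inquirer data.
set.seed(1)
toy <- expand.grid(
  year   = 2006:2011,
  month  = factor(month.abb, levels = month.abb),
  race   = c("A", "B"),            # placeholder group labels
  weapon = c("gun", "knife")
)

# Days per calendar month (leap years ignored to keep the sketch short)
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
toy$days <- days_in_month[as.integer(toy$month)]

# Fake counts with a constant underlying rate of 0.1 per day
toy$n <- rpois(nrow(toy), lambda = toy$days * 0.1)

# log(days) as an offset makes this a model of murders per day,
# so longer months do not show up as a spurious month effect.
fit <- glm(n ~ month + race * weapon + offset(log(days)),
           family = poisson, data = toy)

# Exponentiated coefficients can be read as rate ratios
rate_ratios <- exp(coef(fit))
print(round(rate_ratios, 2))
```

The `race * weapon` term expands to both main effects plus their interaction, which is the part of the model that speaks to the gun versus knife difference between groups.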
Update: There was a pretty serious flaw in my regression, in that if there was a month where, say, no African Americans were murdered with a knife (and there were plenty of such months), that month’s data was missing, rather than 0. Filling in the data appropriately to reflect months with 0 murders for a particular race x weapon combination, the estimates are pretty different. White murder victims are 5.71 times more likely to be murdered with a gun than a knife, while African American murder victims are 8.62 times more likely to be murdered with a gun than a knife, meaning the gun-to-knife ratio for African American victims is 1.51 times the ratio for White victims (8.62 / 5.71 ≈ 1.51). So, that’s a pretty serious revision, approximately halving the multiplier. I’ve already updated the linked code and slides.
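For anyone who runs into the same problem, one way to fill in the missing zero-count combinations in base R is to build the full grid of combinations and merge the observed counts onto it. This is a sketch with made-up numbers and placeholder labels, not the code from the linked source.

```r
# Hypothetical counts: the knife row for group B in February is absent
# because no such murders were recorded that month.
counts <- data.frame(
  month  = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  race   = c("A", "A", "B", "B", "A", "A", "B"),
  weapon = c("gun", "knife", "gun", "knife", "gun", "knife", "gun"),
  n      = c(10, 2, 4, 1, 12, 1, 3)
)

# Every month x race x weapon combination that should exist
full_grid <- expand.grid(
  month  = unique(counts$month),
  race   = unique(counts$race),
  weapon = unique(counts$weapon),
  stringsAsFactors = FALSE
)

# Left join onto the full grid; combinations with no observed row come back as NA
filled <- merge(full_grid, counts, all.x = TRUE)

# An absent combination means zero murders, not missing data
filled$n[is.na(filled$n)] <- 0
print(filled)
```

Without this step, a count model only ever sees the months where a combination occurred at least once, which biases the estimated rates upward for rare combinations.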
So, gun deaths are an especially acute problem in the African American community. In fact, if you exclude gun deaths from the data, it actually looks like the racial disparity in murder rates has been narrowing.
It is purely coincidental that I’m posting this on the same day that the Philadelphia Police Department is doing a gun buyback. You can bring in a gun and receive a $100 ShopRite voucher, no questions asked. Seems like a good initiative.
Analysis Discussion

I spent a bit of time trying to figure out what I thought the most meaningful way to represent the murder rate was. First, I calculated the murder frequency by counting the number of murders in each month, then dividing that by the number of days in the month, for (n murders/n days)=murders per day. But the resulting measure had values like 0.14 murders per day, which isn’t too informative. What people want to know about murders, or at least what I want to know, is how often murders happen, not how many happened in a given time window. So, instead, I calculated (n days/n murders)=days per murder.
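The two measures are just reciprocals of each other, which a small sketch makes plain. The counts below are made up for illustration and the column names are placeholders.

```r
# Hypothetical monthly counts, for illustration only
monthly <- data.frame(
  month     = c("Jan", "Feb", "Mar"),
  n_days    = c(31, 28, 31),
  n_murders = c(31, 14, 4)
)

# Murders per day: hard to picture when the values are small fractions
monthly$murders_per_day <- monthly$n_murders / monthly$n_days

# Days per murder: "1 murder every X days"
monthly$days_per_murder <- monthly$n_days / monthly$n_murders

print(monthly)

# The 0.14 murders per day mentioned above is roughly one murder a week
one_every <- 1 / 0.14
print(round(one_every, 1))
```

One thing to watch for with days per murder is that a month with zero murders gives `Inf`, so such months need separate handling before plotting.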
The y-axis for the murder rate figures is also on a logarithmic scale, which is reasonable given both the distribution of the data and the way people perceive timescales. From a human perspective, the difference between 1 day and 2 days feels larger than the difference between 3 weeks and 4 weeks. The y-axis is also flipped, to indicate that smaller numbers mean “more often”. I managed the reversed log transformation by writing my own coordinate transformation using the new scales package. Here’s the R code.
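A minimal sketch of a reversed log transformation, assuming the `trans_new()` and `log_breaks()` functions from the scales package. The function name `reverselog_trans` is a label chosen for this sketch, and the details may differ from the code used for the figures.

```r
library(scales)

# Reversed log transformation: takes the log, then negates it,
# so that small values ("more often") end up at the top of the axis.
reverselog_trans <- function(base = 10) {
  trans_new(
    name      = paste0("reverselog-", format(base)),
    transform = function(x) -log(x, base),
    inverse   = function(x) base^(-x),
    breaks    = log_breaks(base = base),
    domain    = c(1e-100, Inf)   # logs are only defined for positive values
  )
}

tr <- reverselog_trans(10)
tr$transform(c(1, 10, 100))   # larger inputs map to more negative positions
tr$inverse(tr$transform(7))   # round trip back to the original value
```

With ggplot2 loaded, a transformation like this can be handed to a continuous scale, for example `scale_y_continuous(trans = reverselog_trans(10))`; newer ggplot2 versions name that argument `transform`.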