Mystery solved: The discrepancy in homicide data

[This article was first published on Diego Valle's Blog, and kindly contributed to R-bloggers].

I’ve been complaining that homicide statistics from police sources were too low in 2009, with the entire state of Chihuahua having fewer homicides than its biggest city. I was planning to use the IFAI (Freedom of Information Access) to obtain the original CIEISP forms, which the state police forces are supposed to fill out each month and send to the National System of Public Security (SNSP) for tallying, and check them for unusual patterns, but someone beat me to requesting the data.

You can see the request here, and download the Excel file with the CIEISP forms here.

The main difference between the datasets is that the CIEISP forms usually report lower numbers than the data the SNSP gave to the ICESI. This is not surprising, since the CIEISP forms have no recorded homicides in some states during the last months of the year (the request was made in December 2009). For Chihuahua, however, the numbers in both datasets are identical, with a total of 2,523 homicides recorded. More importantly, the CIEISP forms register no homicides in Chihuahua in November and December, and the data for October looks incomplete. That’s why Chihuahua had such a low homicide rate according to the police: the data they gave to the ICESI covers only 9¾ months of homicides.
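A quick back-of-the-envelope check shows why the missing months matter. If the recorded months were representative, scaling the 2,523 recorded homicides from 9¾ months up to a full year gives roughly 3,105. This is a minimal sketch assuming a constant monthly rate; it is not the regression-based estimate used in this post, and `annualize` is a hypothetical helper name.

```python
def annualize(partial_total: float, months_covered: float) -> float:
    """Scale a partial-year count to a 12-month estimate.

    Assumes (hypothetically) that the months covered are representative,
    i.e. the monthly rate is constant over the year.
    """
    if months_covered <= 0 or months_covered > 12:
        raise ValueError("months_covered must be in (0, 12]")
    return partial_total * 12 / months_covered


# Figures from the post: 2,523 homicides covering about 9.75 months.
estimate = annualize(2523, 9.75)
print(round(estimate))
```

A constant-rate assumption ignores any trend over the year, which is one reason to prefer fitting a trend instead.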

Given that Mexican President Felipe Calderón and former Attorney General Eduardo Medina-Mora are on record stating that violence in Mexico is low, it is not surprising that a government agency would give misleading information to an NGO. I bet the UN, a couple of think tanks, and some newspapers will use the lower number, the data will be considered “official”, and next year the SNSP will quietly update it, just as happened in 2008.

To estimate the homicides in the missing months, I dropped the last three months (October through December) from the CIEISP forms and predicted them with a linear regression. The predicted number of homicides in Chihuahua for the whole year was 3,256. Even with the missing months added in, I still think the police data is missing about 400 homicides, just as in 2008 (according to Milenio there were 3,687 narco-executions in Chihuahua), but at least the total no longer looks ridiculously low.
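The imputation step can be sketched as follows. This is a minimal illustration assuming the regression is an ordinary least squares straight line fitted to the January to September monthly counts and extrapolated to October through December; the post’s actual code is in R on GitHub and may differ in its details. `fit_line` and `impute_year_total` are hypothetical names, and since the monthly CIEISP counts are not reproduced in the post, no data is hard-coded here.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def impute_year_total(monthly: Sequence[float], n_trusted: int = 9) -> float:
    """Keep the first n_trusted months, replace the remaining months of the
    year with values predicted from a linear trend, and return the total."""
    trusted = list(monthly[:n_trusted])
    months = list(range(1, n_trusted + 1))  # month index 1 = January
    intercept, slope = fit_line(months, trusted)
    predicted = [intercept + slope * m for m in range(n_trusted + 1, 13)]
    return sum(trusted) + sum(predicted)
```

Passing the twelve monthly counts for a state (including the empty or incomplete final months) returns the estimated annual total; with `n_trusted=9` the incomplete October is discarded along with November and December.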

From the chart it looks as if the SNSP gave only “preliminary” data for Chihuahua, coincidentally the most violent state in Mexico. Here’s a chart with the estimated data for Chihuahua; the other states use the original SNSP numbers:

It looks really bad: Chihuahua’s murder rate went from 76 to 96 per 100,000 (and was likely higher), and Durango’s homicide rate more than doubled. In 2009 the state of Chihuahua nearly had the murder rate that Ciudad Juarez had in 2008!
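For reference, the arithmetic behind these rates can be sketched as below, assuming the usual convention of homicides per 100,000 inhabitants. The helper names are hypothetical. As a consistency check, the post’s own figures (3,256 homicides at a rate of 96) imply a population of about 3.4 million, which is roughly the size of the state.

```python
def rate_per_100k(homicides: float, population: float) -> float:
    """Homicides per 100,000 inhabitants."""
    return homicides / population * 100_000


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Rise in Chihuahua's homicide rate, from 76 to 96 per 100,000.
print(round(pct_change(76, 96), 1))

# Population implied by the post's figures: 3,256 homicides at a rate of 96.
implied_population = 3256 / 96 * 100_000
print(round(implied_population))
```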

We can also look at the execution figures reported in David Shirk’s Drug Violence in Mexico Data and Analysis, as tallied by three major Mexican newspapers (El Universal, Milenio and Reforma). For the most part the newspapers’ tallies are similar, with Milenio usually reporting higher numbers than Reforma and El Universal in between. The big exception is Chihuahua, where Milenio reported 1,555 more executions than Reforma:

Number of narco-executions in Chihuahua:
El Universal: 3,250
Milenio: 3,687
Reforma: 2,132

Reforma’s number looks too low. Ciudad Juarez used to have about 200 murders a year, and in 2009 it had more than 2,600; I’m pretty sure the vast majority of those extra murders were due to the war between the Sinaloa and Juarez cartels, and that’s only one city. On the other hand, there were probably about 3,600 homicides in the whole state of Chihuahua, and I doubt every one of them was related to the drug cartels, so Milenio’s number looks too high. I just split the difference and used the average.
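The reasoning above amounts to two rough sanity bounds on the newspaper tallies. The sketch below encodes them using the approximate figures from the post; treating Ciudad Juarez’s excess murders as a lower bound assumes nearly all of them were cartel killings, which is the author’s judgment rather than a measured fact. Reforma’s tally is taken as Milenio’s 3,687 minus the 1,555 gap. The post doesn’t say whether “the average” means the mean of all three tallies or the midpoint of the two extremes, so the sketch prints both. All variable names are hypothetical.

```python
# Narco-execution tallies for Chihuahua in 2009.
TALLIES = {
    "El Universal": 3250,
    "Milenio": 3687,
    "Reforma": 3687 - 1555,  # implied by the 1,555 gap with Milenio
}

# Approximate figures used in the post's reasoning.
JUAREZ_BASELINE = 200   # murders per year in Ciudad Juarez before the war
JUAREZ_2009 = 2600      # "more than 2,600" murders in 2009
STATE_HOMICIDES = 3600  # rough total of homicides in Chihuahua

# Assumes most of Juarez's excess murders were executions (rough lower bound),
# and that executions cannot exceed total homicides (upper bound).
lower_bound = JUAREZ_2009 - JUAREZ_BASELINE
upper_bound = STATE_HOMICIDES

for paper, count in TALLIES.items():
    if count < lower_bound:
        verdict = "too low"
    elif count > upper_bound:
        verdict = "too high"
    else:
        verdict = "plausible"
    print(f"{paper}: {count:,} ({verdict})")

mean_all = sum(TALLIES.values()) / len(TALLIES)
midpoint = (min(TALLIES.values()) + max(TALLIES.values())) / 2
print(f"mean of the three tallies: {mean_all:,.1f}")
print(f"midpoint of the extremes: {midpoint:,.1f}")
```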

Here’s the percentage of homicides due to narco-executions:

As I said, the number of homicides in Chihuahua is probably an underestimate, so the percentage of homicides linked to narco-executions is likely a little lower.
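The effect of an underestimated denominator can be illustrated with a short sketch. It assumes an executions figure of 3,023, the mean of the three newspaper tallies (El Universal 3,250, Milenio 3,687, and Reforma 3,687 minus 1,555, or 2,132); whether this matches the author’s “average” is an assumption. The two denominators are the regression estimate of 3,256 and the rough guess of about 3,600 total homicides. The printed shares are illustrations, not the values in the original chart, and `execution_share` is a hypothetical name.

```python
def execution_share(executions: float, homicides: float) -> float:
    """Percentage of homicides attributed to narco-executions."""
    if homicides <= 0:
        raise ValueError("homicides must be positive")
    return executions / homicides * 100


# Hypothetical input: mean of the three newspaper tallies.
executions = 3023

# Regression estimate vs. rough guess at the true homicide total.
for homicides in (3256, 3600):
    print(homicides, round(execution_share(executions, homicides), 1))
```

A larger homicide total lowers the share, which is why an undercount in the police data inflates the percentage.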

P.S. You can download the data and code from my GitHub account (the first time you run the program, it will download a 3 MB map of Mexico from GADM).
