The completeness of online gun shooting victim counts

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers].

There are a number of online efforts to register victims of shootings. Shootingtracker tries to register all mass shootings, defined as those with four or more victims. Slate had the Gun Death Tally (GDT), which counted gun deaths starting at Newtown and running through December 31, 2013. That project is continued in the Gun Violence Archive.
In this post I compare the 2013 data of shootingtracker and GDT with CDC data from 2009 to 2011. Shootingtracker and GDT are similar to each other, but the CDC data has much higher counts.

Shootingtracker and Gun Death Tally

Shootingtracker has data on shootings with four or more victims. Since not everybody who is shot dies, this makes the data not directly comparable to CDC data. However, by restricting the selection to shootings with four or more killed, it is still possible to make a comparison with GDT data. The GDT data, however, is organized by victim rather than by incident. It also appears that the state given is not the state of the incident, but rather the state of residence of the victim. In addition, the dates used in GDT and shootingtracker are not the same. Since both GDT and shootingtracker have web links for each record, it is possible to compare them manually. After this check there were 53 incidents: 49 in shootingtracker, 46 in GDT, and 42 in both. Using a capture-recapture formula, these counts give an estimated total of approximately 54 incidents.
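The post does not say which capture-recapture formula was used. As a minimal sketch, the simplest two-source estimator (Lincoln-Petersen) reproduces the figure from the three counts above. The variable names are illustrative, and the estimator assumes that the two sources register incidents independently of each other.

```r
# Two-source capture-recapture (Lincoln-Petersen) estimate.
# The three counts are the ones reported in the text; the choice of
# estimator is an assumption, the post only says "capture-recapture formula".
n_tracker <- 49   # incidents found in shootingtracker
n_gdt     <- 46   # incidents found in GDT
n_both    <- 42   # incidents found in both sources

# Distinct incidents observed in at least one source
n_observed <- n_tracker + n_gdt - n_both

# Estimated total number of incidents, observed or not
n_estimated <- n_tracker * n_gdt / n_both

# Estimated number of incidents missed by both sources
n_missed <- n_estimated - n_observed

round(c(observed = n_observed, estimated = n_estimated, missed = n_missed), 1)
```

This gives 53 observed incidents and an estimate of about 53.7, which rounds to 54. Under the independence assumption, the two sources together miss less than one incident with four or more killed.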

Gun Death Tally and CDC

For CDC the crude rates from 2009 to 2011 were extracted, with the following ICD-10 Codes:
   X72 (Intentional self-harm by handgun discharge),
   X73 (Intentional self-harm by rifle, shotgun and larger firearm discharge),
   X74 (Intentional self-harm by other and unspecified firearm discharge),
   X93 (Assault by handgun discharge),
   X94 (Assault by rifle, shotgun and larger firearm discharge),
   X95 (Assault by other and unspecified firearm discharge)
GDT data are summarized by state and divided by the number of inhabitants to obtain a rate.
The plot shows huge differences. While the years covered are different, the year-to-year variation in the CDC data seems much smaller than the difference with GDT. Washington DC, which seemed so bad in shootingtracker, is bad in all databases. However, it does not stick out as much; it just appears that things are more easily registered there.
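The rate calculation for victim-level data such as GDT can be sketched in base R as follows. The state codes, victim records and population figures are made up for illustration; they are not taken from GDT or from census data.

```r
# Hypothetical victim-level records, one row per victim (not real GDT data)
victims <- data.frame(
  State = c("AA", "AA", "AA", "BB", "BB"),
  stringsAsFactors = FALSE
)

# Hypothetical population per state (not real census figures)
population <- data.frame(
  State = c("AA", "BB"),
  Population = c(600000, 1000000),
  stringsAsFactors = FALSE
)

# Count victims per state
killed <- as.data.frame(table(State = victims$State),
                        responseName = "Killed",
                        stringsAsFactors = FALSE)

# Attach the population and express the count per 100,000 inhabitants,
# the same unit as the CDC crude rate
rates <- merge(killed, population, by = "State")
rates$Rate <- 1e5 * rates$Killed / rates$Population
rates
```

Expressing both sources per 100,000 inhabitants is what makes the state-level comparison possible, even though the underlying years differ.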

Appendix 1: CDC data

Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2011 on CDC WONDER Online Database, released 2014. Data are from the Multiple Cause of Death Files, 1999-2011, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed on Nov 2, 2014 10:56:15 AM

Dataset: Underlying Cause of Death, 1999-2011
Query Parameters:
2013 Urbanization: All
Autopsy: All
Gender: All
Hispanic Origin: All
ICD-10 Codes: X72 (Intentional self-harm by handgun discharge), X73 (Intentional self-harm by rifle, shotgun and larger firearm discharge), X74 (Intentional self-harm by other and unspecified firearm discharge), X93 (Assault by handgun discharge), X94 (Assault by rifle, shotgun and larger firearm discharge), X95 (Assault by other and unspecified firearm discharge)
Place of Death: All
Race: All
States: All
Ten-Year Age Groups: All
Weekday: All
Year/Month: 2009, 2010, 2011
Group By: State, Year
Show Totals: False
Show Zero Values: False
Show Suppressed: False
Calculate Rates Per: 100,000
Rate Options: Default intercensal populations for years 2001-2009 (except Infant Age Groups)

Appendix 2: R code for plot


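library(dplyr)
library(ggplot2)

# CDC WONDER export, tab separated
cdc <- read.csv('Underlying Cause of Death, 1999-2011-3 - cleaned.txt',sep='\t')
# order the states by their mean crude rate over 2009-2011
state_order <- group_by(cdc,State) %>% 
    summarize(.,CR=mean(Crude.Rate)) %>%
    arrange(.,CR) %>% .$State
state_order <-as.character(state_order)
# Origin='CDC' is an assumption: the plot maps shape to Origin,
# so the CDC rows need a label too
cdc <- select(cdc,State,Year,Rate=Crude.Rate) %>%
    mutate(.,Origin='CDC')

# Slate Gun Death Tally, one row per victim
slate1 <- read.csv('SlateGunDeaths.csv',
        stringsAsFactors=FALSE) %>% 
    mutate(.,Date=as.Date(date,format="%Y-%m-%d")) %>%
    mutate(.,State=toupper(state)) %>%
    select(.,Date,State) %>%
    filter(.,Date>as.Date('2013-01-01') )

# state.abb and state.name are R's built-in vectors; using them here is an
# assumption, the arguments of this call are missing in the source
states <- data.frame(StateAbb=as.character(state.abb),
        State=as.character(state.name),
        stringsAsFactors=FALSE)
inhabitants <- read.csv('NST-EST2013-01.treated.csv')
#put it all together
states <- rbind(states,data.frame(StateAbb='DC',
        State='District of Columbia'))
states <- merge(states,inhabitants)
# count victims per state and convert to a rate per 100000 inhabitants
slate2 <- as.data.frame(xtabs( ~ State, slate1),
        stringsAsFactors=FALSE) %>% 
    rename(., Killed=Freq) %>%
    inner_join(states,.,by=c('StateAbb'='State')) %>%
    mutate(.,Rate=100000*Killed/Population) %>%
    mutate(.,Origin='Slate') %>%
    mutate(.,Year=2013)

# bind_rows replaces the deprecated rbind_list
rates <- bind_rows(cdc,slate2) %>%
   mutate(Year=factor(Year)) %>%
   mutate(State=factor(State,levels=state_order))

ggplot(rates,aes(x=State,y=Rate,colour=Year,shape=Origin) ) +
    geom_point() +
    ylab('Rate (per 100000)')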
