Guns are Cool – States

Posted on July 26, 2014 by Wingfeet in R bloggers | 0 Comments

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week I looked at time effects of the shootingtracker database. This week I will look at the states. Some (smaller) states never made it on the database. Other states, far too frequently. The worst of these California. After correcting for population size, Washington DC with 0.8 shootings per year per 100000 inhabitants sticks out. Subsequently, the observed incidences have been shrunk towards the common distribution. In these posterior estimates Massachusetts is best while Washington DC, Illinois and Louisiana are worst.

Data

Data are from the shootingtracker database and have been updated July 25. The last entry in the database is July 25. The script is a slightly adapted version from last week. Most notably, I did not notice last week the database uses KS for Kansas, but KA is what my other data uses. Note that in order make this post readable, all code is at a separate section.

Adding state information

The first thing is to get one record per state. The datasets package has state information, while number of inhabitants (2013 data) is plucked from census.gov. After merging some states have missing counts for shootings absence in the shootingtracker database is taken as 0 shootings.

Incidence is calculated per 100 000 inhabitants per year (365 days).

Calculating posteriors

Assuming a beta prior and binomial distribution for number of shootings, Jags is used to estimate posteriors. The one noticeable thing in the model is the prior for b. b really wants to be high, hence the prior is quite a bit wider than usual. The current values seem to be acceptable, see also the output at the bottom of this post.

The plot shows a number of things. In red are the observed values. In black the posterior values; means and 90% intervals of the samples. The blue line is the value to which the data is shrunk. Washington DC is shrunk quite a bit, because it has relatively few people and a large value. The same is true for Delaware. Illinois on the other hand has many inhabitants, hence is shrunk much less. It also has a more narrow interval. On the other side, states like Wyoming are shrunk upwards. Again, more inhabitants means less shrinking, Massachusetts ends up as having least number of shootings. Really big states, New York, Texas and California, are not shrunk at all and have quite narrow intervals. Texas and NY on the good side, California on the bad side.

The map shows the posteriors in a different way. At this point it might be worth noting that the north east is actually doing quite well, but since these states have so little inhabitants they are shrunk upwards and look bad. The current approach does not provide a natural way to avoid this. In a year or two additional data may cause at least some change in posteriors of the smaller states, both in the top and the bottom of this list.

Code

reading data

r13 <- readLines('raw13.txt')
r14 <- readLines('raw14.txt')
r1 <- c(r13,r14)
r2 <- gsub('\\[[a-zA-Z0-9]*\\]','',r1)
r3 <- gsub('^ *$','',r2)
r4 <- r3[r3!='']
r5 <- gsub('\\t$','',r4)
r6 <- gsub('\\t References$','',r5)
r7 <- read.table(textConnection(r6),
sep=’\t’,
header=TRUE,
stringsAsFactors=FALSE)
r7$Location[r7$Location==’Washington DC’] <-
‘WashingtonDC, DC’
r8 <- read.table(textConnection(as.character(r7$Location)),
sep=’,’,
col.names=c(‘Location’,’State’),
stringsAsFactors=FALSE)
r8$State <- gsub(' ','',r8$State)
r8$State[r8$State==’Tennessee’] <- 'TN'
r8$State[r8$State==’Ohio’] <- 'OH'
r8$State[r8$State %in% c(‘Kansas’,’KA’)] <- 'KS'
r8$State[r8$State==’Louisiana’] <- 'LA'
r8$State[r8$State==’Illinois’] <- 'IL'
r8$State <- toupper(r8$State)
r7$StateAbb <- r8$State
r7$Location <- r8$Location
r7 <- r7[! (r7$State %in% c( 'PUERTORICO','PR')),]
r7$Date <- gsub('/13$','/2013',r7$Date)
r7$date <- as.Date(r7$Date,format="%m/%d/%Y")

state information

r7$one <- 1

agg1 <- with(r7,

aggregate(one,by=list(StateAbb=StateAbb),FUN=sum))

library(datasets)

states <- data.frame(StateAbb=as.character(state.abb),

StateArea=state.area,

State=as.character(state.name),

StateRegion=state.region,

Frost=state.x77[,’Frost’]

)

# http://www.census.gov/popest/data/state/totals/2013/index.html

# data pre processed so the csv can be used as is.

inhabitants <- read.csv('NST-EST2013-01.treated.csv')

#put it all together

states <- rbind(states,data.frame(StateAbb='DC',

StateArea=68.3,

State=’District of Columbia’,

StateRegion= ‘South’,Frost=NA))

r9 <- merge(states,inhabitants)

r10 <- merge(r9,agg1,all=TRUE)

# No data in some states

r10$x[is.na(r10$x)] <- 0

r10$incidence <- 100000*r10$x/r10$Population*365/

as.numeric((max(r7$date)-min(r7$date)))

Jags model

library(R2jags)

datain <- list(

count=r10$x,

Population = r10$Population,

n=nrow(r10),

scale=100000*365/

as.numeric((max(r7$date)-min(r7$date))))

model1 <- function() {

for (i in 1:n) {

count[i] ~ dbin(p[i],Population[i])

p[i] ~ dbeta(a,b)

pp[i] <- p[i]*scale

}

a ~ dnorm(0,.000001) %_% T(0,)

b ~ dnorm(1e7,.00000000000001) %_% T(0,)

betamean <- scale * a/(a+b)

}

parameters <- c('a','b','betamean','pp')

jagsfit <- jags(datain,

model=model1,

parameters=parameters,

n.iter=20000,

n.burnin=1000,

n.chain=4,

progress.bar=”gui”)

jagsfit

Plot 1

samples <- jagsfit$BUGSoutput$sims.matrix[,5:55]
r10$pred <- apply(samples,2,mean)
r10$l05 <- apply(samples,2,function(x) quantile(x,.05))
r10$l95 <- apply(samples,2,function(x) quantile(x,.95))
library(ggplot2)
r11 <- subset(r10,,c(State,l05,l95,pred,incidence))
r11 <- r11[order(r11$pred),]
r11$State <- factor(r11$State,levels=r11$State)
p <- ggplot(r11, aes(y=pred, x=State))
limits <- aes(ymin =l05, ymax=l95)
dodge <- position_dodge(width=0.9)
p +geom_point(position=dodge, stat=”identity”) +
geom_errorbar(limits, position=dodge, width=0.25) +
theme(legend.position = “bottom”) +
geom_point(aes(y=incidence,x=State),
position=dodge, stat=”identity”,col=’red’) +
ylab(‘Incidence’) +
geom_hline(yintercept=mean(jagsfit$BUGSoutput$sims.matrix[,3]),col=’blue’) +
coord_flip()

Plot 2

# http://stackoverflow.com/questions/20434343/color-us-states-according-to-value-in-r

# your values (here as named vector)

library(maps)

values <- (r10$pred-min(r10$pred))/(max(r10$pred)-min(r10$pred))

#values <- (r10$incidence-min(r10$incidence))/

# (max(r10$incidence[r10$incidence<.5])-min(r10$incidence))

names(values) <- r10$state.abb

# getting the names used by map

tmp <- map('state',plot=FALSE,namesonly=TRUE)

# matching (after adjusting using gsub and tolower)

tmp <- match(gsub('(:.*)','',tmp),tolower(r10$State))

map(‘state’,

fill=TRUE,

col=rgb(colorRamp(c(‘green’,’red’))(values),max=255)[tmp])

Jags output

Inference for Bugs model at “C:/Users/Kees/AppData/Local/Temp/RtmpwbAFvR/modela3419a366f0.txt”, fit using jags,

4 chains, each with 20000 iterations (first 1000 discarded), n.thin = 19

n.sims = 4000 iterations saved

mu.vect sd.vect 2.5% 25% 50%

a 9.732 6.530 2.931 5.662 7.924

b 6207140.792 4125444.443 1898003.649 3609272.959 5094169.482

betamean 0.101 0.008 0.087 0.096 0.101

pp[1] 0.087 0.032 0.032 0.064 0.085

pp[2] 0.123 0.030 0.073 0.103 0.120

pp[3] 0.071 0.026 0.026 0.053 0.070

pp[4] 0.099 0.024 0.058 0.083 0.098

pp[5] 0.130 0.014 0.105 0.120 0.130

pp[6] 0.081 0.023 0.041 0.065 0.079

pp[7] 0.097 0.027 0.050 0.078 0.094

pp[8] 0.130 0.041 0.066 0.101 0.123

pp[9] 0.110 0.017 0.079 0.097 0.109

pp[10] 0.094 0.020 0.059 0.081 0.093

pp[11] 0.079 0.030 0.027 0.058 0.077

pp[12] 0.071 0.026 0.026 0.052 0.069

pp[13] 0.076 0.029 0.024 0.055 0.074

pp[14] 0.177 0.029 0.127 0.158 0.176

pp[15] 0.122 0.027 0.077 0.103 0.120

pp[16] 0.122 0.033 0.069 0.099 0.118

pp[17] 0.109 0.028 0.062 0.089 0.106

pp[18] 0.159 0.037 0.100 0.132 0.154

pp[19] 0.056 0.020 0.022 0.041 0.054

pp[20] 0.087 0.023 0.048 0.071 0.086

pp[21] 0.090 0.031 0.035 0.068 0.087

pp[22] 0.124 0.024 0.083 0.107 0.122

pp[23] 0.080 0.023 0.040 0.063 0.078

pp[24] 0.127 0.028 0.080 0.108 0.124

pp[25] 0.079 0.026 0.032 0.061 0.078

pp[26] 0.083 0.031 0.029 0.062 0.081

pp[27] 0.121 0.023 0.080 0.105 0.119

pp[28] 0.087 0.032 0.029 0.065 0.086

pp[29] 0.092 0.030 0.041 0.072 0.090

pp[30] 0.080 0.031 0.027 0.057 0.078

pp[31] 0.106 0.022 0.068 0.090 0.104

pp[32] 0.108 0.032 0.055 0.086 0.104

pp[33] 0.139 0.037 0.081 0.113 0.135

pp[34] 0.076 0.014 0.051 0.066 0.075

pp[35] 0.097 0.019 0.063 0.084 0.096

pp[36] 0.123 0.031 0.074 0.101 0.120

pp[37] 0.071 0.024 0.029 0.054 0.070

pp[38] 0.094 0.018 0.062 0.081 0.093

pp[39] 0.106 0.035 0.049 0.082 0.102

pp[40] 0.110 0.027 0.063 0.091 0.108

pp[41] 0.086 0.032 0.030 0.064 0.084

pp[42] 0.122 0.026 0.078 0.103 0.119

pp[43] 0.078 0.013 0.055 0.069 0.077

pp[44] 0.081 0.027 0.034 0.062 0.080

pp[45] 0.111 0.023 0.070 0.094 0.109

pp[46] 0.089 0.034 0.033 0.066 0.086

pp[47] 0.081 0.021 0.043 0.066 0.080

pp[48] 0.065 0.021 0.029 0.050 0.064

pp[49] 0.103 0.032 0.051 0.080 0.100

pp[50] 0.090 0.034 0.032 0.066 0.087

pp[51] 0.185 0.063 0.095 0.139 0.174

deviance 242.511 13.874 217.150 232.666 241.477

75% 97.5% Rhat n.eff

a 11.529 28.847 1.068 63

b 7337119.577 18410168.237 1.067 64

betamean 0.106 0.117 1.001 4000

pp[1] 0.106 0.158 1.003 910

pp[2] 0.140 0.190 1.001 2600

pp[3] 0.088 0.125 1.006 460

pp[4] 0.114 0.152 1.001 4000

pp[5] 0.139 0.159 1.002 1800

pp[6] 0.096 0.130 1.005 560

pp[7] 0.112 0.157 1.001 4000

pp[8] 0.151 0.230 1.001 3000

pp[9] 0.120 0.147 1.001 2800

pp[10] 0.107 0.137 1.003 1100

pp[11] 0.096 0.143 1.007 450

pp[12] 0.087 0.126 1.005 550

pp[13] 0.094 0.137 1.007 420

pp[14] 0.196 0.238 1.012 250

pp[15] 0.138 0.182 1.001 4000

pp[16] 0.140 0.200 1.001 2900

pp[17] 0.124 0.171 1.001 4000

pp[18] 0.182 0.242 1.008 350

pp[19] 0.069 0.098 1.008 320

pp[20] 0.102 0.137 1.002 2000

pp[21] 0.108 0.158 1.003 1200

pp[22] 0.139 0.178 1.003 970

pp[23] 0.094 0.128 1.002 1800

pp[24] 0.144 0.190 1.004 660

pp[25] 0.096 0.135 1.006 550

pp[26] 0.102 0.151 1.003 3700

pp[27] 0.135 0.170 1.002 1800

pp[28] 0.106 0.159 1.004 1200

pp[29] 0.110 0.160 1.004 1200

pp[30] 0.099 0.145 1.005 520

pp[31] 0.120 0.154 1.002 1400

pp[32] 0.127 0.179 1.003 1800

pp[33] 0.161 0.224 1.006 450

pp[34] 0.085 0.105 1.006 480

pp[35] 0.108 0.138 1.001 3600

pp[36] 0.141 0.191 1.002 1300

pp[37] 0.087 0.123 1.004 700

pp[38] 0.106 0.133 1.001 4000

pp[39] 0.125 0.184 1.002 4000

pp[40] 0.126 0.171 1.002 4000

pp[41] 0.106 0.154 1.003 1200

pp[42] 0.138 0.180 1.006 470

pp[43] 0.086 0.106 1.006 480

pp[44] 0.098 0.143 1.005 590

pp[45] 0.125 0.161 1.002 2500

pp[46] 0.109 0.164 1.004 700

pp[47] 0.094 0.126 1.004 700

pp[48] 0.080 0.110 1.008 320

pp[49] 0.120 0.173 1.004 790

pp[50] 0.109 0.166 1.003 1400

pp[51] 0.217 0.337 1.012 210

deviance 251.840 271.487 1.039 88

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)

pD = 92.8 and DIC = 335.3

DIC is an estimate of expected predictive error (lower deviance is better).

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Guns are Cool – States

Data

Adding state information

Calculating posteriors

Code

reading data

state information

Jags model

Plot 1

Plot 2

Jags output

Related

Data

Adding state information

Calculating posteriors

Code

reading data

state information

Jags model

Plot 1

Plot 2

Jags output

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)