analyze the national incident-based reporting system (nibrs) with r and monetdb

November 11, 2014

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

in 2012, more than one quarter of the united states population lived in the jurisdiction of a police department that submitted details about every crime to a central repository maintained by the fbi.  a production of the uniform crime reports (ucr) program, the national incident-based reporting system (nibrs) compiles statistics from police agencies in thirty-five states plus dc.  if you are just looking for general crime counts, those justice statisticians might have already tabulated your number here.  but for the more discriminating criminal behavior aficionado, the university of michigan’s inter-university consortium for political and social research (icpsr) maintains every microdata extract of criminal events, offense types, victims, and arrestees as far back as the first bush administration in its national archive of criminal justice data (nacjd).

this is event-level american criminal activity microdata, compiled and published by the fbi and then curated by the university of michigan.  it’s for you.  download it.  study it.  hold it upside-down and sideways and run analyses on it until you pass out.  if you spot anything newsworthy, tell the world.  it is your data to do whatever you like with.  that is remarkable, isn’t that remarkable?  i’ve consistently been astounded by the dedication of federal agencies in the united states to publishing their microdata for scrappy outside researchers like you and me.  but there’s one hitch: the public use files do not match what the fbi publishes.  a contact at the fbi told me..

The data may be different because the first link is from the FBI UCR Program’s NIBRS publication which is a snapshot in time.  For example, the 2012 deadline for data to be included in the CIUS publication would have been in March 2013. The states/agencies had until the end of 2013 to submit additional data and make adjustments before the master closed early in 2014.

..and tomz@umich.edu at the national archive of criminal justice data said..

One possibility for the numbers not tying out exactly is whether the FBI counts all the agencies in the data. For UCR data tables the FBI sometimes only counts agencies that reported for the entire 12 months. I would look to see if your counts are larger than the FBI’s, and I’d see if the number of agencies you are using is different from the FBI. Another possibility is that the FBI can update their data at any time, and we are not always made aware of that.

..so when you run a query, you will not reproduce fbi counts precisely.  results are close, but not exact.  you’ll see that the reproduction syntax only approximately replicates the published tables.  oh, and once you’ve run the download automation syntax, the monetdb analysis speeds will outrun even the fastest of imaginary crime-fighting superheroes.  this new github repository contains two scripts:

download all microdata.R

  • create the batch (.bat) file needed to initiate the monet database in the future
  • log into the university of michigan’s website with the free login info you’ll have to obtain beforehand
  • download every data file from this study to the local disk
  • loop through each dat file in the current working directory and import it into monet with read.sascii.monetdb
  • create a well-documented block of code to re-initiate the monetdb server in the future
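the file loop in the fourth step might look roughly like the sketch below.  this is not the repository's code: the temporary folder, the empty placeholder files, and the table-naming rule are all assumptions made up for illustration, and the actual import call is left as a comment because it needs a running monetdb server.

```r
# hypothetical sketch: find every .dat file in a folder and derive a table name for each
# (the folder and file names are placeholders, not the real icpsr file names)
data.dir <- file.path( tempdir() , "nibrs_sketch" )
dir.create( data.dir , showWarnings = FALSE )

# create three empty placeholder files, only two of which end in .dat
file.create( file.path( data.dir , c( "Incident 2012.dat" , "Victim 2012.dat" , "notes.txt" ) ) )

# keep only the files ending in .dat
dat.files <- list.files( data.dir , pattern = "\\.dat$" , full.names = TRUE , ignore.case = TRUE )

# build a lowercase, sql-friendly table name from each file name (an assumed naming rule)
table.names <- tolower( gsub( "[^A-Za-z0-9]+" , "_" , tools::file_path_sans_ext( basename( dat.files ) ) ) )

for ( i in seq_along( dat.files ) ){
	cat( "would import" , basename( dat.files[ i ] ) , "into table" , table.names[ i ] , "\n" )
	# the real script imports the file into monetdb at this point
}
```

deriving the table name from the file name keeps one table per extract, which makes it easy to tell later which file a table came from.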

reproduce fbi tables.R

  • initiate the same monetdb server instance, using the same well-documented block of code as above
  • reproduce three fbi-published data tables from the actual microdata, with results that are close but not exact.
  • be amazed.  that was dozens of queries, each on millions of records.  and it worked on your laptop.  wow.

click here to view these two scripts

for more detail about the national incident-based reporting system (nibrs) microdata, visit:

notes:

the preliminary 2013 crime statistics show a major expansion in the united states population covered by departments participating in nibrs (in table one, compared to 2012 and 2011), so before you trend anything, make sure to examine which police agencies in the locality that you are interested in contributed their data to the program.  in other words, don’t confuse a new municipality reporting crime statistics to the fbi with a spike or dip in the crime rate.  right?  right.
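one way to guard against that confusion is to restrict a trend to the agencies that reported in every year.  the sketch below uses made-up counts for three hypothetical agencies (the column names agency, year, and incidents are assumptions, not nibrs variable names) and compares the naive yearly totals with totals from consistently-reporting agencies only.

```r
# made-up agency-by-year incident counts, for illustration only
# agency "C" only starts reporting in 2013
x <- data.frame(
	agency = c( "A" , "A" , "A" , "B" , "B" , "B" , "C" ) ,
	year = c( 2011 , 2012 , 2013 , 2011 , 2012 , 2013 , 2013 ) ,
	incidents = c( 100 , 110 , 105 , 50 , 45 , 55 , 200 )
)

# naive trend: sum over whichever agencies happened to report each year
naive <- tapply( x$incidents , x$year , sum )

# count the distinct years in which each agency appears
years.reported <- tapply( x$year , x$agency , function( y ) length( unique( y ) ) )

# keep only the agencies present in every year of the data
always.in <- names( years.reported )[ years.reported == length( unique( x$year ) ) ]

# trend restricted to those consistently-reporting agencies
consistent <- with( x[ x$agency %in% always.in , ] , tapply( incidents , year , sum ) )

print( naive )
print( consistent )
```

in this toy data the naive 2013 total jumps only because agency "C" joined, while the restricted totals stay roughly flat.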

this is not survey data, so use normal statistical tests (not survey-adjusted ones) in your monetdb sql code to compute measures of variation like a confidence interval.  and remember, for more sql query construction help, try the w3schools tutorial and also just searching for specific commands in my archive.
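as a rough sketch of what that could look like: pull the mean, standard deviation, and count with one sql query, then build an ordinary confidence interval in r.  the table name (victims_2012) and column name (age) in the query string are hypothetical, and the query assumes your monetdb version provides stddev_samp.  so that the example runs without a server, the one-row query result is replaced by the same aggregates computed on simulated ages.

```r
# hypothetical query: the table name (victims_2012) and column name (age) are assumptions
sql <- "select avg( age ) as mean_age , stddev_samp( age ) as sd_age , count( age ) as n from victims_2012"

# stand-in for the one-row result of that query, computed from simulated (not real) ages
set.seed( 1 )
age <- sample( 18:80 , 1000 , replace = TRUE )
res <- data.frame( mean_age = mean( age ) , sd_age = sd( age ) , n = length( age ) )

# standard error of the mean, with no survey design adjustment
se <- res$sd_age / sqrt( res$n )

# ordinary 95% confidence interval around the mean, using the normal approximation
ci <- res$mean_age + c( -1 , 1 ) * qnorm( 0.975 ) * se

print( ci )
```

letting the database compute the three aggregates means only a single row travels back to r, no matter how many millions of records the table holds.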

confidential to sas, spss, stata, and sudaan users: these languages will vanish, like d. b. cooper.  time to transition to r.  😀
