analyze the home mortgage disclosure act (hmda) microdata with r and monetdb

September 23, 2013

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

back in 1975, congress had it up to here with discriminatory lending practices and decided to require financial institutions that originate home mortgages to publicly report some basic operational statistics.  the home mortgage disclosure act mandated a major ramp-up in the transparency of home-lending activity across the country.  almost forty years later, the data are better than ever.  the main downloadable file, the loan application record (lar), contains one record per loan application (whether or not the loan was ultimately originated) and covers upwards of ninety percent of all federal housing administration (fha) loans.  there's also a one-record-per-lending-institution table (ins), but that'll be merged onto the lar for your convenience.  you know, just in case you want to look at loan-by-loan bank activity in your neighborhood.  like most thorough public data providers, the federal reserve publishes its own summary report, so give it a skim before you start writing code.

the gregorians celebrate the new year in january, the chinese in february, but the federal financial institutions examination council (ffiec) drops its ball in times square with a data release every september.  happy new year, everybody, because the latest hmda (pronounced hum-duh) microdata have arrived.  clocking in at between ten and thirty-five million records per year, this looks like a job for monetdb.  it's sexy, it's free, it's the perfect companion for big public data.  make learning a new language your resolution.  this new github repository contains two scripts:

download all microdata.R

  • initiate a monetdb server on your local machine to house every table and every year of hmda
  • download and, without taking a breath, import every file into monetdb
  • merge the loan application record table with the institutional records table, for easy access later
  • construct some race and ethnicity variables to match those published by ffiec
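the merge step boils down to a left join: every loan application keeps its row, and the matching lender's details get attached.  here's a minimal base-r sketch on toy data, not the repository's actual code.  the column names and the join key (respondent id plus agency code) are assumptions for illustration; check the official hmda file layouts before relying on them, and note that the real scripts do this inside monetdb rather than in memory.

```r
# toy loan application records (lar): one row per application.
# column names are hypothetical stand-ins, not the official hmda layout.
lar <- data.frame(
  respondent_id = c("0000000001", "0000000001", "0000000002"),
  agency_code   = c(1L, 1L, 9L),
  loan_amount   = c(150L, 220L, 90L),   # thousands of dollars
  action_taken  = c(1L, 3L, 1L)         # 1 = originated, 3 = denied
)

# toy institution records (ins): one row per lender (hypothetical names)
ins <- data.frame(
  respondent_id   = c("0000000001", "0000000002"),
  agency_code     = c(1L, 9L),
  respondent_name = c("first example bank", "second example credit union")
)

# left join (all.x = TRUE): keep every application, attach its lender's name.
# assumed key: respondent id is only unique within an agency code.
lar_ins <- merge(lar, ins, by = c("respondent_id", "agency_code"), all.x = TRUE)
print(lar_ins)
```

using `all.x = TRUE` means an application whose lender is missing from the institution table still survives the merge, just with an `NA` name, so no loan records quietly disappear.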

replicate ffiec publications.R

  • open up and then connect to a monetdb server instance, like a champ
  • present a few simple sql queries so you can take it from here
  • reproduce a few sets of numbers published by the united states government
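most of those published tables come down to group-by tabulations.  the scripts send sql like this to monetdb; the sketch below runs the same kind of tabulation on a tiny in-memory data frame so it works without a server.  the column names (`action_taken`, `loan_amount`) and the numbers are made up for illustration and are not ffiec figures.

```r
# toy applications; column names are hypothetical stand-ins
lar <- data.frame(
  action_taken = c(1L, 1L, 3L, 1L, 3L, 4L),
  loan_amount  = c(150L, 220L, 90L, 310L, 75L, 120L)   # thousands of dollars
)

# sql equivalent of what follows:
#   SELECT action_taken, COUNT(*) AS applications, AVG(loan_amount) AS mean_amount
#   FROM lar GROUP BY action_taken
counts <- aggregate(loan_amount ~ action_taken, data = lar, FUN = length)
names(counts)[2] <- "applications"

means <- aggregate(loan_amount ~ action_taken, data = lar, FUN = mean)
names(means)[2] <- "mean_amount"

# one row per action_taken code, with its count and mean loan amount
result <- merge(counts, means, by = "action_taken")
print(result)
```

on the full files you'd want the database to do the grouping and only pull the small summary table back into r, which is the whole point of parking thirty-odd million records in monetdb.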

click here to view these two scripts

for more detail about hmda, visit:


if your research requires anything prior to 2006, you might need to order the older data sets from the national archives.  i believe they’ll mail it on some 8-trax.

and thanx to max over at furman for both technical and moral support.

confidential to sas, spss, stata, sudaan users: get ready for your semicolonoscopy.  time to transition to r.  😀
