analyze the social security administration public use microdata files (ssapumf) with r

May 5, 2013

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

the social security administration (ssa) must be overflowing with quiet heroes, because their public-use microdata files are as inconspicuous as they are thorough.  sure, ssa publishes enough great statistical research of their own that outside researchers rarely find ourselves wanting more and finer data than this agency can provide, but does that stop them from releasing detailed microdata as well?  why no.  no it does not.  if you wake up one morning with a hankerin’ to study the person-level lifetime cash-flows of fdr’s legacy, roll up your sleeves and start right here.

compared to the other data sets on this site, the social security administration public use microdata files (ssapumf) are as straightforward as it gets.  you won’t find complex sample survey data here, so just review the short-and-to-the-point data descriptions then calculate your statistics the way you would with other non-survey data.  each of these files contains either one record per person or one record per person per year, and effortlessly generalizes to the entire population of either social security number holders (most of the country) or social security recipients (just beneficiaries).  the one-percent samples should be multiplied by 100 to get accurate nationwide count statistics and the five-percent samples by 20, but ykta (my new urban dictionary entry).  this new github repository contains one script:

download all microdata.R

  • download each zipped file directly onto your local computer
  • load each file into a data.frame using a mixture of both fancery and schmantzery
  • reproduce the overall count statistics provided in each respective data dictionary
  • save each file as an R data file (.rda) for ultra-fast future use
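
for a rough feel of those four steps, plus the multiply-by-100 rule for the one-percent samples, here is a minimal sketch.  it is not the script from the repository.  the function name `download_and_unzip`, the toy file, and its column names (`person_id`, `sex`) are all made up for illustration, and the download step is defined but never called because no real file location is assumed here.

```r
# step 1: download and unzip.  defined but never called below, because
# no real file location is assumed in this sketch
download_and_unzip <- function( zip_url , dest_dir = tempdir() ){
    tf <- tempfile( fileext = ".zip" )
    download.file( zip_url , tf , mode = "wb" )
    unzip( tf , exdir = dest_dir )
}

# toy stand-in for one extracted one-percent sample file, so everything
# from here down runs offline.  the column names are hypothetical
csv_path <- file.path( tempdir() , "toy_one_pct.csv" )
toy <- data.frame(
    person_id = 1:6 ,
    sex = c( "m" , "f" , "f" , "m" , "f" , "f" )
)
write.csv( toy , csv_path , row.names = FALSE )

# step 2: load the file into a data.frame
x <- read.csv( csv_path , stringsAsFactors = FALSE )

# step 3: count records in the sample, then scale to nationwide counts.
# a one-percent sample gets multiplied by 100, a five-percent sample by 20
sample_pct <- 1
sample_counts <- table( x$sex )
national_counts <- sample_counts * ( 100 / sample_pct )
print( national_counts )

# step 4: save as an R data file (.rda), then reload it to confirm
# the round trip works
rda_path <- file.path( tempdir() , "toy_one_pct.rda" )
save( x , file = rda_path )
rm( x )
load( rda_path )
```

with a real file, the scaled counts from step 3 are what you would line up against the totals printed in that file's data dictionary.  setting `sample_pct <- 5` gives the multiply-by-20 case.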

click here to view this lonely script
for more detail about the social security administration public use microdata files (ssapumf), visit:


i skipped importing these new beneficiary data system (nbds) files because i broadly distrust data older than i am and you probably want these easy-to-use, far more current files anyway. 

confidential to sas, spss, stata, and sudaan users: no doubt they were very impressive when they originally became available.  but so was the bone flute.  time to transition to r.  😀
