the social security administration (ssa) must be overflowing with quiet heroes, because their public-use microdata files are as inconspicuous as they are thorough. sure, ssa publishes enough great statistical research of their own that outside researchers rarely find ourselves wanting more and finer data than this agency can provide, but does that stop them from releasing detailed microdata as well? why no. no it does not. if you wake up one morning with a hankerin’ to study the person-level lifetime cash-flows of fdr’s legacy, roll up your sleeves and start right here.
compared to the other data sets on asdfree.com, the social security administration public use microdata files (ssapumf) are as straightforward as it gets. you won’t find complex sample survey data here, so just review the short-and-to-the-point data descriptions then calculate your statistics the way you would with other non-survey data. each of these files contains either one record per person or one record per person per year, and effortlessly generalizes to the entire population of either social security number holders (most of the country) or social security recipients (just beneficiaries). the one-percent samples should be multiplied by 100 to get accurate nationwide count statistics and the five-percent samples by 20, but ykta (my new urban dictionary entry). this new github repository contains one script:
download all microdata.R
- download each zipped file directly onto your local computer
- load each file into a data.frame using a mixture of both fancery and schmantzery
- reproduce the overall count statistics provided in each respective data dictionary
- save each file as an R data file (.rda) for ultra-fast future use
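here's a minimal sketch of those steps plus the multiply-by-100 arithmetic, not the actual code inside the script. it assumes the unzipped file is a plain csv, and it runs on a five-record toy table so nothing needs to be downloaded. the column names (`id`, `sex`) and the `zip_url` placeholder are made up for illustration, not taken from any ssa data dictionary.

```r
# toy stand-in for a one-percent sample file: one record per person
# (hypothetical columns, not from any ssa data dictionary)
toy <- data.frame( id = 1:5 , sex = c( "f" , "m" , "f" , "f" , "m" ) )

# write it to a temporary csv, standing in for an unzipped download
csv_path <- tempfile( fileext = ".csv" )
write.csv( toy , csv_path , row.names = FALSE )

# with a real file, the download + unzip steps might look like this
# ( `zip_url` is a placeholder, not a real address ):
# zip_path <- tempfile( fileext = ".zip" )
# download.file( zip_url , zip_path , mode = "wb" )
# csv_path <- unzip( zip_path , exdir = tempdir() )[ 1 ]

# load the file into a data.frame
x <- read.csv( csv_path , stringsAsFactors = FALSE )

# scale record counts up to the population:
# a one-percent sample gets multiplied by 100 , a five-percent sample by 20
scale_factor <- 100
nationwide_count <- nrow( x ) * scale_factor
count_by_sex <- table( x$sex ) * scale_factor

# save as an R data file (.rda), then confirm it reloads intact
rda_path <- tempfile( fileext = ".rda" )
save( x , file = rda_path )
rm( x )
load( rda_path )
```

for a five-percent sample, the only change would be setting `scale_factor` to 20. compare the scaled counts against the totals printed in each file's data dictionary before trusting anything else you compute.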
for more detail about the social security administration public use microdata files (ssapumf), visit:
- the social security administration home page
- the social security administration open data initiative
- the national archives’ history of social security
i skipped importing these new beneficiary data system (nbds) files because i broadly distrust data older than i am and you probably want these easy-to-use, far more current files anyway.
confidential to sas, spss, stata, and sudaan users: no doubt they were very impressive when they originally became available. but so was the bone flute. time to transition to r. 😀