the integrated public use microdata series international (ipumsi) has been my white whale since i started in survey research. non-demographers: think of this repository as a matryoshka varanasi-kaaba-ark of the covenant. nothing compares. the minnesota population center amassed half a billion person-level records from national statistics offices across the globe. it’s all free and ready for download, so long as you have a project idea and an institutional affiliation. so, my turn to talk? because now the software needed for analysis is free as well, and markedly superior to anything that’s available for purchase. 277 censuses later, roll credits. these tutorials maniacally document every step necessary to:
- download and import your extract either directly into working memory or definitively into a hyperfast column-store
- construct a probability-weighted survey design object with legitimate, defensible standard errors
- compute any statistic that a mad (social) scientist might conceive from this infinity
notes: unless you plan to make severe edits to my example code, each individual extract must contain a single year and a single country and be formatted as a csv. the actual extract link can simply be copied and pasted into your r script from the url highlighted in the screenshot below. each extract should include the variables “serial”, “strata”, and “perwt” if you plan on calculating statistics to be shared anywhere beyond fingerpainting class. these census files cannot be treated as simple random samples; those three columns contain the information necessary for my scripts to handle everything correctly.
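to make the role of those three columns concrete, here is a minimal sketch, not the tutorial scripts themselves. everything in it is an assumption for illustration: the toy data frame and its invented values, the `country`, `year`, and `age` column names, and the hypothetical helper `check_extract`. the design call assumes households (`serial`) act as clusters within `strata`, weighted by `perwt`, and only runs if the `survey` package happens to be installed.

```r
# toy stand-in for a single-country, single-year extract (all values invented)
x <- data.frame(
    country = 76 ,
    year = 2010 ,
    serial = c( 1 , 1 , 2 , 3 , 3 , 4 , 5 , 6 ) ,
    strata = c( 1 , 1 , 1 , 1 , 1 , 2 , 2 , 2 ) ,
    perwt = c( 10 , 10 , 12 , 9 , 9 , 20 , 18 , 22 ) ,
    age = c( 34 , 8 , 61 , 45 , 43 , 27 , 70 , 19 )
)

# lowercase the column names in case the csv header arrives in upper case
names( x ) <- tolower( names( x ) )

# hypothetical helper: stop early if the extract breaks the rules in the notes
check_extract <- function( df ){

    required <- c( "serial" , "strata" , "perwt" )
    missing_cols <- setdiff( required , names( df ) )

    if( length( missing_cols ) > 0 ){
        stop( "extract is missing: " , paste( missing_cols , collapse = ", " ) )
    }

    # assumes the extract carries `country` and `year` columns
    if( length( unique( df$country ) ) != 1 || length( unique( df$year ) ) != 1 ){
        stop( "extract must contain a single country and a single year" )
    }

    invisible( TRUE )
}

check_extract( x )

# the weighted point estimate needs only perwt..
weighted_mean_age <- weighted.mean( x$age , x$perwt )

# ..but a defensible standard error needs the clusters and strata as well
if( requireNamespace( "survey" , quietly = TRUE ) ){

    design <-
        survey::svydesign(
            ids = ~ serial ,
            strata = ~ strata ,
            weights = ~ perwt ,
            data = x ,
            nest = TRUE
        )

    print( survey::svymean( ~ age , design ) )
}
```

the mean printed by `svymean` should match `weighted_mean_age`; what the design object adds is a standard error that accounts for people being sampled in household clusters within strata rather than one at a time.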
confidential to sas, spss, stata, and sudaan users: neil armstrong would give pogo sticks the same look i’m giving your software right now. time to reserve your spot on apollo eleven. time to transition to r. 😀