each and every year, the institute of museum and library services coaxes librarians around the country to put down their handheld “shhhh…” signs and fill out a detailed online questionnaire about their central library, branch, even bookmobile. the public libraries survey (pls) is actually a census: nearly every public library in the nation responds annually. that microdata is waiting for you to check it out, no membership required. the american library association estimates well over one hundred thousand libraries in the country, but fewer than twenty thousand of those participate in this survey since most libraries in the nation are enveloped by some sort of school system.
laughably easy files to work with, these microdata do not require the r survey package or any of the batman-like statistical tools seen in the other public use file folders. as confirmed by one of the administrators of this survey, your analysis can simply tabulate, sum, average, or whatever else using the base commands in r rather than complex sample survey design commands. since these data sets are the universe rather than a sample, i’ve forgone a set of analysis examples. if you want to do something, search stackoverflow with an [r] tag. no survey design assembly required. this new github repository contains two scripts:
download all microdata.R
- download each zipped year of data onto your local computer
- load a trifecta of tables into RAM
- save all three data.frame objects as an R data file (.rda)
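those three steps could be sketched roughly like this. it is not the repository's actual script: both helper functions are hypothetical, the zip url has to be supplied by you, and the idea that each zipped year holds csv files is an assumption.

```r
# hypothetical helper: download one zipped year and read every csv inside it.
# no url is hard-coded here, and the csv format is an assumption
download_one_year <-
	function( zip_url ){

		# temporary spots on the local disk for the zip and its contents
		tf <- tempfile( fileext = ".zip" )
		td <- tempfile()
		dir.create( td )

		# download in binary mode so the zip survives on windows
		download.file( zip_url , tf , mode = "wb" )

		# unzip() returns the paths of the extracted files
		extracted <- unzip( tf , exdir = td )
		csv_files <- extracted[ grepl( "\\.csv$" , extracted , ignore.case = TRUE ) ]

		# load each table into RAM as a data.frame
		tables <- lapply( csv_files , read.csv , stringsAsFactors = FALSE )
		names( tables ) <- tolower( basename( csv_files ) )

		tables
	}

# hypothetical helper: store three data.frame objects in one R data file (.rda)
save_three_tables <-
	function( systems , outlets , states , rda_path ){
		save( systems , outlets , states , file = rda_path )
		invisible( rda_path )
	}
```

calling load() on the saved .rda later brings all three objects back under the names `systems`, `outlets`, and `states`.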
replicate imls publications.R
- produce the control counts on pdf page 73 of this document
- replicate the 2001 statistics shown on pdf page 76 of the same document
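since no survey design is needed, counts and totals like these boil down to base r one-liners. a minimal sketch on a toy data.frame: the column names and values below are invented for illustration and are not real pls fields or published figures.

```r
# toy stand-in for one year of the library-system table.
# column names and values are invented, not real pls fields
libraries <-
	data.frame(
		state = c( "NY" , "NY" , "CA" , "CA" , "TX" ) ,
		visits = c( 1200 , 800 , 1500 , 500 , 1000 ) ,
		stringsAsFactors = FALSE
	)

# tabulate: number of records in each state
records_by_state <- table( libraries$state )

# sum: total visits across the whole universe
total_visits <- sum( libraries$visits )

# average: mean visits per library
mean_visits <- mean( libraries$visits )

# sum within groups: total visits by state
visits_by_state <- tapply( libraries$visits , libraries$state , sum )
```

because every record represents only itself, there are no weights to carry around and no standard errors to compute.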
for more detail about the public libraries survey (pls), visit:
- the imls pls homepage
- a beautiful public library distribution in figure a of the 2010 report
- a severely less beautiful public library distribution in figure a of the 2011 report
as plainly described at the bottom of pdf page 6 of the technical documentation, each year of microdata gets released as three tables: a table of library systems (where new york city public libraries would have one entry), a table of library buildings (where new york city public libraries have one entry per branch), and a table of states (where all libraries in new york state get collapsed into one). imls takes care not to disclose stuff like salary information of individual employees, and the more-aggregated tables require less confidentiality-related data squelching. if you need microdata sans suppression, apply for the restricted use files.
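to make the three levels concrete, here is a toy sketch that rolls a building-level table up to systems and then to states. every name and value is invented for illustration. imls publishes the three tables separately rather than deriving them like this, and suppression may keep real roll-ups from matching exactly.

```r
# toy building-level table: one row per branch (all values invented)
outlets <-
	data.frame(
		state = c( "NY" , "NY" , "NY" , "NJ" ) ,
		system_id = c( "nyc" , "nyc" , "albany" , "newark" ) ,
		outlet_name = c( "branch a" , "branch b" , "main" , "main" ) ,
		stringsAsFactors = FALSE
	)

# a column of ones, so that summing it counts buildings
outlets$buildings <- 1

# system-level table: one row per library system
systems <- aggregate( buildings ~ state + system_id , data = outlets , FUN = sum )

# state-level table: every library in a state collapsed into one row
states <- aggregate( buildings ~ state , data = systems , FUN = sum )
```

in this toy, the two "nyc" branches collapse into a single system record, and the "NY" state record then absorbs both the "nyc" and "albany" systems.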
confidential to sas, spss, stata, sudaan users: you are using the blockbuster video of statistical languages. time to transition to r. 😀