analyze the public libraries survey (pls) with r

October 14, 2014

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

each and every year, the institute of museum and library services coaxes librarians around the country to put down their handheld “shhhh…” signs and fill out a detailed online questionnaire about their central library, branches, even bookmobiles.  the public libraries survey (pls) is actually a census: nearly every public library in the nation responds annually.  that microdata is waiting for you to check it out, no membership required.  the american library association estimates well over one hundred thousand libraries in the country, but fewer than twenty thousand of those participate in this survey since most libraries in the nation are enveloped by some sort of school system.

laughably easy files to work with, these microdata do not require the r survey package or any of the batman-like statistical tools seen in the other public use file folders.  as confirmed by one of the administrators of this survey, your analysis can simply tabulate, sum, average, whatever else using the base commands in r rather than complex sample survey design commands.  since these data sets are the universe rather than a sample, i’ve forgone a set of analysis examples.  if you want to do something, search stackoverflow with an [r] tag.  no survey design assembly required.  this new github repository contains two scripts:

download all microdata.R

  • download each zipped year of data onto your local computer
  • load a trifecta of tables into RAM
  • save all three data.frame objects as an R data file (.rda)
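
the three steps above can be sketched roughly as follows.  this is not the repository's actual script: `fetch_year` and `save_tables` are hypothetical helpers, no real imls url is assumed (you pass one in yourself), and the demo at the bottom uses three throwaway csv files instead of a real download.

```r
# hypothetical helper: download one zipped year and unpack it into out_dir.
# zip_url is whatever address you supply; nothing here assumes a real imls path.
fetch_year <- function( zip_url , out_dir ) {
	tf <- tempfile( fileext = ".zip" )
	download.file( zip_url , tf , mode = "wb" )
	unzip( tf , exdir = out_dir )	# returns the extracted file paths
}

# hypothetical helper: read each csv into a data.frame,
# then save all of them together in a single .rda file
save_tables <- function( csv_files , rda_file ) {
	tables <- lapply( csv_files , read.csv , stringsAsFactors = FALSE )
	names( tables ) <- tools::file_path_sans_ext( basename( csv_files ) )
	save( list = names( tables ) , envir = list2env( tables ) , file = rda_file )
	invisible( names( tables ) )
}

# offline demo: three made-up csv files stand in for one unzipped year
demo_dir <- tempfile( "pls_demo" )
dir.create( demo_dir )
for ( nm in c( "systems" , "outlets" , "states" ) ) {
	write.csv(
		data.frame( id = 1:2 , table_name = nm ) ,
		file.path( demo_dir , paste0( nm , ".csv" ) ) ,
		row.names = FALSE
	)
}

csv_files <- list.files( demo_dir , pattern = "\\.csv$" , full.names = TRUE )
rda_file <- file.path( demo_dir , "pls_demo.rda" )
saved <- save_tables( csv_files , rda_file )
```

later on, `load( rda_file )` brings all three data.frame objects back into memory under their saved names.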

replicate imls publications.R

click here to view these two scripts
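
since base r is all a census file needs, a minimal sketch of the tabulate, sum, and average idea looks like this.  the `libraries` data.frame and its column names are invented for illustration and are not actual pls variable names.

```r
# made-up records standing in for a table of library systems
libraries <- data.frame(
	state = c( "NY" , "NY" , "VT" , "VT" , "VT" ) ,
	visits = c( 1200 , 800 , 300 , 150 , 50 ) ,
	has_bookmobile = c( TRUE , FALSE , FALSE , TRUE , FALSE )
)

# tabulate: count records in each state
state_counts <- table( libraries$state )

# sum and average across every record, no weights or survey design needed
total_visits <- sum( libraries$visits )
mean_visits <- mean( libraries$visits )

# sum within groups
visits_by_state <- tapply( libraries$visits , libraries$state , sum )

# share of records with a bookmobile (the mean of a logical is a proportion)
bookmobile_share <- mean( libraries$has_bookmobile )
```

because every record is the universe rather than a sample, these numbers are the totals and averages themselves, not estimates that need standard errors.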

for more detail about the public libraries survey (pls), visit:


plainly described at the bottom of pdf page 6 of the technical documentation, each year of microdata gets released as three tables: a table of library systems (where new york city public libraries would have one entry), a table of library buildings (where new york city public libraries have one entry per branch), and a table of states (where all libraries in new york state get collapsed into one).  imls takes care not to disclose stuff like salary information of individual employees, and the more-aggregated tables require less confidentiality-related-data-squelching.  if you need microdata sans suppression, apply for the restricted use files.
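
to make the three levels concrete, here is a toy sketch that collapses a building-level table into a system-level table and then a state-level table.  every identifier and number is made up, and the real files may not add up this neatly wherever values have been suppressed.

```r
# made-up building-level table: one row per library building
outlets <- data.frame(
	state = c( "NY" , "NY" , "NY" , "VT" ) ,
	system_id = c( "NY0001" , "NY0001" , "NY0002" , "VT0001" ) ,
	sq_feet = c( 5000 , 3000 , 2000 , 1500 )
)

# collapse buildings into systems: one row per library system
systems <- aggregate( sq_feet ~ state + system_id , data = outlets , FUN = sum )

# collapse systems into states: one row per state
states <- aggregate( sq_feet ~ state , data = systems , FUN = sum )
```

the row counts shrink at each step (four buildings, three systems, two states) while the grand total of `sq_feet` stays the same.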

confidential to sas, spss, stata, sudaan users: you are using the blockbuster video of statistical languages.  time to transition to r.  😀
