analyze the national longitudinal surveys (nls) with r

November 25, 2014

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

as the oldest panel survey out there, the bureau of labor statistics’ national longitudinal surveys (nls) have been operating so long that justin bieber’s grandparents might be too young (and also too legally canadian) to have participated in the initial cohorts.  don’t let the panel study of income dynamics people tell you otherwise: this is the ongoing study with the first interviews out of the gate.  are the respondents from the 1966 cohorts still being interviewed?  no.  but that’s because this is an employment survey – each cohort lasts only a few decades before being re-spawned with a shiny new batch of respondents.  it is not a study of retirement or of health; the panel periods are optimized to examine the relationship between teenage years and careers.

the irrefragable starting point is this bullet-pointed description of each panel’s sample universe.  for example, nlsy97 is a nationally representative sample of americans born during the first half of the reagan revolution who are still being assessed about their, well, pretty much everything.  once you pick a cohort, click the damn link and read their convenient introductions to exactly who you’ll have the pleasure of studying.  and don’t forget, this is the wrong survey for cross-sectional analyses.  this isn’t the place to assess the unemployment rate in 2011.  but if you want to look at how many jobs the same individual has held across the past thirty years, lookie here.  this new github repository contains three scripts:

download all microdata.R

longitudinal analysis examples.R

  • create a complex sample survey object across almost fifteen years of interviews, using a taylor-series linearization design and a delightful function that makes choosing weights easier than pie
  • conduct a slew of analyses (is slew to analysis like gaggle to goose?) in an overwhelmingly successful demonstration of the power, brilliance, and mystique of this panel microdata


  • create a complex sample survey object that uses variables from both round one and round fifteen but only weights from the round one interviews, which will bias your results, so don’t do that irl, okay?
  • deftly match the bls-provided statistics, standard errors, deffs, and defts on this page
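the taylor-series linearization behind those standard errors, deffs, and defts boils down to a short formula.  here is a minimal base-r sketch of it, not the code from the repository: it assumes a stratified design whose primary sampling units are treated as drawn with replacement (what the `survey` package assumes at the first stage when no finite population correction is supplied), and the arguments `y`, `w`, `stratum`, and `psu` are hypothetical placeholders rather than real nls variable names.  the deff compares against a simple-random-sample variance that ignores the finite population correction, so it may differ slightly from published figures.

```r
# minimal sketch: taylor-series linearized standard error, deff, and deft
# for a weighted mean under a stratified design whose primary sampling
# units (psus) are treated as drawn with replacement (no fpc).
# y, w, stratum, and psu are hypothetical placeholder inputs.
linearized_mean <- function(y, w, stratum, psu) {
  # weighted (ratio) estimate of the mean
  total_w <- sum(w)
  est <- sum(w * y) / total_w

  # linearized score for each respondent
  z <- w * (y - est) / total_w

  # sum the scores within each psu (psu ids are nested within strata)
  psu_key <- interaction(stratum, psu, drop = TRUE)
  z_psu <- tapply(z, psu_key, sum)
  psu_stratum <- tapply(as.character(stratum), psu_key, function(s) s[1])

  # between-psu variance within each stratum, scaled by n_h / (n_h - 1)
  var_hat <- 0
  for (h in unique(psu_stratum)) {
    zh <- z_psu[psu_stratum == h]
    nh <- length(zh)
    stopifnot(nh >= 2)  # a single-psu stratum has no variance estimate
    var_hat <- var_hat + nh / (nh - 1) * sum((zh - mean(zh))^2)
  }

  # variance a simple random sample of the same size would give,
  # ignoring the finite population correction
  n <- length(y)
  s2 <- sum(w * (y - est)^2) / total_w * n / (n - 1)
  var_srs <- s2 / n

  deff <- var_hat / var_srs
  list(mean = est, se = sqrt(var_hat), deff = deff, deft = sqrt(deff))
}
```

with one stratum, equal weights, and one respondent per psu, the design collapses to a simple random sample, so the linearized variance reduces to `var(y) / length(y)` and the deff to 1, a handy sanity check before trusting numbers on real nls files.  for actual analyses, the `survey` package’s `svydesign()` and `svymean(..., deff = TRUE)` do all of this (and much more) for you.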

click here to view these three scripts

for more detail about the national longitudinal surveys (nls), visit:


though the bls dot gov slash nls homepage and the ohio state-run nlsinfo might seem like disjoint systems at first, these microdata aren’t terribly challenging to analyze so long as you follow the r code i’ve provided.  each new round of interviews gets loaded into the online investigator system as an independent data file, but i couldn’t figure out why we shouldn’t just download absolutely everything and make the panel weighting a cinch.  so i did.  you can too.
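the payoff of downloading every round is that a single merge on the respondent identifier turns separate files into one panel.  here is a minimal base-r sketch of that idea; the data frames `round1` and `round15` and the columns `id`, `employed`, and `weight` are hypothetical stand-ins, not the actual nls file layout or variable names.

```r
# minimal sketch: join two independently downloaded interview rounds
# into one wide panel keyed on a respondent id.
# round1, round15, id, employed, and weight are hypothetical names.

# tag every non-key column with a round suffix so names stay unique
suffix_columns <- function(df, suffix, key = "id") {
  non_key <- names(df) != key
  names(df)[non_key] <- paste0(names(df)[non_key], suffix)
  df
}

make_panel <- function(round1, round15, key = "id") {
  # inner join: keeps only respondents who appear in both rounds
  merge(suffix_columns(round1, "_r1", key),
        suffix_columns(round15, "_r15", key),
        by = key)
}

round1 <- data.frame(id = c(1, 2, 3), employed = c(0, 1, 1), weight = c(1.2, 0.8, 1.0))
round15 <- data.frame(id = c(1, 3), employed = c(1, 1), weight = c(1.5, 1.1))
panel <- make_panel(round1, round15)
```

respondent 2 never shows up in round fifteen, so the merged panel keeps only respondents 1 and 3.  that kind of attrition is why pairing round-one weights with round-fifteen variables, as warned about above, no longer describes the people actually left in your file.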

confidential to sas, spss, stata, and sudaan users: antiques roadshow has some bad news for you.  you paid too much.  time to transition to r. 😀
