analyze the national survey of children’s health with r

November 18, 2013

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

american children of the nineties might have had pogs, beanie babies, and m.c. hammer, but we lacked a reliable source for state-level survey estimates on health.  then in 2003, the maternal and child health bureau of the health resources and services administration (hrsa) launched the first national survey of children’s health (nsch), and researchers interested in state-specific health measures on youngsters traded in their inferior data sources faster than you’d exchange your tamagotchi for an ipod shuffle.  this is a telephone survey, so every four or five years, interviewers barrage the cellular network in search of households with at least one child under 18 to ask some lucky parent questions about the demographics, mental/physical/dental health, insurance coverage, access to care, medical experiences, even school activities of their offspring.  the final microdata contains more than eighteen hundred telephone responses from every state and generalizes to non-institutionalized children aged zero to seventeen, both within each state and nationwide.

nsch stands out as the best way to answer research questions on state-level children’s health care and the family environment, especially for younger tykes who are often an afterthought in other survey data sets.  the youth risk behavior surveillance system (yrbss) only interviews teenagers, the national longitudinal study of adolescent health (addhealth) follows kids for long periods of time but doesn’t follow nearly as many of them, and while the national health interview survey (nhis) includes a sample child component, it cannot be used for state-level estimates.  before you sink your teeth into this data set, you must read the frequently asked questions document.  eight pdf pages of bliss.

just so you know, emily rowe at the university of chicago’s data science for social good co-authored this blog post and all the r scripts in between.  emily contacted me out of the blue to offer up some nsch code she had written for an impact evaluation of the nurse-family partnership, and i convinced her to spend a few more days polishing up her work for public consumption.  so here you have it.  this new github repository contains three scripts:

download and import.R

  • download the microdata years you’ve requested
  • unzip the microdata files you’ve requested, alongside multiply-imputed poverty
  • merge the two to construct a fantastic five implicates (one complete data set per imputed poverty value)
  • save the r data files to your local disk
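
the merge-and-save steps above can be sketched with toy data in base r.  this is only an illustration, not the repository's actual script: the column names (`child_id`, `weight`, `poverty_level`, `implicate`), the long layout of the poverty file, and the output file name are all assumptions made up for the example.

```r
# toy stand-in for the main nsch file: one record per sampled child
# (hypothetical column names, not the real nsch variable names)
main <- data.frame(
  child_id = 1:4,
  state = c("AK", "AK", "AL", "AL"),
  weight = c(120, 80, 200, 150)
)

# toy stand-in for the multiply-imputed poverty file, assumed here to be
# stacked long: five imputed poverty values for every child
set.seed(1)
poverty <- expand.grid(child_id = 1:4, implicate = 1:5)
poverty$poverty_level <- sample(1:8, nrow(poverty), replace = TRUE)

# build one complete data.frame per implicate by merging that implicate's
# poverty values onto the main file
implicates <- lapply(1:5, function(i) {
  this_poverty <- poverty[poverty$implicate == i, c("child_id", "poverty_level")]
  merge(main, this_poverty, by = "child_id")
})

# save the list of five implicates to the local disk (a temp folder here)
out_file <- file.path(tempdir(), "nsch_toy_implicates.rda")
save(implicates, file = out_file)
```

every implicate shares the same children, weights, and states.  only the imputed poverty column differs from one to the next.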

analysis examples.R

  • load and slim down the five implicates to only the variables you need right about now
  • construct a taylor-series linearized complex sample survey design object with multiply-imputed poverty
  • recode a few variables just so you can do the same when you leave the nest
  • perform five, ten, a million different analysis examples so you know how to do most anything
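
as a rough sketch of what those steps look like, here is a toy version using the `survey` and `mitools` packages.  the data are simulated and the variable names (`state`, `wt`, `insured`, `poor`) are invented for illustration.  the real nsch strata, weight, and cluster variables differ, so check the survey documentation before copying the design statement.

```r
library(survey)
library(mitools)

# simulate one toy file of children (hypothetical variable names)
set.seed(42)
n <- 200
kids <- data.frame(
  state = rep(c("AK", "AL"), each = n / 2),  # stand-in for the stratum
  wt = runif(n, 50, 150),                    # stand-in for the sampling weight
  insured = rbinom(n, 1, 0.9)                # toy outcome, same in every implicate
)

# five implicates: identical except for the imputed poverty flag
implicates <- lapply(1:5, function(i) {
  d <- kids
  d$poor <- rbinom(n, 1, 0.2)
  d
})

# svydesign() uses taylor-series linearization by default, and wrapping the
# list in imputationList() yields one design per implicate
nsch_design <- svydesign(
  ids = ~1,
  strata = ~state,
  weights = ~wt,
  data = imputationList(implicates)
)

# run the same estimate on every implicate, then pool the five results
insured_result <- MIcombine(with(nsch_design, svymean(~insured)))
poor_result <- MIcombine(with(nsch_design, svymean(~poor)))

print(coef(poor_result))              # pooled point estimate
print(sqrt(diag(vcov(poor_result))))  # pooled standard error
```

the `insured` estimate is the same in every implicate, so pooling leaves its point estimate untouched.  the `poor` estimate varies across implicates, and that extra between-implicate variation is folded into its standard error.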


  • load and slim down the same five implicates
  • construct the same taylor-series linearized complex sample survey design object with the same multiply-imputed poverty
  • match statistics, standard errors, confidence intervals displayed by child health data dot org
  • for statistics broken out by poverty, run the analysis the more rigorous (i.e. the right) way
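
one reading of "the right way" is to estimate the statistic separately on each of the five implicates and then pool the results with rubin's rules, rather than trusting a single implicate.  the arithmetic is short enough to write out in base r.  the function name `rubin_combine` and the numbers below are made up for illustration, not nsch results.

```r
# pool m estimates and their variances with rubin's rules
rubin_combine <- function(estimates, variances) {
  m <- length(estimates)
  q_bar <- mean(estimates)   # pooled point estimate
  w_bar <- mean(variances)   # average within-implicate variance
  b <- var(estimates)        # between-implicate variance
  total_var <- w_bar + (1 + 1 / m) * b
  c(estimate = q_bar, se = sqrt(total_var))
}

# made-up estimates and standard errors from five implicates
est <- c(0.212, 0.208, 0.215, 0.210, 0.205)
se <- c(0.011, 0.012, 0.011, 0.010, 0.012)

pooled <- rubin_combine(est, se^2)
print(pooled)
```

the pooled standard error is never smaller than the square root of the average within-implicate variance, because the spread of the five estimates gets added on top.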

click here to view these three scripts

for more detail about the national survey of children’s health (nsch), visit:

  • the child health data dot org online table creator
  • the cdc’s nsch homepage, as part of the state and local area integrated telephone survey, a better source of information than..
  • the health resources and services administration’s maternal and child health home, which does not yet have the latest survey results posted


the main portal to all things nsch is actually run by the oregon health and science university’s child and adolescent health measurement initiative.  they have tens of hundreds of statistics and data briefs, articles, presentations, etc, etc, etc. that you owe it to yourself to review before you embark on your own analysis.

confidential to sas, spss, stata, and sudaan users: if attacked by a firehose, you might need more than a cocktail umbrella.  time to transition to r. 😀
