
analyze the national health interview survey (nhis) with r

[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers].
the national health interview survey (nhis) is a household survey about health status and utilization. each annual data set can be used to examine the disease burden and access to care that individuals and families are currently experiencing across the country. check out the wikipedia article (ohh hayy i wrote that) for more detail about its current and potential uses. if you’re cooking up a health-related analysis that doesn’t need medical expenditures or monthly health insurance coverage, look at nhis before the medical expenditure panel survey (nhis's sample is twice as big). the centers for disease control and prevention (cdc) has been keeping nhis real since 1957, and the scripts below automate the download, importation, and analysis of every file back to 1963.

what happened in 1997, you ask? scientists unveiled dolly the sheep, clinton started his second term, and the national health interview survey underwent its most recent major questionnaire re-design. here’s how all the moving parts work:

if you use anything more than the personsx file alone, you’ll need to merge some tables together. make sure you understand the difference between setting the parameter all = TRUE versus all = FALSE in r's merge() function: not everyone in the personsx file has a record in the samadult or samchild files, so all = FALSE quietly drops everyone without a match.
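a minimal sketch of that difference in base r, using tiny made-up stand-ins for personsx and samadult. the key columns hhx, fmx, and fpx (household, family, and person number) are an assumption about how the files line up, and the smoker column is hypothetical; check the codebook for your year before merging real files.

```r
# toy stand-ins for the personsx and samadult files (made-up values).
# assumption: hhx + fmx + fpx together identify one person.
personsx <- data.frame(
  hhx = c(1, 1, 2, 3),
  fmx = c(1, 1, 1, 1),
  fpx = c(1, 2, 1, 1),
  age = c(45, 12, 70, 33)
)

samadult <- data.frame(
  hhx = c(1, 2),
  fmx = c(1, 1),
  fpx = c(1, 1),
  smoker = c(0, 1)  # hypothetical sample adult variable
)

keys <- c("hhx", "fmx", "fpx")

# all = FALSE (the default): keep only people present in both tables
inner <- merge(personsx, samadult, by = keys, all = FALSE)

# all = TRUE: keep every row from both tables, filling gaps with NA.
# every samadult person also appears in personsx, so this keeps all of personsx.
full <- merge(personsx, samadult, by = keys, all = TRUE)

nrow(inner)              # 2
nrow(full)               # 4
sum(is.na(full$smoker))  # 2
```

with all = FALSE, the 12-year-old and the 33-year-old vanish from the merged table; with all = TRUE, they stay, with smoker set to NA.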

this new github repository contains four scripts:

1963-2011 – download all microdata.R

2011 personsx – analyze.R 

2011 personsx plus samadult with multiple imputation – analyze.R

replicate cdc tecdoc – 2000 multiple imputation.R
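two of these scripts work with multiply imputed data. whichever package the repository scripts actually use for that step (mitools is a common companion to the survey package), the combining arithmetic is rubin's rules: average the point estimates, then add the average within-imputation variance to (1 + 1/m) times the between-imputation variance. a base-r sketch, with made-up numbers standing in for five implicates:

```r
# combine results from m multiply imputed data sets with rubin's rules
# est: point estimate from each implicate
# se:  standard error from each implicate
combine_rubin <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                    # combined point estimate
  ubar <- mean(se^2)                   # average within-imputation variance
  b <- var(est)                        # between-imputation variance
  total_var <- ubar + (1 + 1 / m) * b  # total variance
  c(estimate = qbar, se = sqrt(total_var))
}

# hypothetical results from five implicates (not real nhis numbers)
est <- c(10, 12, 11, 13, 9)
se  <- c(1, 1, 1, 1, 1)

result <- combine_rubin(est, se)
result
```

with these numbers the combined estimate is 11 and the combined standard error is 2: a within-imputation variance of 1 plus 1.2 times a between-imputation variance of 2.5.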


click here to view these four scripts


for more detail about the national health interview survey (nhis), visit:

notes:

the national health interview survey is the first and only us government survey data set to include any r syntax examples (page 6).  an inspiration.

the cdc often includes supplemental survey questions in nhis.  check ’em out.

unless specified by the question’s phrasing, most nhis variables should be treated as point-in-time, as opposed to either annualized or ever during the year. this distinction is particularly important for health insurance coverage. think about these three statistics: the number of americans uninsured at least once during the year, the number uninsured right now, and the number uninsured for the entire year. the point-in-time number (uninsured right now) is smaller than the at-least-once number but larger than the entire-year number.
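a toy illustration of why those three numbers nest that way. the monthly grid below is invented for five hypothetical people (it is not an nhis variable), and the june interview month is an assumption; the point is only that anyone uninsured all year is uninsured in june, and anyone uninsured in june was uninsured at least once.

```r
# made-up monthly status: TRUE = uninsured that month (rows = people, cols = jan..dec)
uninsured <- rbind(
  rep(FALSE, 12),                        # insured all year
  rep(TRUE, 12),                         # uninsured all year
  c(rep(TRUE, 3), rep(FALSE, 9)),        # uninsured jan-mar only
  c(rep(FALSE, 9), rep(TRUE, 3)),        # uninsured oct-dec only
  c(rep(FALSE, 5), TRUE, rep(FALSE, 6))  # uninsured in june only
)

interview_month <- 6  # assumption: everyone is asked about coverage in june

point_in_time <- sum(uninsured[, interview_month])  # uninsured right now
at_least_once <- sum(apply(uninsured, 1, any))      # uninsured in any month
entire_year   <- sum(apply(uninsured, 1, all))      # uninsured in every month

c(entire_year = entire_year, point_in_time = point_in_time, at_least_once = at_least_once)
```

here the entire-year count is 1, the june point-in-time count is 2, and the at-least-once count is 4.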


confidential to sas, spss, stata, and sudaan users: why are you still rocking out on that cassette tape after we’ve designed the ipod?  time to transition to r.  😀

To leave a comment for the author, please follow the link and comment on their blog: asdfree by anthony damico.
