analyze the medical expenditure panel survey (meps) with r

January 7, 2013

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

the meps household component leads the pack for examining individual-level medical expenditures by payor and type of service.  total expenditures captured by the survey tend to be low, but unbiased across the board and can be adjusted to match the national health expenditure accounts.  i wrote the wikipedia article, so it’s data-oriented.  if you vandalize it, i will revert your changes and t.p. your front yard.  give it a read for more details about what’s possible.  the agency for healthcare research and quality (ahrq) produces meps and rhymes with shark.

the medical expenditure panel survey – household component (meps-hc) contains data laid out a few different ways.  the consolidated file has one-record-per-person with all the complex sample survey variables.  start there.  the eight event files contain one-record-per-person-per-event, and (except for the supplies/vision table) those events have some sort of dates.  crikey.  there are tables with one-record-per-person-per-medical-condition, one-record-per-job, even a one-record-per-person-per-interview-per-private-health-plan table for anyone who wants to spend less time with his or her family.  if you merge anything to the consolidated file, make sure you understand the difference between setting the parameter all.x = TRUE versus all.x = FALSE — some respondents have zero records in the non-consolidated files, others have multiple.  hot tip: you probably want to aggregate non-consolidated files somehow.  you might use tapply and aggregate, but i prefer aggregation using sql.
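here's a minimal sketch of that merge pitfall, using toy data frames instead of real meps files.  every column name below (person_id, age, expenditure) is made up for illustration, and base r's aggregate stands in for the sql aggregation mentioned above only because it needs no extra packages.

```r
# toy stand-ins for the real files (all column names here are hypothetical)
consolidated <- data.frame(person_id = c(1, 2, 3), age = c(34, 61, 8))

# person 1 has two events, person 2 has none, person 3 has one
events <- data.frame(
  person_id   = c(1, 1, 3),
  expenditure = c(120, 80, 45)
)

# step 1: collapse the event file down to one-record-per-person
per_person <- aggregate(expenditure ~ person_id, data = events, FUN = sum)

# step 2a: all.x = TRUE keeps every consolidated record,
# so person 2 survives the merge with a missing expenditure
left <- merge(consolidated, per_person, by = "person_id", all.x = TRUE)

# in this toy example, no events means no spending, so recode NA to zero
left$expenditure[is.na(left$expenditure)] <- 0

# step 2b: all.x = FALSE (the default) drops person 2 without a warning
inner <- merge(consolidated, per_person, by = "person_id", all.x = FALSE)

# skipping the aggregation step duplicates person 1 instead
dupes <- merge(consolidated, events, by = "person_id", all.x = TRUE)

print(left)
print(nrow(inner))
print(nrow(dupes))
```

the row counts tell the story: the aggregated left join keeps one row per person, the inner join loses anyone without events, and merging the raw event file gives you more rows than people.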

everything can be read in as a sas transport file (.ssp) using read.xport (from the foreign package), but if you like making things harder than they have to be (i.e. if you ride a fixie), you can also follow the example buried in the ?read.SAScii documentation.  ahrq draws the meps sample from the national health interview survey, interviews about thirty-five thousand individuals per year, and keeps everyone in the panel for two years.  half of the respondents are in their first of two years of interviews, half are in their second.  capisce?  meps generalizes to the us non-institutional, non-active duty military population.  this new github repository contains three scripts:

1996-2010 household component – download all microdata.R

  • loop through every year and every file type, download, then rename according to a pattern
  • save each file as an r data file (.rda) and (if specified by the user) sas transport (.ssp), comma-separated value (.csv), and stata-readable (.dta)
  • download the codebook and documentation, if available

2010 consolidated – analyze with brr.R

  • load the r data file (.rda) created by the download script (above)
  • set up the balanced repeated replication design outlined in this document
  • perform a boatload of analysis examples (spoiler: there will be barplots)

2010 consolidated – analyze with tsl.R

  • load the r data file (.rda) created by the download script (above)
  • set up a taylor-series linearization survey design outlined in this document
  • perform the same boatload of analysis examples

click here to view these three scripts

for more detail about the medical expenditure panel survey – household component (meps-hc), visit:


if you don’t know which analysis method to use, choose the replicate weights.  replicate weighting requires slightly more ram, but taylor-series designs don’t allow the computation of a confidence interval around quantile statistics (like the median).
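to make the two approaches concrete, here's a rough sketch of both design setups with the survey package, run on a made-up data frame rather than the real consolidated file.  the column names (VARSTR, VARPSU, PERWT10F, TOTEXP10) are my assumption of what the 2010 file calls its stratum, cluster, weight, and total expenditure variables, so check the codebook.  the replicate design here is converted from the linearization design with as.svrepdesign, which is a shortcut for illustration and not necessarily how the scripts in the repository build theirs.

```r
library(survey)

# toy data: four strata with exactly two psus each, the layout brr needs.
# column names are assumed, values are invented
toy <- data.frame(
  VARSTR   = rep(1:4, each = 4),
  VARPSU   = rep(rep(1:2, each = 2), times = 4),
  PERWT10F = c(1200,  900, 1500, 1100,  800, 1300, 1000,  950,
               1400, 1250,  700, 1050, 1150,  980, 1600, 1020),
  TOTEXP10 = c(   0,  250, 1200,   40,  560,    0, 3100,   75,
                310,  980,   15,    0, 4400,  220,  130,   60)
)

# taylor-series linearization design.
# nest = TRUE because psu numbers repeat across strata
tsl <- svydesign(
  id      = ~VARPSU,
  strata  = ~VARSTR,
  weights = ~PERWT10F,
  data    = toy,
  nest    = TRUE
)

# balanced repeated replication design, derived from the one above
brr <- as.svrepdesign(tsl, type = "BRR")

# same point estimate either way, only the variance method differs
tsl_mean <- svymean(~TOTEXP10, tsl)
brr_mean <- svymean(~TOTEXP10, brr)

print(tsl_mean)
print(brr_mean)
```

once either design object exists, the same svymean, svytotal, and svyby calls work on both, so switching methods later costs one line.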

this repository doesn’t include a script to replicate the meps taylor-series linearization or replicate-weighted methods of variance calculation, because i wrote the original journal article with meps.  it’s legit.

if you just want a one-off statistic and can’t bear to get your typing fingers dirty, try ahrq's fabulous table-building website, mepsnet.

confidential to sas, spss, stata, sudaan users: why are you still making calls with two tin-cans and a string now that we’ve created cell phones?  time to transition to r.  😀
