analyze the behavioral risk factor surveillance system (brfss) with r and monetdb

December 17, 2012
(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

experimental.  the behavioral risk factor surveillance system (brfss) aggregates behavioral health data from 400,000 adults via telephone every year.  it's um *clears throat* the largest telephone survey in the world and it's gotta lotta uses, here's a list.  neato.  state health departments perform the actual data collection (according to a nationally-standardized protocol and a core set of questions), then forward all responses to the centers for disease control and prevention (cdc) office of surveillance, epidemiology, and laboratory services (osels) where the nationwide, annual data set gets constructed.  independent administration by each state allows them to tack on their own questions that other states might not care about.  that way, florida could exempt itself from all the risky frostbite behavior questions.  in addition to providing the most comprehensive behavioral health data set in the united states, brfss also ekes out my worst acronym in the federal government award - onchit a close second.

annual brfss data sets have grown rapidly over the past quarter-century: the 1984 data set contained only 12,258 respondents from 15 states, all states were participating by 1994, and the 2011 file has surpassed half a million interviews.  if you're examining trends over time, do your homework and review the brfss technical documents for the years you're looking at (plus any years in between).  what might you find?  well for starters, the cdc switched to sampling cellphones in their 2011 methodology.

unlike many u.s. government surveys, brfss is not conducted for each resident at a sampled household (phone number).  only one respondent per phone number gets interviewed.  did i miss anything?  well if your next question is frequently asked, you're in luck.

all brfss files are available in sas transport format so if you're sittin' pretty on 16 gb of ram, you could potentially read.xport a single year and create a taylor-series survey object using the survey package.  cool.  but hear me out:  the download and importation script builds an ultra-fast monet database (click here for speed tests, installation instructions) on your local hard drive.  after that, these scripts are shovel-ready.  consider importing all brfss files my way - let it run overnight - and during your actual analyses, code will run a lot faster.  the brfss generalizes to the u.s. adult (18+) (non-institutionalized) population, but if you don't have a phone, you're probably out of scope.  this new github repository contains four scripts:

1984 - 2011 download all microdata.R
  • create the batch (.bat) file needed to initiate the monet database in the future
  • download, unzip, and import each year specified by the user
  • create and save the taylor-series linearization complex sample designs
  • create a well-documented block of code to re-initiate the monetdb server in the future

2011 single-year - analysis examples.R
  • run the well-documented block of code to re-initiate the monetdb server
  • load the r data file (.rda) containing the taylor-series linearization design for the single-year 2011 file
  • perform the standard repertoire of analysis examples, only this time using sqlsurvey functions

2010 single-year - variable recode example.R
  • run the well-documented block of code to re-initiate the monetdb server
  • copy the single-year 2010 table to maintain the pristine original
  • add a new drinks per month category variable by hand
  • re-create then save the sqlsurvey taylor-series linearization complex sample design on this new table
  • close everything, then load everything back up in a fresh instance of r
  • replicate statistics from this table, pulled from the cdc's web-enabled analysis tool

replicate cdc weat - 2010.R
  • run the well-documented block of code to re-initiate the monetdb server
  • load the r data file (.rda) containing the taylor-series linearization design for the single-year 2010 file
  • replicate statistics from this table, pulled from the cdc's web-enabled analysis tool
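
to make the recode step in the third script a bit more concrete, here's a minimal sketch of the idea on a tiny in-memory data frame.  the actual script works on a table inside the monet database, so treat this as an illustration only: the column names, the toy values, and the cutpoints are all made up for this example and are not taken from the brfss codebook.

```r
# toy stand-in for the 2010 table.
# the column name and values are hypothetical, not real brfss data
x <- data.frame( drinks.per.month = c( 0 , 3 , 12 , 45 , NA , 60 ) )

# copy first, so the pristine original stays untouched
y <- x

# bucket the continuous variable into categories by hand.
# these cutpoints are arbitrary and only here for illustration.
# cut() uses right-closed intervals by default: (-Inf,0], (0,10], (10,30], (30,Inf]
y$drinks.cat <-
	cut(
		y$drinks.per.month ,
		breaks = c( -Inf , 0 , 10 , 30 , Inf ) ,
		labels = c( "none" , "1 to 10" , "11 to 30" , "more than 30" )
	)

# eyeball the new category against the original values, missings included
print( table( y$drinks.cat , useNA = "ifany" ) )
```

the copy-then-recode order is the point here: the new column lands on `y`, and `x` still looks exactly like it did on import.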




click here to view these four scripts
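
for comparison, here's a rough sketch of the in-memory route mentioned earlier: a taylor-series linearization design built with the survey package.  it uses a small synthetic data frame instead of a real brfss file (in practice the data frame would come out of read.xport), and every column name below is a placeholder i've assumed for the example, so check the brfss documentation for the real cluster, strata, and weight variables in the year you're working with.

```r
library(survey)

# synthetic stand-in for a single-year file.
# all column names and values here are hypothetical
set.seed( 1 )
n <- 200
x <-
	data.frame(
		psu = seq_len( n ) ,                  # one cluster id per record
		ststr = rep( 1:4 , each = n / 4 ) ,   # four made-up sampling strata
		wt = runif( n , 50 , 500 ) ,          # made-up final weights
		smoker = rbinom( n , 1 , 0.2 )        # made-up 0/1 outcome
	)

# taylor-series linearization design: clusters, strata, weights
brfss.design <-
	svydesign(
		ids = ~ psu ,
		strata = ~ ststr ,
		weights = ~ wt ,
		data = x ,
		nest = TRUE
	)

# weighted share of smokers, with a design-based standard error
smoker.mean <- svymean( ~ smoker , brfss.design )
print( smoker.mean )
```

this works fine when one year fits comfortably in ram.  the database-backed scripts exist for when it doesn't.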



for more detail about the behavioral risk factor surveillance system, visit:
  • the centers for disease control and prevention behavioral risk factor surveillance system homepage
  • the behavioral risk factor surveillance system wikipedia entry

notes:

if you're just scroungin' around for a few statistics, the cdc's web-enabled analysis tool (weat) might be all your heart desires.  in fact, on slides seven, eight, nine of my online query tools video, i demonstrate how to use this table creator.  weat's more advanced than most web-based survey analysis - you can run a regression.  but only seven (of eighteen) years can currently be queried online.

since data types in sql are not as plentiful as they are in the r language, the definition of a monet database-backed complex design object requires that a cutoff be specified between the categorical variables and the linear ones.  that cut point gets defined using the check.factors argument in the sqlsurvey() and sqlrepsurvey() function calls.  check.factors defaults to ten, but can be raised or lowered as needed.
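
a rough way to picture that cutoff, sketched in plain r on a toy data frame: count the distinct values in each column and compare the count against the threshold.  guess.factors() below is a hypothetical helper written for this illustration, not a function from sqlsurvey, and whether sqlsurvey treats the boundary itself as categorical or linear is an assumption on my part.

```r
# hypothetical helper, not part of sqlsurvey.
# labels a column "categorical" when it has at most `cutoff` distinct values
guess.factors <-
	function( df , cutoff = 10 ){
		n.distinct <-
			vapply(
				df ,
				function( col ) length( unique( col[ !is.na( col ) ] ) ) ,
				integer( 1 )
			)
		ifelse( n.distinct <= cutoff , "categorical" , "linear" )
	}

# toy columns with 2, 51, and 82 distinct values.  the names are made up
toy <-
	data.frame(
		sex = rep( 1:2 , length.out = 200 ) ,
		state = rep( 1:51 , length.out = 200 ) ,
		age = rep( 18:99 , length.out = 200 )
	)

# with a cutoff of ten, only sex counts as categorical
print( guess.factors( toy ) )

# raise the cutoff and state becomes categorical too, while age stays linear
print( guess.factors( toy , cutoff = 60 ) )
```

the state column shows why you might want to raise the cutoff: a variable with fifty-one codes is categorical in spirit, but a threshold of ten would treat it as a number.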


confidential to sas, spss, stata, sudaan users: when statistical languages are plotted on cartesian coordinates, what-you-paid-for vs. what-you-get are best represented as y = 1/x.  time to transition to r.  :D
