analyze the demographic and health surveys (dhs) with r

July 8, 2014

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

professors of public health 101 probably cite the results of the demographic and health surveys (dhs) more than all other data sources combined.  funded by the united states agency for international development (usaid) and administered by the technically-savvy analysts at icf international, this collection of multinational surveys enters its third decade as the authoritative source of international development indicators.  want a sampler of what that all means?  load up the dhs homepage and watch the statistics fly by: 70% of beninese children younger than five sleep under an insecticide-treated bednet / more than a third of kyrgyz kids aged 6-59 months have anemia / only 35% of guinean households have a place to wash yer hands.  this is the front-and-center toolkit for professional epidemiologists who want to know who/what/when/where/why to target a public health intervention in any of these nations.

before you read any more about the microdata, look at  this online table creator might give you access to every statistic that you need, and without the fuss, muss, or missing values of a person-level table.  (bonus: click here to watch me describe dhs-style online table creation from a teleprompter.)  why should you use statcompiler?  because it’s quick, easy, and has aggregated statistics for every country at your fingertips.

if that doesn’t dissuade you from digging into an actual data set, one more point of order: you’ll likely only be given access to a small number of countries.  so when applying for access, it’d be smart to ask for whichever country you are interested in _and also_ for malawi 2004.  that way, you will be able to muck around with my example syntax using the data tables that they were intended for.  if you have already registered, no fear: you can request that malawi be added to your existing project.  i tried requesting every data set.  i failed.  the data archivists do not grant access to more than a country or two, so choose what countries you need sparingly.  while semi-restricted, this is public data:  so long as you have a legitimate research question, you’ll be granted access without cost.  this new github repository contains three scripts:

download and import.R

analysis examples.R


  • load the 2004 malawi individual recodes file into working memory
  • re-create some of the old school-style strata described in this forum
  • match a single row from pdf page 324 all the way across, deft and all.

click here to view these three scripts

for more detail about the demographic and health surveys (dhs), visit:


next to the main survey microdata set, you’ll see some roman numerals ranging from one through six.  this number indicates which version manual of the survey that particular dataset corresponds to.  different versions have different questions, structures, microdata files: read the entire “general description” section (only about ten pages) of the manual before you even file your request for data access.

these microdata are complex, confusing, occasionally strangely-coded, and often difficult to reconcile with historical versions.  (century month codes? wowza.)  that’s understandable, and the survey administrators deserve praise for keeping everything as coherent as they have after thirty years of six major questionnaire revisions of ninety countries of non-english-speaking respondents across this crazy planet of ours.  if you claw through the documentation and cannot find an explanation, you’ll want to engage the user forum.  they are thoroughly responsive, impressively knowledgeable, and will help you get to the bottom of it – whatever `it` may be.  before you ask a question here, or really anywhere in life, have a solid answer to whathaveyoutried.  and for heavens’ sakes,* prepare a reproducible example for them.

* my non-denominational way of saying heaven’s sake.

confidential to sas, spss, stata, and sudaan users: i would shake your hand but you’ve yet to adopt the statistical equivalent of coughing into your sleeve.  time to transition to r.  😀

To leave a comment for the author, please follow the link and comment on their blog: asdfree by anthony damico. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Recent popular posts


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Dommino data lab

Quantide: statistical consulting and training



CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)