analyze the american national election studies (anes) with r

[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers.]

on election days in the united states, the news media peppers its coverage with quick, dirty exit polls that allow it to make coarse statements like, “x% of demographic group y voted for candidate z.”  the american national election studies are the scientific community’s response to those haphazard polls, for those of us who care more about having the number right than having the number right away.  for every presidential election since dewey defeated truman and every off-year congressional election since eisenhower’s first term, the anes has released a data set so that professional researchers, political junkies, and partisan hacks can seriously figure out who voted for whom.  and if any of you out there are personally running for office, consider this your best source of information on the demographics and behavior of split-ticket voters.

although it might lag behind the published microdata, berkeley’s sda (survey documentation and analysis) online query tool has a few of the anes data files hot and ready for crosstabulation and simple regression.  before diving into either sda or the r code, perhaps review the available topics – with weighted proportions over time – posted on the main website.  you won’t be able to access any demographic breakouts there, but it’s the quickest way to view the ross perot anomaly.

choose which microdata file to work with after carefully reading your four study choices.  you could review the frequently asked questions as well, but only if you promise me you won’t read anything into spss.  most american national election studies generalize to all eligible voters in the united states, but confirm the sample universe in the `weights summary` section of your selection.  and have fun.  have fun.

this new github repository contains four scripts:

download and import.R

analysis examples.R

replicate table one.R

replicate table two.R

click here to view these four scripts

for more detail about the american national election studies (anes), visit:


as you’d expect with any survey dating back to 1948, some of the weighting and confidence interval calculations have changed over time.  with five notable exceptions (see table one), the main anes data sets did not start including a sampling weight until 1992 – when it became the norm.  to further complicate your life, the more recent data sets include both a pre- and post-election weight.  if no weight variable exists, just add a column of all ones and make that your weighting variable – matching what they’ve done in the multi-year cumulative file.
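the all-ones fallback described above can be sketched in a few lines of base r.  this is only an illustration under stated assumptions: the data frame `anes_old`, its columns, and the weight name `one_weight` are all invented here and do not come from any actual anes file.

```r
# toy data frame standing in for an older anes file without a weight column
# (hypothetical: the object name, column names, and values are invented)
anes_old <- data.frame(
    respondent_id = 1:5 ,
    voted = c( 1 , 0 , 1 , 1 , 0 )
)

# every respondent counts equally, so the weight is simply one
anes_old$one_weight <- 1

# with a weight of one, the weighted mean equals the unweighted mean
weighted_result <- weighted.mean( anes_old$voted , anes_old$one_weight )
unweighted_result <- mean( anes_old$voted )
```

the point of carrying the column around anyway is that the same weighted code can then run against every study year, weighted or not.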

if you only care about specific points-in-time (one of the cross-sectional time series studies), then simply find four variables to construct a taylor-series design: the strata variable, the primary sampling unit (also called the psu or cluster) variable, the pre-election weight, and the post-election weight.  as stated at the bottom of this page, if your analysis only involves questions asked during the pre-election portion, use the pre-election weight (the unweighted sample will be larger), but if you’re looking at any variables collected during the post-election interview, use the post-election weight instead.  next, look for the cluster and strata variables.  sometimes they’re mushed together into a single variable and will need to be extracted with a simple recode like `stratum = substr( v040103 , 1 , 2 )` and `secu = substr( v040103 , 3 , 3 )`.  for some of the older studies, these variables are not available, and your standard errors may be misleadingly small.
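here is a minimal sketch of those steps with the `survey` package, assuming a tiny made-up data frame.  only `v040103` and the two `substr` recodes come from the text above; the object names, the weight columns (`pre_weight`, `post_weight`), the question columns, and every value are hypothetical, and a real anes file may flag post-election non-respondents differently than the `NA` used here.

```r
library(survey)

# toy data: v040103 is the combined strata + cluster code named above
# (hypothetical: all values and every other column are invented)
anes_toy <- data.frame(
    v040103 = c( "011" , "011" , "012" , "012" , "021" , "021" , "022" , "022" ) ,
    pre_weight = c( 1.2 , 0.8 , 1.1 , 0.9 , 1.3 , 0.7 , 1.0 , 1.0 ) ,
    post_weight = c( 1.4 , 0.9 , 1.2 , 1.0 , 1.5 , NA , 1.1 , 1.2 ) ,
    pre_question = c( 1 , 0 , 1 , 1 , 0 , 1 , 0 , 1 ) ,
    post_question = c( 1 , 1 , 0 , 1 , 0 , NA , 1 , 0 ) ,
    stringsAsFactors = FALSE
)

# split the combined code: first two characters are the stratum, third is the cluster
anes_toy$stratum <- substr( anes_toy$v040103 , 1 , 2 )
anes_toy$secu <- substr( anes_toy$v040103 , 3 , 3 )

# design for questions asked only in the pre-election interview
pre_design <- svydesign(
    id = ~ secu ,
    strata = ~ stratum ,
    weights = ~ pre_weight ,
    data = anes_toy ,
    nest = TRUE    # cluster codes repeat across strata
)

# design for anything touching the post-election interview:
# keep only respondents who have a post-election weight
post_design <- svydesign(
    id = ~ secu ,
    strata = ~ stratum ,
    weights = ~ post_weight ,
    data = subset( anes_toy , !is.na( post_weight ) ) ,
    nest = TRUE
)

# taylor-series linearized means and standard errors
pre_result <- svymean( ~ pre_question , pre_design )
post_result <- svymean( ~ post_question , post_design )
```

building two separate design objects keeps the pre- versus post-election weight choice explicit, so each estimate is tied to the weight that matches the interview wave it came from.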

if you’re analyzing the cumulative file, they’ve prepared a few multi-year columns of all weights.  e-mail [email protected] and ask for cluster and strata variable advice.  there’s also a weighting anomaly back in the 1970 file that’s outlined in the main how-to guide, but in order to understand the three weight options, you actually gotta read the middle paragraph on the 1970 study design page.

confidential to sas, spss, stata, and sudaan users: and saber-toothed tigers probably laughed when they saw the first humans crossing the bering strait.  don’t be a saber-toothed tiger.  time to transition to r.  😀
