Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Last fall, my co-authors Stephanie Zimmer, Rebecca Powell, and I released Exploring Complex Survey Data Analysis Using R. Earlier this August, we had the joy of teaching a workshop based on the book at useR! 2025.
One of our favorite examples in both the book and workshop comes from the American National Election Studies (ANES), a long-running project from Stanford University and the University of Michigan supported by the National Science Foundation. Since 1948, ANES has fielded surveys almost every two years, making it one of the richest resources for studying public opinion and voting behavior in U.S. elections. The data cover everything from party affiliation to trust in government, and are carefully designed to represent the voting population.
Just a couple of days after returning from useR!, I was thrilled to see that the brand-new ANES 2024 dataset had been released!
The full release of the ANES 2024 Time Series #Data is now available. More details here: electionstudies.org/anes-announc…
— American National Election Studies (@electionstudies.bsky.social) August 12, 2025 at 12:24 PM
[image or embed]
Survey data like ANES are not as straightforward as a simple random sample. Some groups are intentionally oversampled, others are harder to reach, and not every respondent has the same probability of being included. To correct for this, surveys provide “weights”, or numbers that tell us how much each response should “count” when estimating results for the whole population.
If we ignore these weights and just calculate raw percentages, we risk drawing the wrong conclusions. That’s where the {survey} and {srvyr} packages come in: they handle weights, strata, and clusters behind the scenes so our estimates properly reflect the survey design.
So, let’s walk through how you can get started analyzing this dataset in R with the {survey} and {srvyr} packages. For more depth, the Getting Started chapter of our book is a handy companion.
< section id="download-the-data" class="level2">Download the data
Head to the Election Studies website and grab the CSV version of the 2024 Time Series study. The download comes as a Zip file that includes both the dataset and documentation.
< section id="load-packages-and-data" class="level2">Load packages and data
We’ll use the tidyverse for wrangling, and {survey} plus {srvyr} for analysis:
library(tidyverse) library(survey) library(srvyr) anes_2024 <- read_csv("anes_timeseries_2024_csv_20250808.csv")
Review the documentation
Reviewing the survey documentation is a critical part of survey analysis. The ANES materials include details about the sample, weighting, and codebook.
< section id="sample" class="level3">Sample
The 2024 sample was designed to represent U.S. citizens age 18 or older. Respondents were recruited from both in-person and web samples, with careful notes on eligibility and exclusions.
< section id="codebook" class="level3">Codebook
The codebook decodes variable names into something human-readable.
Here are two variables we’ll use:
V240002b
: MODE OF INTERVIEW: POST-ELECTION: This variable captures the mode in which the respondent completed the majority of the post-election survey.- -7. Insufficient parital, interview deleted
- -6. No post interview
- Face-to-face (FTF)
- Internet (WEB)
- Paper (PAPI)
- Phone
- Video
V241229
: PRE: HOW OFTEN TRUST GOVERNMENT IN WASHINGTON TO DO WHAT IS RIGHT [REVISED]:- -9. Refused
- -8. Don’t know
- -1. Inapplicable
- Always
- Most of the time
- About half the time
- Some of the time
- Never
Let’s recode V241229
for easier analysis:
anes_2024_code <- anes_2024 |> mutate( TrustGovernment = factor( case_when( V241229 == 1 ~ "Always", V241229 == 2 ~ "Most of the time", V241229 == 3 ~ "About half the time", V241229 == 4 ~ "Some of the time", V241229 == 5 ~ "Never", .default = NA ), levels = c("Always", "Most of the time", "About half the time", "Some of the time", "Never") ) )
See more on codebooks.
< section id="create-a-design-object" class="level2">Create a design object
The design object is the backbone of survey analysis. Survey data use weights, strata, and clusters to reflect the population. That means we need a survey design object before running any analyses, and the design object tells R which weights to use, how respondents were sampled, and what adjustments are needed. Without it, any estimates we calculate could be misleading.
To build the design object, we need three key ingredients:
- Weights: to make sure results reflect the population.
- Strata: groups used in sampling to ensure coverage.
- Clusters (PSUs): primary sampling units, like households or addresses.
Finding the right weight
The ANES documentation lists which weight variables go with which sample:
Since we’re looking at the post-election fresh in-person sample, we’ll use weight variable V240101b
.
One important note: ANES weights add up to the sample size, not the population size. Let’s confirm:
anes_2024_code |> filter(V240101b > 0) |> summarize(sum = sum(V240101b))
# A tibble: 1 × 1 sum <dbl> 1 925
This matches the 925
respondents shown in the documentation’s sample table.
Adjusting for the population
If we want to make inferences about the entire U.S. population, we need to adjust the weights. The target population for this sample is about 232.5 million U.S. citizens aged 18 or older, but that’s just a rough number. Per the documentation, for something more precise, we can pull population counts from the 2023 American Community Survey 1-year estimates.
Thank you, Stephanie, for writing up the code for the population estimates!
library(tidycensus) varlist_2023 <- load_variables(2023, "acs1") citizen_pop_18_plus <- get_acs( geography = "state", variable = "S2901_C01_001", year = 2023, survey = "acs1" ) pums_vars_2023 <- pums_variables |> as_tibble() |> filter(year == 2023, survey == "acs1") pums_vars_2023 |> filter(var_code %in% c("AGEP", "TYPEHUGQ", "CIT")) |> select(var_code, var_label, val_min, val_max, val_label) |> print(n = 50) # Find those who are 18+ and in group quarters get_citizen_pop_18_plus_gq <- function(state) { get_pums( variables = c("AGEP", "TYPEHUGQ"), state = state, year = 2023, survey = "acs1", variables_filter = list( TYPEHUGQ = 2:3 , CIT = 1:4, AGEP = (18:200) ) ) |> summarize(estimate = sum(PWGTP), .by = STATE) } citizen_pop_18_plus_gq_l <- c(state.abb, "DC") |> map(get_citizen_pop_18_plus_gq) citizen_pop_18_plus_gq_df <- citizen_pop_18_plus_gq_l |> list_rbind() |> rename(estimate_gq = estimate) state_pops <- citizen_pop_18_plus |> select(GEOID, NAME, estimate_tot = estimate) |> full_join(citizen_pop_18_plus_gq_df, by = c("GEOID" = "STATE")) |> filter(NAME != "Puerto Rico") |> mutate(estimate_scope = estimate_tot - estimate_gq) # Total without AK and HI targetpop <- state_pops |> filter(!GEOID %in% c("02", "15")) |> summarize(TargetPop = sum(estimate_scope)) # Total with AK and HI state_pops |> summarize(TargetPop = sum(estimate_scope))
The estimated target population (targetpop
) for 2024 is 232,449,541. Now we can rescale the ANES weights so they add up to the population size:
anes_adjwgt <- anes_2024_code |> mutate(Weight = V240101b / sum(V240101b, na.rm = TRUE) * targetpop)
Strata and clusters
The documentation also tells us which strata and PSU (cluster) variables pair with each weight.
For V240101b, those are:
- Strata:
V240101d
- PSU:
V240101c
More on sampling design can be found in this chapter.
< section id="putting-it-all-together" class="level3">Putting it all together
Finally, we can build the design object. We’ll also filter to respondents who actually completed the in-person post-election interview (V240002b == 1
):
options(survey.lonely.psu = "adjust") anes_des <- anes_adjwgt |> filter(V240002b == 1) |> as_survey_design( weights = Weight, strata = V240101d, ids = V240101c, nest = TRUE ) anes_des
And there we go! anes_des
is our fully specified survey design object. From here on, we use it so every estimate we calculate properly reflects the survey’s design.
Analyze the data
Now for the fun part! Let’s see how people answered the Trust Government question:
( trustgov <- anes_des |> drop_na(TrustGovernment) |> group_by(TrustGovernment) |> summarize(p = survey_prop(vartype = "ci")) |> mutate(Variable = "V241229") |> rename(Answer = TrustGovernment) |> select(Variable, everything()) )
# A tibble: 5 × 5 Variable Answer p p_low p_upp <chr> <fct> <dbl> <dbl> <dbl> 1 V241229 Always 0.00774 0.00288 0.0206 2 V241229 Most of the time 0.174 0.122 0.242 3 V241229 About half the time 0.283 0.225 0.351 4 V241229 Some of the time 0.402 0.341 0.465 5 V241229 Never 0.133 0.0857 0.201
And a quick plot:
The results show just how varied public trust is, with only a small fraction of respondents saying they always trust the government.
We can also look at the data by subgroups:
anes_des <- anes_des |> mutate(Gender = factor( case_when(V241550 == 1 ~ "Male", V241550 == 2 ~ "Female", .default = NA), levels = c("Male", "Female") )) ( trustgov_gender <- anes_des |> drop_na(Gender, TrustGovernment) |> group_by(Gender, TrustGovernment) |> summarize(p = survey_prop(vartype = "ci")) |> mutate(Variable = "V241229") |> rename(Answer = TrustGovernment) |> select(Variable, everything()) )
# A tibble: 10 × 6 # Groups: Gender [2] Variable Gender Answer p p_low p_upp <chr> <fct> <fct> <dbl> <dbl> <dbl> 1 V241229 Male Always 0.0123 0.00379 0.0393 2 V241229 Male Most of the time 0.194 0.109 0.322 3 V241229 Male About half the time 0.249 0.178 0.336 4 V241229 Male Some of the time 0.423 0.321 0.533 5 V241229 Male Never 0.121 0.0659 0.212 6 V241229 Female Always 0.00336 0.000744 0.0150 7 V241229 Female Most of the time 0.153 0.0984 0.230 8 V241229 Female About half the time 0.320 0.208 0.457 9 V241229 Female Some of the time 0.381 0.285 0.487 10 V241229 Female Never 0.142 0.0736 0.257
Wrap up
The ANES 2024 release is packed with insights, and what we’ve covered here is just the beginning. With hundreds of variables across attitudes, demographics, and behaviors, you’ll definitely find questions worth exploring.
If you try out the dataset, I’d love to hear what you discover. Feel free to share your own analyses or visualizations!
Find me on Bluesky: @ivelasq3
Check out more resources:
- Exploring Complex Survey Data Analysis Using R free online book
- {survey} package documentation
- {srvyr} package documentation
- {srvyr} cheatsheet for quick function references
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.