ANES 2024 is Out! How to Analyze the Data with R

Isabella Velásquez

24 hours ago

[This article was first published on R Works, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last fall, my co-authors Stephanie Zimmer, Rebecca Powell, and I released Exploring Complex Survey Data Analysis Using R. Earlier this August, we had the joy of teaching a workshop based on the book at useR! 2025.

One of our favorite examples in both the book and workshop comes from the American National Election Studies (ANES), a long-running project from Stanford University and the University of Michigan supported by the National Science Foundation. Since 1948, ANES has fielded surveys almost every two years, making it one of the richest resources for studying public opinion and voting behavior in U.S. elections. The data cover everything from party affiliation to trust in government, and are carefully designed to represent the voting population.

Just a couple of days after returning from useR!, I was thrilled to see that the brand-new ANES 2024 dataset had been released!

The full release of the ANES 2024 Time Series #Data is now available. More details here: electionstudies.org/anes-announc…

[image or embed]
— American National Election Studies (@electionstudies.bsky.social) August 12, 2025 at 12:24 PM

Survey data like ANES are not as straightforward as a simple random sample. Some groups are intentionally oversampled, others are harder to reach, and not every respondent has the same probability of being included. To correct for this, surveys provide “weights”, or numbers that tell us how much each response should “count” when estimating results for the whole population.

If we ignore these weights and just calculate raw percentages, we risk drawing the wrong conclusions. That’s where the {survey} and {srvyr} packages come in: they handle weights, strata, and clusters behind the scenes so our estimates properly reflect the survey design.

So, let’s walk through how you can get started analyzing this dataset in R with the {survey} and {srvyr} packages. For more depth, the Getting Started chapter of our book is a handy companion.

< section id="download-the-data" class="level2">

Download the data

Head to the Election Studies website and grab the CSV version of the 2024 Time Series study. The download comes as a Zip file that includes both the dataset and documentation.

< section id="load-packages-and-data" class="level2">

Load packages and data

We’ll use the tidyverse for wrangling, and {survey} plus {srvyr} for analysis:

library(tidyverse)
library(survey)
library(srvyr)

anes_2024 <-
  read_csv("anes_timeseries_2024_csv_20250808.csv")

< section id="review-the-documentation" class="level2">

Review the documentation

Reviewing the survey documentation is a critical part of survey analysis. The ANES materials include details about the sample, weighting, and codebook.

< section id="sample" class="level3">

Sample

The 2024 sample was designed to represent U.S. citizens age 18 or older. Respondents were recruited from both in-person and web samples, with careful notes on eligibility and exclusions.

< section id="codebook" class="level3">

Codebook

The codebook decodes variable names into something human-readable.

Here are two variables we’ll use:

V240002b: MODE OF INTERVIEW: POST-ELECTION: This variable captures the mode in which the respondent completed the majority of the post-election survey.
- -7. Insuﬃcient parital, interview deleted
- -6. No post interview
- 1. Face-to-face (FTF)
- 1. Internet (WEB)
- 1. Paper (PAPI)
- 1. Phone
- 1. Video
V241229: PRE: HOW OFTEN TRUST GOVERNMENT IN WASHINGTON TO DO WHAT IS RIGHT [REVISED]:
- -9. Refused
- -8. Don’t know
- -1. Inapplicable
- 1. Always
- 1. Most of the time
- 1. About half the time
- 1. Some of the time
- 1. Never

Let’s recode V241229 for easier analysis:

anes_2024_code <-
  anes_2024 |> 
  mutate(
    TrustGovernment = factor(
      case_when(
        V241229 == 1 ~ "Always",
        V241229 == 2 ~ "Most of the time",
        V241229 == 3 ~ "About half the time",
        V241229 == 4 ~ "Some of the time",
        V241229 == 5 ~ "Never",
        .default = NA
      ),
      levels = c("Always", "Most of the time", "About half the time", "Some of the time", "Never")
    )
  )

See more on codebooks.

< section id="create-a-design-object" class="level2">

Create a design object

The design object is the backbone of survey analysis. Survey data use weights, strata, and clusters to reflect the population. That means we need a survey design object before running any analyses, and the design object tells R which weights to use, how respondents were sampled, and what adjustments are needed. Without it, any estimates we calculate could be misleading.

To build the design object, we need three key ingredients:

Weights: to make sure results reflect the population.
Strata: groups used in sampling to ensure coverage.
Clusters (PSUs): primary sampling units, like households or addresses.

< section id="finding-the-right-weight" class="level3">

Finding the right weight

The ANES documentation lists which weight variables go with which sample:

Since we’re looking at the post-election fresh in-person sample, we’ll use weight variable V240101b.

One important note: ANES weights add up to the sample size, not the population size. Let’s confirm:

anes_2024_code |> filter(V240101b > 0) |> summarize(sum = sum(V240101b))

# A tibble: 1 × 1
    sum
  <dbl>
1   925

This matches the 925 respondents shown in the documentation’s sample table.

< section id="adjusting-for-the-population" class="level3">

Adjusting for the population

If we want to make inferences about the entire U.S. population, we need to adjust the weights. The target population for this sample is about 232.5 million U.S. citizens aged 18 or older, but that’s just a rough number. Per the documentation, for something more precise, we can pull population counts from the 2023 American Community Survey 1-year estimates.

Thank you, Stephanie, for writing up the code for the population estimates!

library(tidycensus)

varlist_2023 <- load_variables(2023, "acs1")

citizen_pop_18_plus <-
  get_acs(
    geography = "state",
    variable = "S2901_C01_001",
    year = 2023,
    survey = "acs1"
  )

pums_vars_2023 <- pums_variables |>
  as_tibble() |>
  filter(year == 2023, survey == "acs1")

pums_vars_2023 |>
  filter(var_code %in% c("AGEP", "TYPEHUGQ", "CIT")) |>
  select(var_code, var_label, val_min, val_max, val_label) |>
  print(n = 50)

# Find those who are 18+ and in group quarters
get_citizen_pop_18_plus_gq <- function(state) {
  get_pums(
    variables = c("AGEP", "TYPEHUGQ"),
    state = state,
    year = 2023,
    survey = "acs1",
    variables_filter = list(
      TYPEHUGQ = 2:3 ,
      CIT = 1:4,
      AGEP = (18:200)
    )
  ) |>
    summarize(estimate = sum(PWGTP), .by = STATE)
}

citizen_pop_18_plus_gq_l <- c(state.abb, "DC") |>
  map(get_citizen_pop_18_plus_gq)

citizen_pop_18_plus_gq_df <-
  citizen_pop_18_plus_gq_l |>
  list_rbind() |>
  rename(estimate_gq = estimate)

state_pops <- citizen_pop_18_plus |>
  select(GEOID, NAME, estimate_tot = estimate) |>
  full_join(citizen_pop_18_plus_gq_df, by = c("GEOID" = "STATE")) |>
  filter(NAME != "Puerto Rico") |>
  mutate(estimate_scope = estimate_tot - estimate_gq)

# Total without AK and HI
targetpop <-
  state_pops |>
  filter(!GEOID %in% c("02", "15")) |>
  summarize(TargetPop = sum(estimate_scope))

# Total with AK and HI
state_pops |>
  summarize(TargetPop = sum(estimate_scope))

The estimated target population (targetpop) for 2024 is 232,449,541. Now we can rescale the ANES weights so they add up to the population size:

anes_adjwgt <- anes_2024_code |>
  mutate(Weight = V240101b / sum(V240101b, na.rm = TRUE) * targetpop)

< section id="strata-and-clusters" class="level3">

Strata and clusters

The documentation also tells us which strata and PSU (cluster) variables pair with each weight.

For V240101b, those are:

Strata: V240101d
PSU: V240101c

More on sampling design can be found in this chapter.

< section id="putting-it-all-together" class="level3">

Putting it all together

Finally, we can build the design object. We’ll also filter to respondents who actually completed the in-person post-election interview (V240002b == 1):

options(survey.lonely.psu = "adjust")

anes_des <- anes_adjwgt |>
  filter(V240002b == 1) |>
  as_survey_design(
    weights = Weight,
    strata = V240101d,
    ids = V240101c,
    nest = TRUE
  )

anes_des

And there we go! anes_des is our fully specified survey design object. From here on, we use it so every estimate we calculate properly reflects the survey’s design.

< section id="analyze-the-data" class="level2 page-columns page-full">

Analyze the data

Now for the fun part! Let’s see how people answered the Trust Government question:

(
  trustgov <- anes_des |>
    drop_na(TrustGovernment) |>
    group_by(TrustGovernment) |>
    summarize(p = survey_prop(vartype = "ci")) |>
    mutate(Variable = "V241229") |>
    rename(Answer = TrustGovernment) |>
    select(Variable, everything())
)

# A tibble: 5 × 5
  Variable Answer                    p   p_low  p_upp
  <chr>    <fct>                 <dbl>   <dbl>  <dbl>
1 V241229  Always              0.00774 0.00288 0.0206
2 V241229  Most of the time    0.174   0.122   0.242 
3 V241229  About half the time 0.283   0.225   0.351 
4 V241229  Some of the time    0.402   0.341   0.465 
5 V241229  Never               0.133   0.0857  0.201

And a quick plot:

The results show just how varied public trust is, with only a small fraction of respondents saying they always trust the government.

We can also look at the data by subgroups:

anes_des <-
  anes_des |>
  mutate(Gender = factor(
    case_when(V241550 == 1 ~ "Male", V241550 == 2 ~ "Female", .default = NA),
    levels = c("Male", "Female")
  ))

(
  trustgov_gender <- anes_des |>
    drop_na(Gender, TrustGovernment) |>
    group_by(Gender, TrustGovernment) |>
    summarize(p = survey_prop(vartype = "ci")) |>
    mutate(Variable = "V241229") |>
    rename(Answer = TrustGovernment) |>
    select(Variable, everything())
)

# A tibble: 10 × 6
# Groups:   Gender [2]
   Variable Gender Answer                    p    p_low  p_upp
   <chr>    <fct>  <fct>                 <dbl>    <dbl>  <dbl>
 1 V241229  Male   Always              0.0123  0.00379  0.0393
 2 V241229  Male   Most of the time    0.194   0.109    0.322 
 3 V241229  Male   About half the time 0.249   0.178    0.336 
 4 V241229  Male   Some of the time    0.423   0.321    0.533 
 5 V241229  Male   Never               0.121   0.0659   0.212 
 6 V241229  Female Always              0.00336 0.000744 0.0150
 7 V241229  Female Most of the time    0.153   0.0984   0.230 
 8 V241229  Female About half the time 0.320   0.208    0.457 
 9 V241229  Female Some of the time    0.381   0.285    0.487 
10 V241229  Female Never               0.142   0.0736   0.257

< section id="wrap-up" class="level2">

Wrap up

The ANES 2024 release is packed with insights, and what we’ve covered here is just the beginning. With hundreds of variables across attitudes, demographics, and behaviors, you’ll definitely find questions worth exploring.

If you try out the dataset, I’d love to hear what you discover. Feel free to share your own analyses or visualizations!

Find me on Bluesky: @ivelasq3

Check out more resources:

Exploring Complex Survey Data Analysis Using R free online book
{survey} package documentation
{srvyr} package documentation
- {srvyr} cheatsheet for quick function references

To leave a comment for the author, please follow the link and comment on their blog: R Works.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.