Analyzing voter survey data with R

[This article was first published on my (mis)adventures in R programming, and kindly contributed to R-bloggers].

I love polls. All kinds of polls, but especially political polls. I think I love them because I like politics and I also like to find out what’s going on in people’s heads, which is something that survey data allows one to do.

So I was thrilled to find Anthony Joseph Damico’s Analyze Survey Data for Free website. Specifically, I was interested in working with data from the American National Election Studies (ANES), which is a complex sample survey that collects responses on political belief and behavior from eligible voters in the U.S. Administered by Stanford University and the University of Michigan, and funded by the National Science Foundation, ANES is designed to generalize to all eligible voters in the U.S., so results give us a statistically sound view of what voters really think.

The data can be analyzed using either the survey or srvyr package, and can be downloaded via Damico’s lodown package. The srvyr package is nice, as it’s akin to dplyr in syntax, but it’s limited in what it can do, so I will use both packages.

The first thing to do is get the data and construct the complex sample survey design.

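library(lodown)   # downloads the ANES microdata
library(survey)   # complex sample survey designs and estimators
library(srvyr)    # dplyr-style wrapper around survey
library(ggplot2)  # used for the bar charts further down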

# examine all available ANES microdata files
anes_cat <-
        get_catalog( "anes" ,
                     output_dir = file.path( path.expand( "~" ) , "ANES" ) , 
                     your_email = "email@address.com" )  # placeholder, substitute your own address

# 2016 only
anes_cat <- subset( anes_cat , directory == "2016 Time Series Study" )

# download the microdata to your local computer
anes_cat <- lodown( "anes" , anes_cat , 
                    your_email = "email@address.com" )  # placeholder, substitute your own address

# Construct a complex sample survey design
anes_df <- 
        readRDS( 
                file.path( path.expand( "~" ) , "ANES" , 
                           "2016 Time Series Study/anes_timeseries_2016_.rds" )
        )

anes_design <-
        svydesign( 
                ~v160202 , 
                strata = ~v160201 , 
                data = anes_df , 
                weights = ~v160102 , 
                nest = TRUE 
        )
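To see what the design object changes, here is a minimal sketch that uses the api example data bundled with the survey package as a stand-in for ANES. That substitution is an assumption made purely for illustration, since the ANES files require a registered download. It compares a plain mean with the design-based estimate.

```r
library(survey)

# Stand-in data (assumption): the California school sample bundled with survey.
data(api, package = "survey")

# Stratified design: stype is the stratum, pw the sampling weight,
# fpc the number of schools in each stratum's population.
toy_design <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                        fpc = ~fpc, data = apistrat)

# The plain mean ignores the weights; svymean() applies them and
# also returns a design-based standard error.
unweighted <- mean(apistrat$api00)
weighted <- svymean(~api00, toy_design)

print(unweighted)
print(weighted)
```

The ANES design above works the same way, except that it also names a cluster variable (v160202) in addition to the strata and weights.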

Now that I have the data, I’m going to recode some of the variables so that their names are more descriptive. I did this using the study’s codebook. The thermometer scores keep only values from 0 to 100 and set everything else to NA, and the categorical items become labeled factors.

anes_design <- 
        update( anes_design , one = 1 ,
                
                supreme_court_score = ifelse( v162102 %in% 0:100 , v162102 , NA ) ,
                
                muslims_score = ifelse( v162106 %in% 0:100 , v162106 , NA ) ,
                
                police_score = ifelse( v162110 %in% 0:100 , v162110 , NA ) ,
                
                blm_score = ifelse( v162113 %in% 0:100 , v162113 , NA ) ,
                
                party_id =
                        factor( v161158x , levels = 1:7 , labels =
                                        c( 'strong democrat' , 'not very strong democrat' , 
                                           'independent democrat' , 'independent',
                                           "independent republican", "not very strong republican",
                                           "strong republican")
                        ) ,
                rich_buy_elections =
                        factor( v162220 , levels = 1:5 , labels =
                                        c( 'rich buy elections - all of the time' ,
                                           'rich buy elections - most of the time' ,
                                           'rich buy elections - about half the time' ,
                                           'rich buy elections - some of the time' ,
                                           'rich buy elections - never' )
                        ),
                bible_wordofgod =
                        factor( v161243 , levels = 1:3 , labels =
                                        c( 'bible is word of god, to be taken literally' , 
                                           'bible is word of god but not everything to be taken literally' , 
                                           'bible is written by men and is not the word of god' )
                        )
        )
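The two recoding idioms used above can be tried on toy vectors first. This is a small base R sketch; the raw values are hypothetical and are not taken from the ANES file, so check the codebook for the real nonresponse codes.

```r
# Hypothetical raw thermometer responses: 0-100 are valid ratings,
# anything else stands in for a nonresponse code.
raw_score <- c(85, 50, -9, 100, 998, 0, -7)

# Keep valid ratings, turn everything else into NA.
clean_score <- ifelse(raw_score %in% 0:100, raw_score, NA)
print(clean_score)
print(mean(clean_score, na.rm = TRUE))

# Hypothetical raw party codes: 1-7 are valid, -8 is not.
raw_party <- c(1, 4, 7, -8, 2)

# Codes outside `levels` become NA automatically.
party <- factor(raw_party, levels = 1:7, labels =
                        c("strong democrat", "not very strong democrat",
                          "independent democrat", "independent",
                          "independent republican", "not very strong republican",
                          "strong republican"))
print(table(party, useNA = "ifany"))
```

Because factor() drops any code that is not listed in levels, the categorical items need no separate ifelse() step.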

Now I want to plot a couple of the distributions, just to get some sense of the responses. anes_design is a survey design object rather than a data frame, so ggplot2, which is my preference most of the time, can’t plot it directly. But the survey package includes some functions for basic charting.

svyhist(~v161243, design = anes_design, main = "Is the Bible the literal word of God", 
        xlim=c(0, 3), ylim=c(0, 0.5), xlab = "", 
        labels = c("bible to be taken literally", "bible not taken literally",
                   "bible the work of men"))

svyhist(~v162220, design = anes_design, main = "Do the rich buy elections?",
        xlim=c(0, 5), ylim=c(0, 0.5), xlab = "",
        labels = c("all the time", "most of the time", "about half the time",
                   "some of the time", "never"))
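svyhist() treats the response codes as numbers. For categorical items, an alternative (not what this post does) is to estimate the weighted share of each category and draw a bar chart from that. The sketch below again uses the bundled api data as a stand-in, which is an assumption for illustration only.

```r
library(survey)

# Stand-in data (assumption): the school sample bundled with survey.
data(api, package = "survey")
toy_design <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                        fpc = ~fpc, data = apistrat)

# svymean() on a factor returns the estimated share of each level.
shares <- svymean(~stype, toy_design)
print(shares)

# survey supplies a barplot method for these estimates.
barplot(shares, main = "School type (weighted share)")
```

With the ANES design, the same idea would presumably apply to one of the recoded factors, for example svymean(~bible_wordofgod, anes_design, na.rm = TRUE), though I have not run that against the real file.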

The score variables that I created above are sentiment indicators based on a 0-100 feeling thermometer, where 100 is highly positive, 0 is highly negative, and 50 is neutral. I’m specifically interested in seeing how respondents from different categories assess, for example, the Supreme Court. I will use the srvyr package for this, which first requires converting the survey design object into srvyr’s own format with as_survey().

anes_srvyr_design <- as_survey( anes_design )

# Calculate the mean (average) of a linear variable, overall and by groups:
anes_srvyr_design %>%
        summarize( mean = survey_mean( supreme_court_score , na.rm = TRUE ) )

# A tibble: 1 x 2
   mean mean_se
  <dbl>   <dbl>
1  58.3   0.389

anes_srvyr_design %>%
        group_by( party_id ) %>%
        summarize( mean = survey_mean( supreme_court_score , na.rm = TRUE ) )

# A tibble: 7 x 3
  party_id                    mean mean_se
  <fct>                      <dbl>   <dbl>
1 strong democrat             60.6   0.786
2 not very strong democrat    60.0   1.09 
3 independent democrat        58.9   1.39 
4 independent                 53.7   1.24 
5 independent republican      58.1   1.03 
6 not very strong republican  57.9   1.07 
7 strong republican           57.8   1.19

So ratings of the Supreme Court vary little by party identification: every group averages between roughly 54 and 61, with pure independents the lowest. But what about more hot-button issues?
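Standard errors like the ones above are easier to compare as confidence intervals. Here is a minimal sketch of asking srvyr for them through the vartype argument, run on the bundled api data as a stand-in (an assumption, so the numbers are not ANES results).

```r
library(survey)
library(srvyr)

# Stand-in data (assumption): the school sample bundled with survey.
data(api, package = "survey")
toy <- apistrat %>% as_survey_design(strata = stype, weights = pw)

# vartype = "ci" returns lower and upper 95% bounds
# (columns mean_low and mean_upp) instead of the standard error.
by_type <- toy %>%
        group_by(stype) %>%
        summarize(mean = survey_mean(api00, vartype = "ci"))

print(by_type)
```

Groups whose intervals overlap heavily, as the party groups do for the Supreme Court score, are hard to tell apart on this measure.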

anes_srvyr_design %>%
        summarize( mean = survey_mean(  muslims_score , na.rm = TRUE ) )

# A tibble: 1 x 2
   mean mean_se
  <dbl>   <dbl>
1  54.4   0.656

anes_srvyr_design %>%
        group_by( party_id ) %>%
        summarize( mean = survey_mean(  muslims_score , na.rm = TRUE ) )

# A tibble: 7 x 3
  party_id                    mean mean_se
  <fct>                      <dbl>   <dbl>
1 strong democrat             67.1    1.04
2 not very strong democrat    58.8    1.52
3 independent democrat        63.5    1.44
4 independent                 50.9    1.42
5 independent republican      49.4    1.51
6 not very strong republican  45.0    1.59
7 strong republican           40.9    1.34

Here the gap is much larger: strong Democrats average 67.1 and strong Republicans 40.9, so Democratic identifiers rate Muslims far more warmly than Republican identifiers do. Now I want to visualize some of the scores, and because srvyr returns its summaries as ordinary tibbles, we can use ggplot2 for these.
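The group comparison above rests on eyeballing means and standard errors. One way to check such a difference formally is a design-based regression followed by a Wald test, sketched here on the bundled api data. The stand-in data is an assumption, and this is one possible approach rather than anything the post itself runs.

```r
library(survey)

# Stand-in data (assumption): the school sample bundled with survey.
data(api, package = "survey")
toy_design <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                        fpc = ~fpc, data = apistrat)

# Regress the outcome on the grouping factor, respecting the design.
fit <- svyglm(api00 ~ stype, design = toy_design)

# Wald test of whether the group means differ at all.
wald <- regTermTest(fit, ~stype)
print(wald)
```

For the ANES design, the analogous model would put a thermometer score on the left and party_id on the right.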

police <- anes_srvyr_design %>%
        group_by(party_id) %>%
        summarize(mean = survey_mean(police_score, na.rm = TRUE))

ggplot(police, aes(party_id, mean, fill=party_id)) +
        geom_bar(stat = "identity") +
        ylim(0, 100) + xlab("") +
        scale_fill_brewer(palette = "Set3") +
        theme(legend.position = "none",
              axis.text.x = element_text(angle = -30, hjust = 0, vjust = 1))
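Since survey_mean() also returns a standard error column (mean_se), the bars can carry error bars. The sketch below shows the idea on the bundled api data, which is an assumption for illustration; the 1.96 multiplier gives an approximate 95% interval.

```r
library(survey)
library(srvyr)
library(ggplot2)

# Stand-in data (assumption): the school sample bundled with survey.
data(api, package = "survey")
toy <- apistrat %>% as_survey_design(strata = stype, weights = pw)

# Group means plus their standard errors (column mean_se).
by_type <- toy %>%
        group_by(stype) %>%
        summarize(mean = survey_mean(api00))

# Bars for the means, whiskers for roughly 95% intervals.
p <- ggplot(by_type, aes(stype, mean)) +
        geom_col() +
        geom_errorbar(aes(ymin = mean - 1.96 * mean_se,
                          ymax = mean + 1.96 * mean_se),
                      width = 0.2)
print(p)
```

The same geom_errorbar() layer could be added to the police plot above, since that summary has the same mean and mean_se columns.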

Given that 50 indicates a neutral sentiment, it looks like respondents across the party spectrum have a generally favorable view of the police. Let’s see if the same is true for Black Lives Matter.

blm <- anes_srvyr_design %>%
        group_by(party_id) %>%
        summarize(mean = survey_mean(blm_score, na.rm = TRUE))

ggplot(blm, aes(party_id, mean, fill=party_id)) +
        geom_bar(stat = "identity") +
        ylim(0, 100) + xlab("") +
        scale_fill_brewer(palette = "Set3") +
        theme(legend.position = "none",
              axis.text.x = element_text(angle = -30, hjust = 0, vjust = 1))

There are pretty substantial differences in this survey data in how people across the party spectrum view the Black Lives Matter movement. Those on the left have a fairly positive view of the movement, while those on the right have a decidedly negative view.

The srvyr package actually does its work on the back of the survey package. It doesn’t have all the functionality of the survey package, but I prefer it when I want to visualize basic descriptive statistics with ggplot2 or another visualization package.
