I love polls. All kinds of polls, but especially political polls. I think I love them because I like politics and I also like to find out what’s going on in people’s heads, which is something that survey data allows one to do.
So I was thrilled to find Anthony Joseph Damico’s Analyze Survey Data for Free website. Specifically, I was interested in working with data from the American National Election Studies (ANES), which is a complex sample survey that collects responses on political belief and behavior from eligible voters in the U.S. Administered by Stanford University and the University of Michigan, and funded by the National Science Foundation, ANES is designed to generalize to all eligible voters in the U.S., so results give us a statistically sound view of what voters really think.
The data can be analyzed using either the survey package or the srvyr package, and can be downloaded via Damico’s lodown package. The srvyr package is nice, as it’s akin to dplyr in syntax, but it’s limited in what it can do, so I will use both packages.
The first thing to do is get the data and construct the complex sample survey design.
library(lodown)
library(survey)
library(srvyr)

# examine all available ANES microdata files
# (the email below is a placeholder; use the address registered with ANES)
anes_cat <-
  get_catalog( "anes" ,
    output_dir = file.path( path.expand( "~" ) , "ANES" ) ,
    your_email = "email@address.com" )

# 2016 only
anes_cat <- subset( anes_cat , directory == "2016 Time Series Study" )

# download the microdata to your local computer
anes_cat <- lodown( "anes" , anes_cat , your_email = "email@address.com" )

# construct a complex sample survey design
anes_df <-
  readRDS(
    file.path( path.expand( "~" ) , "ANES" ,
      "2016 Time Series Study/anes_timeseries_2016_.rds" ) )

anes_design <-
  svydesign(
    ~v160202 ,             # cluster ids
    strata = ~v160201 ,    # sampling strata
    data = anes_df ,
    weights = ~v160102 ,   # survey weights
    nest = TRUE )          # treat cluster ids as nested within strata
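To make the arguments to svydesign() more concrete, here is a minimal sketch on made-up data. The toy data frame and its column names are hypothetical and have nothing to do with ANES; the point is only to show how cluster ids, strata, and weights fit together, and why a weighted estimate can differ from a plain average.

```r
library(survey)

# Hypothetical toy data: 2 strata, 2 clusters per stratum.
# Cluster ids (1, 2) are reused in each stratum, which is why nest = TRUE is needed.
toy <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu     = c(1, 1, 2, 2, 1, 1, 2, 2),
  weight  = c(10, 10, 20, 20, 5, 5, 15, 15),
  score   = c(60, 70, 40, 50, 80, 90, 30, 20)
)

toy_design <- svydesign(
  ids     = ~psu,
  strata  = ~stratum,
  weights = ~weight,
  data    = toy,
  nest    = TRUE
)

# Design-based (weighted) mean with its standard error
weighted_mean <- svymean(~score, toy_design)

# Naive mean that ignores the weights
unweighted_mean <- mean(toy$score)

print(weighted_mean)
print(unweighted_mean)
```

In this toy case the low scores carry more weight, so the weighted mean sits below the unweighted one.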
Now that I have the data, I’m going to recode some of the variables so that their names are more descriptive. I was able to do this using the study’s codebook, which is available on the ANES website.
anes_design <-
  update(
    anes_design ,

    one = 1 ,

    # feeling thermometers: keep valid 0-100 scores, set everything else to NA
    supreme_court_score = ifelse( v162102 %in% 0:100 , v162102 , NA ) ,
    muslims_score = ifelse( v162106 %in% 0:100 , v162106 , NA ) ,
    police_score = ifelse( v162110 %in% 0:100 , v162110 , NA ) ,
    blm_score = ifelse( v162113 %in% 0:100 , v162113 , NA ) ,

    # seven-point party identification; codes outside 1:7 become NA
    party_id =
      factor( v161158x , levels = 1:7 , labels =
        c( 'strong democrat' , 'not very strong democrat' ,
           'independent democrat' , 'independent' ,
           'independent republican' , 'not very strong republican' ,
           'strong republican' ) ) ,

    rich_buy_elections =
      factor( v162220 , levels = 1:5 , labels =
        c( 'rich buy elections - all of the time' ,
           'rich buy elections - most of the time' ,
           'rich buy elections - about half the time' ,
           'rich buy elections - some of the time' ,
           'rich buy elections - never' ) ) ,

    bible_wordofgod =
      factor( v161243 , levels = 1:3 , labels =
        c( 'bible is word of god, to be taken literally' ,
           'bible is word of god but not everything to be taken literally' ,
           'bible is written by men and is not the word of god' ) )
  )
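The two recoding idioms used above can be tried on their own in base R. This is only a sketch: the raw values below are invented for illustration (the out-of-range numbers stand in for "refused" or "don't know" style codes) and are not taken from the ANES file.

```r
# Hypothetical raw thermometer codes: valid 0-100 scores mixed with
# out-of-range values that should not be treated as scores
raw_score <- c(85, -9, 50, 998, 0, 100, -6)

# Keep values in 0:100, turn everything else into NA
clean_score <- ifelse(raw_score %in% 0:100, raw_score, NA)
print(clean_score)

# The mean now uses only the four valid answers
valid_mean <- mean(clean_score, na.rm = TRUE)
print(valid_mean)

# Hypothetical raw party codes; -8 is not one of the seven valid levels
raw_party <- c(1, 4, 7, -8, 2)

party <- factor(
  raw_party,
  levels = 1:7,
  labels = c("strong democrat", "not very strong democrat",
             "independent democrat", "independent",
             "independent republican", "not very strong republican",
             "strong republican")
)

# Values outside `levels` silently become NA
print(table(party, useNA = "ifany"))
```

Because factor() drops anything not listed in levels, the negative codes never show up as a category in later group_by() summaries.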
Now I want to plot a couple of the distributions, just to get some sense of the responses. The survey package works with a survey design object rather than a plain data frame, so ggplot2, which is my preference most of the time, can’t plot it directly. But the survey package includes some functions for basic charting.
svyhist(~v161243, design = anes_design,
        main = "Is the Bible the literal word of God",
        xlim = c(0, 3), ylim = c(0, 0.5), xlab = "",
        labels = c("bible to be taken literally",
                   "bible not taken literally",
                   "bible the work of men"))

svyhist(~v162220, design = anes_design,
        main = "Do the rich buy elections?",
        xlim = c(0, 5), ylim = c(0, 0.5), xlab = "",
        labels = c("all the time", "most of the time",
                   "about half the time", "some of the time", "never"))
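Since both of these questions are categorical, another option is to estimate the weighted share of each answer with svymean() and pass the result to barplot(), which the survey package supports for its estimate objects. The sketch below uses a small made-up data frame rather than the ANES design; the column names and values are hypothetical.

```r
library(survey)

# Hypothetical toy data: one categorical answer per respondent, with weights
toy <- data.frame(
  weight = c(1, 1, 2, 2, 3, 3),
  answer = factor(c("always", "never", "never", "always", "never", "never"))
)

# No clustering or strata in this toy example, only weights
toy_design <- svydesign(ids = ~1, weights = ~weight, data = toy)

# Weighted proportion of each factor level
shares <- svymean(~answer, toy_design)
print(shares)

# Bar chart of the weighted proportions
barplot(shares,
        names.arg = levels(toy$answer),
        ylim = c(0, 1),
        main = "Weighted share of each answer (toy data)")
```

With the ANES design, the same idea would apply to the recoded factors such as bible_wordofgod, adding na.rm = TRUE to drop respondents without a valid answer.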
The score variables that I created above are sentiment indicators based on a 0-100 feeling thermometer, where 100 is highly positive, 0 is highly negative, and 50 is neutral. I’m specifically interested in seeing how respondents from different categories assess, for example, the Supreme Court. I will use the srvyr package for this, but first the design object needs to be converted into the format that srvyr expects.
anes_srvyr_design <- as_survey( anes_design )

# calculate the mean (average) of a linear variable, overall and by groups:
anes_srvyr_design %>%
  summarize( mean = survey_mean( supreme_court_score , na.rm = TRUE ) )
# A tibble: 1 x 2
   mean mean_se
1  58.3   0.389
anes_srvyr_design %>%
  group_by( party_id ) %>%
  summarize( mean = survey_mean( supreme_court_score , na.rm = TRUE ) )
# A tibble: 7 x 3
  party_id                    mean mean_se
1 strong democrat             60.6   0.786
2 not very strong democrat    60.0   1.09
3 independent democrat        58.9   1.39
4 independent                 53.7   1.24
5 independent republican      58.1   1.03
6 not very strong republican  57.9   1.07
7 strong republican           57.8   1.19
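So with respect to the Supreme Court, people with different party identifications generally view it the same: every group’s mean is within a few points of the overall 58.3, with pure independents lowest at 53.7. But what about more hot-button issues?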
anes_srvyr_design %>%
  summarize( mean = survey_mean( muslims_score , na.rm = TRUE ) )
# A tibble: 1 x 2
   mean mean_se
1  54.4   0.656
anes_srvyr_design %>%
  group_by( party_id ) %>%
  summarize( mean = survey_mean( muslims_score , na.rm = TRUE ) )
# A tibble: 7 x 3
  party_id                    mean mean_se
1 strong democrat             67.1   1.04
2 not very strong democrat    58.8   1.52
3 independent democrat        63.5   1.44
4 independent                 50.9   1.42
5 independent republican      49.4   1.51
6 not very strong republican  45.0   1.59
7 strong republican           40.9   1.34
So there is a substantial difference in how people with different party identifications view Muslims: the mean score falls from 67.1 among strong Democrats to 40.9 among strong Republicans, a gap many times larger than the standard errors. People on the left view them much more favorably than those on the right. Now I want to visualize some of the scores, and since srvyr’s summarize() returns an ordinary tibble, we can use ggplot2 for these.
library(ggplot2)

# mean police thermometer score by party identification
police <- anes_srvyr_design %>%
  group_by(party_id) %>%
  summarize(mean = survey_mean(police_score, na.rm = TRUE))

ggplot(police, aes(party_id, mean, fill = party_id)) +
  geom_bar(stat = "identity") +
  ylim(0, 100) +
  xlab("") +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = -30, hjust = 0, vjust = 1))
Given that 50 indicates a neutral sentiment, it looks like those on the left and right all have a generally favorable view of the police. Let’s see if the same holds for Black Lives Matter.
# mean Black Lives Matter thermometer score by party identification
blm <- anes_srvyr_design %>%
  group_by(party_id) %>%
  summarize(mean = survey_mean(blm_score, na.rm = TRUE))

ggplot(blm, aes(party_id, mean, fill = party_id)) +
  geom_bar(stat = "identity") +
  ylim(0, 100) +
  xlab("") +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = -30, hjust = 0, vjust = 1))
There are pretty substantial differences in how people with different party identifications view the Black Lives Matter movement. Those on the left have a fairly positive view of the movement, while those on the right have a decidedly negative view.
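One way to judge whether group differences like these are larger than sampling noise is to ask srvyr for confidence intervals instead of standard errors, using the vartype argument of survey_mean(). The sketch below runs on a made-up data frame, not on the ANES design; the column names and values are hypothetical.

```r
library(srvyr)

# Hypothetical toy data: a thermometer score, a two-level group, and weights
toy <- data.frame(
  id     = 1:6,
  weight = c(1, 1, 2, 2, 3, 3),
  party  = factor(c("left", "left", "left", "right", "right", "right")),
  score  = c(60, 80, 70, 40, 50, 30)
)

toy_srvyr <- as_survey_design(toy, ids = id, weights = weight)

# vartype = "ci" returns lower and upper bounds (mean_low, mean_upp)
# in place of the default standard error column
by_party <- toy_srvyr %>%
  group_by(party) %>%
  summarize(mean = survey_mean(score, vartype = "ci"))

print(by_party)
```

Intervals that do not overlap between groups are a rough sign that the difference is not just noise; a formal comparison would need a proper test.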
The srvyr package actually does its work on the back of the survey package. It doesn’t have all the functionality of the survey package, but I prefer it when I want to visualize basic descriptive statistics with either ggplot2 or another visualization package.
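As a small illustration of that relationship, the same grouped mean can be computed with survey’s svyby() and with srvyr’s group_by() and summarize(). This is a sketch on a made-up data frame (hypothetical names and values), not the ANES design.

```r
library(survey)
library(srvyr)

# Hypothetical toy data
toy <- data.frame(
  id     = 1:6,
  weight = c(1, 1, 2, 2, 3, 3),
  party  = factor(c("left", "left", "left", "right", "right", "right")),
  score  = c(60, 80, 70, 40, 50, 30)
)

toy_design <- svydesign(ids = ~id, weights = ~weight, data = toy)

# survey package: grouped mean via svyby()
via_survey <- svyby(~score, ~party, toy_design, svymean)

# srvyr package: the same estimate with dplyr-style verbs
via_srvyr <- as_survey(toy_design) %>%
  group_by(party) %>%
  summarize(mean = survey_mean(score))

print(via_survey)
print(via_srvyr)
```

The srvyr result is a tibble, which is what makes it convenient to hand straight to ggplot2.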