
analyze the general social survey (gss) with r

[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
the general social survey (gss) has served as america's mood ring since 1972.  data-driven social scientists can compare political beliefs across demographic groups, look at attitude trends, and make emile durkheim and max weber (pronounced durk-veber) proud.  in contrast to high-frequency tracking polls that capture newspaper headlines, the gss has sustained a (now biennial) set of questions over four decades.

most analysts start with the cumulative cross-sectional file (interviews conducted 1972 – present).  given the sprawling nature of that cumulative data set, you'd better read the documentation and understand the eccentricities of each variable you want to use before you send anything off for peer review.  for example, many of the five thousand variables include missing values due to split-sample questions: only a subset of respondents (one ballot or form) gets asked a given question, so everyone else shows up as missing.  not to say it's bad data – it's damn useful.  you try administering a survey that stays relevant for almost half a century.  otherwise, leave it to the national opinion research center (norc) at the university of chicago, and to the national science foundation to foot the bill.
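to see why split-sample missingness matters, here's a minimal base-r sketch.  the vector below is made up, not real gss data: `NA` stands for respondents who were never asked because they landed on the other ballot.  dividing by everyone in the file quietly treats those never-asked respondents as if they'd said "no."

```r
# hypothetical toy data, NOT real gss responses:
# 1 = "yes", 0 = "no", NA = respondent was on a ballot that skipped this question
favors <- c(1, 0, 1, NA, NA, 1, 0, NA, 1, NA)

# misleading: never-asked respondents end up in the denominator
share_wrong <- sum(favors == 1, na.rm = TRUE) / length(favors)

# better: restrict the denominator to respondents who were actually asked
share_asked <- mean(favors, na.rm = TRUE)

share_wrong             # 0.4   (4 of all 10 rows)
round(share_asked, 3)   # 0.667 (4 of the 6 who were asked)
```

same four "yes" answers, very different headline number.  the documentation tells you which questions were split, so you know which denominator makes sense.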

on the main gss page, norc offers two online query tools – nesstar and sda – meaning you can point-and-click your way to some basic statistics.  the nesstar system smells like a fixer-upper, but berkeley’s sda (survey documentation and analysis) site offers a great way to confirm that you’re broadly analyzing the data correctly before you start writing r code to laser-focus on your research question.
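one common reason your numbers won't line up with a point-and-click tool is weighting.  a minimal sketch, assuming made-up answers and a hypothetical weight column (check the gss documentation for which weight actually applies to the years you're analyzing), shows how far a weighted share can drift from an unweighted one:

```r
# hypothetical toy data: 1 = yes, 0 = no, NA = not asked
answer <- c(1, 0, 1, NA, 0, 1)
# hypothetical survey weight for each respondent (not a real gss weight)
wt     <- c(0.5, 2.0, 1.0, 1.5, 2.0, 0.5)

# unweighted share among respondents who were asked
share_unweighted <- mean(answer, na.rm = TRUE)

# weighted share: weighted.mean() drops NA answers along with their weights
share_weighted <- weighted.mean(answer, wt, na.rm = TRUE)

share_unweighted           # 0.6
round(share_weighted, 3)   # 0.333
```

this only covers point estimates.  design-based standard errors need something like the `survey` package, which is a separate step.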

the general social survey only interviews noninstitutionalized adults, because everyone already knows what kids' political beliefs are: more candy, no homework.  this new github repository contains two scripts:

1972-2012 cumulative cross-sectional – analysis examples.R

replicate berkeley sda.R


click here to view these two scripts



for more detail about the general social survey (gss), visit:

notes:

berkeley's sda website currently hosts release #1 of the 1972-2012 cross-sectional gss file, which is why the replication code above won't match their posted quick tables exactly.  i kept bugging them until they ran the 1972-2010 release #2 data set through the same code; that output is available in my github repository, and those numbers match.  squeaky wheel, baby.


confidential to sas, spss, stata, and sudaan users: why are you still dialing up to the internet after we’ve discovered fiber optics?  time to transition to r.  😀

To leave a comment for the author, please follow the link and comment on their blog: asdfree by anthony damico.
