analyze the world values survey (wvs) with r

September 2, 2014

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

a global barometer of public opinion, the world values survey (wvs) clocks in as your best source of cross-cultural moods and attitudes.  you might find its most famous product sweepingly general, but who among us has never ever swept a smidgen of nuance under the rug?  if you want to explore complex international patterns of belief, now's your chance.

though their scientific advisory committee (sac) sets the ground rules and dictates the core content, individual national samples should be viewed as something of a confederacy of surveys.  carefully read the technical reports for any nations you dare to compare.  the homepage struck me as more personality-driven than that of other public use data sets.  but, really, who am i to judge?  if you care about religious fervency, gender equality, democracy, or even being grossly nationally happy, then the world values survey is the best source there ever will be.  this github repository contains two scripts:

download all microdata.R
  • impersonate a thirteen year old ukrainian boy, convince the archive that a human's doing the downloading
  • for-loop through every wave, every study, every nation
  • save each file to your local hard disk according to an easy-to-peruse structure
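
the loop-and-save steps above might look roughly like this.  it is only a sketch of the folder structure, not the actual download script: the wave and nation labels are made up, nothing is fetched from the archive, and a one-row placeholder data.frame stands in for each nation's microdata.

```r
# hypothetical labels; the real script discovers waves and nations itself
waves <- c( "wave 5" , "wave 6" )
nations <- c( "nation a" , "nation b" )

# write under a temporary folder so the sketch leaves no mess behind
root <- file.path( tempdir() , "WVS" )

saved <- character( 0 )

for ( this_wave in waves ){
	for ( this_nation in nations ){

		# one folder per wave, one file per nation inside it
		out_dir <- file.path( root , this_wave )
		dir.create( out_dir , recursive = TRUE , showWarnings = FALSE )

		# placeholder for the downloaded-and-imported microdata
		x <- data.frame( wave = this_wave , nation = this_nation )

		out_file <- file.path( out_dir , paste0( this_nation , ".rds" ) )
		saveRDS( x , out_file )

		saved <- c( saved , out_file )
	}
}

# peruse the resulting structure
list.files( root , recursive = TRUE )
```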

analysis examples.R
  • load a country-specific data set
  • construct a fake survey design object.  statistics and coefficients will be calculated correctly, but standard errors and confidence intervals generated off of this complex sample design should be ignored.  read the user note within the script for more four one one
  • examine the bejesus out of that survey design object, calculating every descriptive statistic possible
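
a minimal sketch of those three steps with the survey package.  the data are simulated and every column name (`weight`, `happy`, `sex`) is hypothetical, so swap in the real variables from whichever country file you load.  the design is "fake" in the sense described above: only a weight is supplied, with no clusters or strata.

```r
library(survey)

set.seed( 1 )

# simulated stand-in for one nation's file; all column names are assumptions
x <-
	data.frame(
		weight = runif( 200 , 0.5 , 1.5 ) ,
		happy = rbinom( 200 , 1 , 0.6 ) ,
		sex = factor( sample( c( "male" , "female" ) , 200 , replace = TRUE ) )
	)

# weights only, no cluster or strata variables
wvs_design <- svydesign( ids = ~ 1 , weights = ~ weight , data = x )

# weighted point estimates
happy_mean <- svymean( ~ happy , wvs_design )
happy_by_sex <- svyby( ~ happy , ~ sex , wvs_design , svymean )

# pull out the coefficients only; the accompanying standard errors
# do not reflect the true sample design and should be ignored
coef( happy_mean )
coef( happy_by_sex )
```

the point estimate matches a plain weighted mean of the same column, which is why the statistics can be trusted even though the standard errors cannot.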


click here to view these two scripts


for more detail about the world values survey (wvs), visit:
  • geocities and myspace had a baby, and named it worldvaluessurvey.org.  i half expected a midi track to start up
  • wikipedia for much of the same content, but structured in a format you know and love


notes:

the administrators have neglected to produce microdata files that permit users to calculate confidence intervals using either of the most common survey analysis methods.  in other words, these data will give you a best guess, but you'll be in the dark about whether that guess is any good.  since there are no correct confidence intervals to match, i have not provided my usual replication script.  if you look in the "results" pdf file (not the "sample design" or "methodology" pdf files) for any nation, you'll find an "estimated error" somewhere around the second page.  this is a crude, dataset-wide measure of variance, but it's your only option to use as the standard error for any statistical testing.  this is a one-size-fits-all substitute for other more precise sampling error calculations like taylor-series linearization or replicate weighting.  you could (politely!) request that they include clustering and strata variables on both future and historical files.  because awesome data can always get more awesome.
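
here is one way to put that "estimated error" to work, sketched in base r.  every number below is made up for illustration, and the code assumes the published error sits on the same scale as the estimate (a proportion); check the units in each nation's pdf before borrowing this.

```r
# hypothetical weighted shares for two nations, computed from the microdata
estimate_a <- 0.62
estimate_b <- 0.55

# hypothetical "estimated error" figures copied by hand from each "results" pdf
error_a <- 0.03
error_b <- 0.025

# normal-approximation interval that treats the estimated error as the standard error
ci_from_estimated_error <-
	function( estimate , estimated_error , level = 0.95 ){
		z <- qnorm( 1 - ( 1 - level ) / 2 )
		c(
			lower = estimate - z * estimated_error ,
			upper = estimate + z * estimated_error
		)
	}

ci_a <- ci_from_estimated_error( estimate_a , error_a )
ci_b <- ci_from_estimated_error( estimate_b , error_b )

# crude two-sided z-test of the difference between the two nations,
# treating the two samples as independent
z_stat <- ( estimate_a - estimate_b ) / sqrt( error_a^2 + error_b^2 )
p_value <- 2 * pnorm( -abs( z_stat ) )

ci_a
ci_b
p_value
```

because the same dataset-wide error gets applied to every estimate from a nation, treat the result as a rough guide rather than a precise test.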


confidential to sas, spss, stata, and sudaan users: would you buy an imitation rolex if the real thing were free?  well look at your wrist because it's time to transition to r.  :D
