
The new essurvey 1.0.3 is here! This release is mainly about downloading weight data from the European Social Survey (ESS), a feature that has been in the works since 2017! As usual, you can install from CRAN or GitHub with:

# From CRAN
install.packages("essurvey")

# or the development version from GitHub
devtools::install_github("ropensci/essurvey")

library(essurvey)
set_email("[email protected]")

Remember to set your registered email with set_email before downloading ESS data. This is as easy as running set_email("[email protected]") with your own email.

The package now has two main functions to download weight data (called SDDF by the ESS): show_sddf_cntrounds and import_sddf_country. The first one returns the rounds for which a specific country has weight data. For example, for which rounds does Italy have weight data?

ita_rnds <- show_sddf_cntrounds("Italy")

ita_rnds
## [1] 6 8

show_sddf_cntrounds("Germany")
## [1] 1 2 3 4 5 6 7 8

In some rounds, some countries used complete random sampling, so they didn’t need any weight data for correct estimation. Italy did not use a random sample for round 8, so let’s focus on that round for the example. To actually download the weight data for this round, we use import_sddf_country:

# Download weight data
ita_dt <- import_sddf_country("Italy", 8)

ita_dt
## # A tibble: 2,626 x 10
##    name  essround edition proddate cntry  idno   psu domain stratum    prob
##    <chr>    <dbl> <chr>   <chr>    <chr> <dbl> <dbl>  <dbl>   <dbl>   <dbl>
##  1 ESS8…        8 1.2     11.02.2… IT        1 11029      2     658 1.01e-4
##  2 ESS8…        8 1.2     11.02.2… IT        2 11170      2     665 1.11e-4
##  3 ESS8…        8 1.2     11.02.2… IT        4 11127      2     660 1.03e-4
##  4 ESS8…        8 1.2     11.02.2… IT        5 10771      2     671 1.04e-4
##  5 ESS8…        8 1.2     11.02.2… IT        6 11148      2     666 1.06e-4
##  6 ESS8…        8 1.2     11.02.2… IT        9 11163      1     667 1.05e-4
##  7 ESS8…        8 1.2     11.02.2… IT       14 11183      1     657 1.06e-4
##  8 ESS8…        8 1.2     11.02.2… IT       15 11184      2     661 9.97e-5
##  9 ESS8…        8 1.2     11.02.2… IT       16 10928      2     652 1.01e-4
## 10 ESS8…        8 1.2     11.02.2… IT       22 11171      2     664 9.97e-5
## # … with 2,616 more rows

Notice that the weight data has an idno column. This column can be used to match each respondent from each country to the main ESS data. This means that you can now actually do proper weighted analysis using the ESS data on the fly! How would we match the data for Italy, for example?

library(dplyr)

ita_main <- import_country("Italy", 8)

And then merge it with the weight data:

# Let's keep only the important weight columns
ita_dt <- ita_dt %>% select(idno, psu, domain, stratum, prob)

# Merge main data and weight data
complete_data <- inner_join(ita_main, ita_dt, by = "idno")
## Warning: Column idno has different attributes on LHS and RHS of join
# There we have the matched data
complete_data %>%
  select(essround, idno, cntry, psu, stratum, prob)
## # A tibble: 2,626 x 6
##    essround  idno cntry   psu stratum      prob
##       <dbl> <dbl> <chr> <dbl>   <dbl>     <dbl>
##  1        8     1 IT    11029     658 0.000101
##  2        8     2 IT    11170     665 0.000111
##  3        8     4 IT    11127     660 0.000103
##  4        8     5 IT    10771     671 0.000104
##  5        8     6 IT    11148     666 0.000106
##  6        8     9 IT    11163     667 0.000105
##  7        8    14 IT    11183     657 0.000106
##  8        8    15 IT    11184     661 0.0000997
##  9        8    16 IT    10928     652 0.000101
## 10        8    22 IT    11171     664 0.0000997
## # … with 2,616 more rows
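Since inner_join silently drops respondents that have no row in the weight data, it can be worth checking for them before trusting the merge. Here is a minimal sketch on made-up data; main_toy, sddf_toy and the trust column are hypothetical stand-ins, not real ESS values:

```r
library(dplyr)

# Toy stand-ins for the main ESS data and the SDDF weight data (made-up values)
main_toy <- data.frame(idno = c(1, 2, 4, 5), trust = c(3, 7, 5, 6))
sddf_toy <- data.frame(
  idno = c(1, 2, 4),
  psu  = c(11, 12, 11),
  prob = c(0.001, 0.002, 0.001)
)

# Respondents in the main data without a matching weight row
unmatched <- anti_join(main_toy, sddf_toy, by = "idno")

# Respondents that inner_join() keeps
matched <- inner_join(main_toy, sddf_toy, by = "idno")

nrow(unmatched)
nrow(matched)
```

If unmatched has any rows, those respondents would be lost in the merged data, so it is better to find out before running the analysis.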

There we have the matched data! It can easily be piped to the survey package to perform properly weighted analysis of the ESS data. In fact, we’re thinking of developing an official ESS package to make analyzing ESS data very easy.
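As a rough idea of what that could look like, here is a minimal sketch with the survey package on a toy data frame. The toy values and the trust variable are made up, and treating prob as the inclusion probability passed to probs is an assumption of this sketch rather than official ESS guidance:

```r
library(survey)

# Toy stand-in for the merged data: 2 strata with 2 PSUs each (made-up values)
toy <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu     = c(1, 1, 2, 2, 3, 3, 4, 4),
  prob    = c(0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2),
  trust   = c(4, 6, 5, 7, 2, 3, 8, 5)
)

# Declare the sampling design: clusters, strata and inclusion probabilities.
# svydesign() turns probs into weights of 1 / prob.
des <- svydesign(ids = ~psu, strata = ~stratum, probs = ~prob, data = toy)

# Design-based estimate of the mean of trust, with its standard error
est <- svymean(~trust, des)
est
```

With the real data, the same call would use complete_data in place of toy. Strata that contain a single PSU may need extra handling (see the survey.lonely.psu option of the survey package).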

Weight data (or SDDF data) is a bit tricky because not all country/round files have the same extension (some are SPSS, some are Stata, etc.) or the same format (number of columns, names of columns, etc.). We would appreciate it if you could report any errors you find on GitHub, and we’ll try to take care of them as soon as possible.
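Because the columns can differ between files, a quick check that the ones used in this post are present can save some head scratching. This is only a sketch: missing_sddf_cols is a hypothetical helper (not part of essurvey), and the list of needed columns simply reflects the ones used in the merge above:

```r
# Columns used in the merge and weighting steps of this post
needed_cols <- c("idno", "psu", "stratum", "prob")

# Hypothetical helper (not part of essurvey): returns the needed columns
# that are missing from a weight data frame, ignoring letter case
missing_sddf_cols <- function(sddf, needed = needed_cols) {
  setdiff(needed, tolower(names(sddf)))
}

# Made-up examples: one complete file, one lacking the stratum column
ok_toy  <- data.frame(idno = 1, psu = 2, stratum = 3, prob = 0.1)
bad_toy <- data.frame(IDNO = 1, psu = 2, prob = 0.1)

missing_sddf_cols(ok_toy)
missing_sddf_cols(bad_toy)
```

An empty result means all the expected columns are there; anything else names the columns to look for under a different name.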

Special thanks to phnk, djhurio and Stefan Zins for helping to push this release forward.

Enjoy this new release!