gssr is now two packages: gssr and gssrdoc

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Summary

My gssr package is now two packages: gssr and gssrdoc. They’re also available as binary packages via R-Universe, which means they will install much faster.

The GSS is a big survey with a big codebook. Distributing it as an R package poses a few challenges. It’s too big for CRAN, of course, but that’s fine because CRAN is not a repository for datasets in any case. For some time, my gssr package has bundled the main data file, the panel datasets, and functions for getting the file for a particular year directly from NORC. Recently, I started integrating the codebook—or at least, summaries of every variable in the 1972-2022 data file—into the package. It’s a handy feature. It lets you look up GSS variables as if they were R functions:

[Figure: Looking up a GSS variable (fefam) in R]

The main downside to doing this is that it makes a large package even larger. It also makes installing from source slow, because more than 6,500 variables have to be documented during installation. Providing binary packages would be much better. rOpenSci’s R-Universe provides a package-building service that rests on a bunch of GitHub Actions. But the resource constraints of GitHub’s runners meant that building the package would fail on Ubuntu (specifically), so I couldn’t use it. To get around this, I have split the package in two. There’s now gssr, which has the datasets (and the ability to fetch yearly datasets) exactly as before, and gssrdoc, which provides the integrated help. They are fully independent of one another. If you install both, you get exactly what gssr used to give you by itself. I think the split is worth it just because R-Universe can now build binaries of each package, which means installation is much faster and you can use install.packages(). To install both, do:

```r
# Install 'gssr' from the 'kjhealy' R-Universe; CRAN is listed too so dependencies resolve
install.packages('gssr', repos =
  c('https://kjhealy.r-universe.dev', 'https://cloud.r-project.org'))

# Also recommended: install 'gssrdoc' as well
install.packages('gssrdoc', repos =
  c('https://kjhealy.r-universe.dev', 'https://cloud.r-project.org'))
```
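If you already have one of the two packages and only want whichever is missing, a small wrapper around install.packages() can check first. This is a minimal sketch using only base R; missing_packages(), install_gss_packages(), and gss_repos are hypothetical names, not functions provided by gssr or gssrdoc.

```r
# Hypothetical helpers (not part of gssr or gssrdoc): install only the
# GSS packages that are not already present in the library paths.
gss_repos <- c("https://kjhealy.r-universe.dev", "https://cloud.r-project.org")

# Return the elements of `pkgs` that system.file() cannot locate,
# i.e. packages that are not installed.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

# Install whichever of `pkgs` is missing; returns the attempted names invisibly.
install_gss_packages <- function(pkgs = c("gssr", "gssrdoc"), repos = gss_repos) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    utils::install.packages(todo, repos = repos)
  }
  invisible(todo)
}
```

Calling install_gss_packages() with no arguments would then fetch both packages on a fresh machine and do nothing on one where both are already installed.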

You can, of course, permanently add my R-Universe repo (or any other) to the default list of repos that install.packages() searches, by setting options() either in a project or in your .Rprofile. The R-Universe help repo has some additional details.
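For example, a line along these lines in ~/.Rprofile (or a project's .Rprofile) puts my R-Universe ahead of CRAN. It's a sketch following the usual options(repos = ...) pattern; the element names kjhealy and CRAN are just labels.

```r
# Sketch for ~/.Rprofile or a project-level .Rprofile: search the kjhealy
# R-Universe first, then CRAN. The element names are just labels.
options(repos = c(
  kjhealy = "https://kjhealy.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
))
```

With that in place, install.packages(c('gssr', 'gssrdoc')) should find both packages without a repos argument. Note that this replaces, rather than appends to, whatever repos were set before.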

Note that if you install both packages, you only need to load gssr with library(gssr). You don’t have to attach gssrdoc: you can still query its help pages at the console with, e.g., ??polviews or ?gssrdoc::fefam.
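The same lookup can be done programmatically, for instance to check whether a variable is documented before opening its page. This is a minimal sketch built on base R's utils::help(); has_help_topic() is a hypothetical helper, and the gssrdoc calls in the comments assume gssrdoc is installed.

```r
# Hypothetical helper (not part of gssr or gssrdoc): TRUE if `package`
# has a help page for `topic`. The package does not need to be attached.
has_help_topic <- function(topic, package) {
  pages <- tryCatch(
    # do.call() passes `package` as a plain string, avoiding help()'s
    # non-standard evaluation of that argument
    do.call(utils::help, list(topic = topic, package = package)),
    # help() errors if the package is not installed at all
    error = function(e) character(0)
  )
  length(pages) > 0
}

# Usage once gssrdoc is installed (not run here):
# has_help_topic("fefam", "gssrdoc")
# ?gssrdoc::fefam
# help("polviews", package = "gssrdoc")
```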

To leave a comment for the author, please follow the link and comment on their blog: R on kieranhealy.org.
