Some Data Packages

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers.]

If you’re teaching statistics, data analysis, or data visualization with R this semester, especially in the social sciences, I’ve pulled together various bits of data into packages that I use in my own teaching. You might find them useful once you’re sick of Gapminder. They cover a variety of topics and range from single tables of data to whole longitudinal and panel surveys.

The cavax package contains a school-level table of rates of Personal Belief Exemptions (PBEs) in California kindergartens for the 2014-15 school year. At that time (the rules have since changed), a PBE allowed a child to enter kindergarten without having received the usual complement of vaccinations. Information on the school’s name, district, city, county, and type is included, along with the size of the kindergarten class.

The ukelection2019 package contains candidate-level vote data, by constituency, for the UK general election of 2019, scraped from the BBC’s election website.

The uscenpops package contains a table of population counts for the United States by year of age and sex for every year from 1900 to 2018.

The nycdogs package is a fun dataset (actually three separate tibbles: licenses, bites, and zip codes) taken from New York City’s Open Data initiative, cleaned up and packaged for R. It’s useful for teaching dplyr, for drawing maps, and for seeing where dogs with particular names live.
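As a rough idea of the kind of dplyr exercise this sort of table supports, here is a minimal sketch that counts dog names within each borough. It runs on a tiny made-up tibble rather than the package data, and the object and column names (`licenses`, `animal_name`, `borough`) are assumptions for illustration, so check the package documentation for the real ones.

```r
library(dplyr)

# Hypothetical stand-in for a dog license table. The rows and the column
# names are invented for this example; they are not taken from nycdogs.
licenses <- tibble(
  animal_name = c("Max", "Bella", "Max", "Luna", "Bella", "Max"),
  borough     = c("Brooklyn", "Queens", "Brooklyn",
                  "Manhattan", "Queens", "Queens")
)

# Count how many dogs share each name within each borough,
# then put the most common name-borough pairs first.
name_counts <- licenses %>%
  count(borough, animal_name, name = "n_dogs") %>%
  arrange(desc(n_dogs), borough, animal_name)

print(name_counts)
```

The same count-then-arrange pattern should carry over to the real license table once the made-up names are swapped for the actual ones.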

The covdata package contains data on reported cases of and deaths from COVID-19 from a variety of sources. Amongst other things, the package provides (1) National-level case and mortality data from the ECDC, U.S. state-level case and mortality data from the CDC and the New York Times, and patient-level data from the CDC’s public use dataset. (2) All-cause mortality and excess mortality data from the Human Mortality Database. (3) Mobility and activity data from Apple and Google. (4) Policy data from the CoronaNet Project.

The gssr package provides the complete General Social Survey cumulative data file (1972-2018) and Three Wave Panel data files in an R-friendly format, together with their codebooks.

All of these packages work well with the socviz package, which supports my Data Visualization book with a collection of datasets and utility functions to help you draw good graphs in R and ggplot.
