by Joseph Rickert
Data Science is all about getting access to interesting data, and it is really nice when some kind soul not only points out an interesting data set but also makes it easy for you to access it. Below is a list of 17 R packages that appeared on CRAN between May 1st and August 8th that, in one way or another, provide access to publicly available data.
blscrapeR: Provides an API wrapper for Bureau of Labor Statistics data sets. There is a vignette showing how to access inflation and price data, one for accessing Wages and Benefits data, and one for mapping BLS data.
dataone: The dataone R package enables R scripts to search, download and upload science data and metadata from and to the DataONE Federation. The website describes DataONE as “a community driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data”. The package comes with several vignettes, including this overview.
eechidna: Provides the data from the 2013 Australian Federal Election and tools to analyze it. There are several nicely done vignettes. The following plot, which shows election results by polling place, comes from the vignette on plotting polling stations.
GetHFData: Provides functions to download and aggregate high-frequency trading data for Brazilian instruments directly from the Bovespa FTP site. There is a vignette to get you started. The following plot showing unemployment data by state comes from the vignette on Census data.
pewdata: Provides for reproducible, programmatic retrieval of survey data sets from the Pew Research Center. The vignette shows how to set up and use the package. Look here for an interesting poll about what Americans know about science.
For more packages that provide APIs to data sets, have a look at the CRAN Task View on Web Technologies and Services. For a list of interesting data sets out there in the wild, see the MRAN Data Sources page.
[Update: added the dataRetrieval package, at the suggestion of Laura DeCicco.]
Editor's note: This is Joe's last post to Revolutions as a member of the Microsoft team: he is heading off for further adventures in the world of R. We want to thank Joe for his many contributions to the blog over the past 6 years. Please join us in wishing him well!