R Packages for Data Access

This article was first published on Revolutions, and kindly contributed to R-bloggers.

by Joseph Rickert

Data Science is all about getting access to interesting data, and it is really nice when some kind soul not only points out an interesting data set but also makes it easy for you to access it. Below is a list of 17 R packages that appeared on CRAN between May 1st and August 8th that, in one way or another, provide access to publicly available data. 

bigQueryR: Provides an interface to Google's BigQuery. The vignette shows how to use it.

blscrapeR: Provides an API wrapper for Bureau of Labor Statistics data sets. There is a vignette showing how to access inflation and price data, one for accessing wages and benefits data, and one for mapping BLS data.
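
As a rough illustration of what a call through the wrapper can look like, the sketch below requests a single national series. The series ID and the year range are illustrative choices rather than anything taken from the vignettes, and the call assumes blscrapeR is installed and that the BLS API is reachable without a registration key.

```r
# Minimal sketch: pull one national series from the BLS API with blscrapeR.
library(blscrapeR)

# "LNS14000000" is the BLS series ID for the seasonally adjusted
# U.S. unemployment rate (an illustrative choice, not from the post).
unemp <- bls_api("LNS14000000", startyear = 2015, endyear = 2016)

# Each row is one monthly observation; inspect the first few.
head(unemp)
```

Unregistered requests are rate limited by BLS, so heavier use would likely need a registration key.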

cdlTools: Provides functions to download USDA National Agricultural Statistics Service (NASS) cropscape data for a specified state.

dataone: The dataone R package enables R scripts to search, download and upload science data and metadata from/to the DataONE Federation. The website describes DataONE as “a community driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data”. The package comes with several vignettes, including this overview.

dataRetrieval: Package to retrieve USGS and EPA hydrologic and water quality data, officially supported by USGS. The vignette gives several examples of downloading interesting data sets.
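
To give a feel for what such a download looks like, here is a minimal sketch that pulls a year of daily mean streamflow for one gauge. The site number, parameter code and date range are assumptions picked for illustration, and the call needs an internet connection and an installed copy of dataRetrieval.

```r
# Minimal sketch: daily streamflow from the USGS National Water Information System.
library(dataRetrieval)

site_no  <- "01491000"  # assumed example gauge (Choptank River near Greensboro, MD)
param_cd <- "00060"     # USGS parameter code for discharge, cubic feet per second

flow <- readNWISdv(
  siteNumbers = site_no,
  parameterCd = param_cd,
  startDate   = "2015-01-01",
  endDate     = "2015-12-31"
)

# Replace the coded column names with more readable ones, then inspect.
flow <- renameNWISColumns(flow)
head(flow)
```

Other gauges and measurements can be requested by swapping in a different site number or parameter code.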

eechidna: Provides the data from the 2013 Australian Federal Election and tools to analyze it. There are several nicely done vignettes. The following plot, which shows election results by polling place, comes from the vignette on plotting polling stations.

[Figure: election results by polling place, from the eechidna vignette on plotting polling stations]

There are also vignettes on census and election data, shapefiles and mapping Australia's Electorates.

getHFdata: Provides functions to download and aggregate high frequency trading data for Brazilian instruments directly from the Bovespa FTP site. There is a vignette to get you started. The following plot showing unemployment data by state comes from the vignette on Census data.

googleAnalyticsR: Provides an interface to the Google Analytics Reporting API. There is a vignette.

googleway: Provides functions to retrieve data from 6 Google Maps APIs. The vignette shows how.

gutenbergr: Search and download public domain works in the Project Gutenberg collection. The vignette walks through both steps.
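
The search-then-download workflow can be sketched as follows. The author filter and the book ID are illustrative choices; the metadata search works from tables bundled with the package, while the download step assumes a working connection to a Project Gutenberg mirror.

```r
# Minimal sketch: find works by one author, then download one of them.
library(gutenbergr)

# Search the bundled metadata for works by a given author.
austen <- gutenberg_works(author == "Austen, Jane")
austen[, c("gutenberg_id", "title")]

# Download one text by its Project Gutenberg ID (needs a network connection).
# ID 1342 is "Pride and Prejudice".
pride <- gutenberg_download(1342)
head(pride)
```

The downloaded text comes back as a data frame with one row per line of the book, which makes it easy to pass on to text-mining tools.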

ie2miscdata: Contains a collection of USGS environmental and water resources data sets. There is a vignette showing how to create plots from the data. (See also: dataRetrieval.)

macleish: Provides functions to access data from the Ada & Archibald MacLeish Field Station in Whately, MA. The vignette shows how to obtain weather data.

muckrock: Contains public domain information on requests made by MuckRock through the US Freedom of Information Act.

nasadata: Provides an interface to NASA's Earth Imagery and Assets API and Earth Observatory and Natural Event Tracker.

oec: Provides an interface to the Observatory of Economic Complexity.

osi: Provides a connector to the Open Source Initiative API, which serves machine-readable data about open source software licenses.

pewdata: Provides for reproducible, programmatic retrieval of survey data sets from the Pew Research Center. The vignette shows how to set up and use the package. Look here for an interesting poll about what Americans know about science.

TCGAretriever: Provides an interface to data sets from The Cancer Genome Atlas (TCGA) via the Cancer Genomic Data Server web service.

For more packages that provide APIs to data sets, have a look at the CRAN Task View on Web Technologies and Services. For a list of interesting data sets out there in the wild, see the MRAN Data Sources page.

[Update: added the dataRetrieval package, at the suggestion of Laura DeCicco.]

Editor's note: This is Joe's last post to Revolutions as a member of the Microsoft team: he is heading on for further adventures in the world of R. We want to thank Joe for his many contributions to the blog over the past 6 years, and please join us in wishing him well!
