[This article was first published on GivenTheData, and kindly contributed to R-bloggers.]
I am happy to announce the first release of the R package RWebData on Bitbucket. The main aim of the package is to provide high-level functions that facilitate access to, and the systematic collection of, data from REST APIs for the purpose of statistical analysis. RWebData is thus made for users who predominantly use R as statistical software but have no experience with web APIs and/or web data formats. In a broader sense (and in the long run), the package should serve as a high-level interface to the programmable web for research in the social sciences (i.e., for accessing the programmable web as a data source). The package thus takes up some of the broader ideas discussed in our paper on the pvsR package. A short working paper on arXiv offers a broader motivation for the package, some discussion of the package’s architecture, and a practical introduction with several examples.
RWebData builds on many important packages that facilitate client-server interaction via R/HTTP, as well as on different parsers for web data formats (including RCurl, jsonlite, XML, XML2R, httr, mime, yaml, and RJSONIO). At its core, the package provides a generic approach to mapping nested web data to a flat data representation in the form of one or several (non-nested) data frames.
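To make the idea of mapping nested data to a flat table more concrete, here is a minimal sketch in base R. It is not RWebData's actual algorithm; the records and the helper `flatten_record()` are hypothetical and made up purely for illustration. The sketch only shows how a nested list, such as a JSON or XML parser might return, can be collapsed into a single non-nested data frame.

```r
# Hypothetical nested records, as a JSON/XML parser might return them
# (values are invented for illustration only)
records <- list(
  list(id = 1, country = list(code = "US", name = "United States"), value = 10.5),
  list(id = 2, country = list(code = "CH", name = "Switzerland"), value = 7.2)
)

# Flatten one record: unlist() collapses nested lists into a named
# character vector whose names encode the path, e.g. "country.code"
flatten_record <- function(rec) {
  flat <- unlist(rec)
  as.data.frame(as.list(flat), stringsAsFactors = FALSE)
}

# Stack the one-row data frames into a single flat table
flat_df <- do.call(rbind, lapply(records, flatten_record))

# unlist() turned everything into character strings, so restore numeric columns
flat_df$id <- as.numeric(flat_df$id)
flat_df$value <- as.numeric(flat_df$value)

print(flat_df)
```

This simple approach only works when every record has the same fields. Handling records with differing or repeated elements is where a generic mapping, such as the one the package aims to provide, becomes useful.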
A simple example
This example is taken from the working paper on arXiv. It illustrates the most basic usage of the package. Say you want to statistically analyze or visualize data provided by a web API. All you have is a URL pointing to the data of interest; you do not know (or care) what JSON, XML, and the like are, and you simply want the data in a format that is suitable for statistical analysis in R.
Here, we fetch data from the World Bank Indicators API, which provides time series data on financial indicators of different countries (as XML in a compressed text file). In the example, we query data from that API in order to investigate how the United States’ public debt was affected by the financial crisis in 2008.
# install the package directly from Bitbucket (e.g., via devtools::install_bitbucket()),
# then load it
library(RWebData)
# fetch the data and map it to a table-like representation (a data-frame)
u <- "http://api.worldbank.org/countries/USA/indicators/DP.DOD.DECN.CR.GG.CD?&date=2005Q1:2013Q4"
usdept <- getTabularData(u)
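# analyze/visualize the data
# (assumes usdept has a `date` column with values like "2008Q3" and a `value` column)
library(zoo)
debt <- zoo(as.numeric(as.character(usdept$value)), as.yearqtr(as.character(usdept$date), format = "%YQ%q"))
plot(debt, xlab="Quarter", ylab="U.S. public debt (in USD)")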
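The query URL in the example above consists of a country code, an indicator code, and a date range. As a small sketch, the helper below rebuilds that URL from its parts. The function `build_wb_url()` is hypothetical (it is not part of RWebData), and whether other combinations of country, indicator, and dates return data depends on the API.

```r
# Hypothetical helper (not part of RWebData): compose a query URL for the
# World Bank Indicators API following the pattern of the example URL
build_wb_url <- function(country, indicator, from, to) {
  sprintf(
    "http://api.worldbank.org/countries/%s/indicators/%s?&date=%s:%s",
    country, indicator, from, to
  )
}

# Reproduces the URL used in the example
u2 <- build_wb_url("USA", "DP.DOD.DECN.CR.GG.CD", "2005Q1", "2013Q4")
print(u2)
```

The resulting string can be passed to getTabularData() in the same way as `u` in the example.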
More examples will follow...
Comments etc. very welcome
Please feel free to comment, make suggestions, and report issues (preferably via the issue tracker in the Bitbucket repository). As mentioned above, this is the first release. While I have already used the package to collect data for several of my own research projects, there are certainly still a lot of issues to be resolved...