Analyse Quandl data with R – even from the cloud

March 10, 2013

(This article was first published on rapporter, and kindly contributed to R-bloggers)

I recently came across two exciting pieces of news about Quandl, a really promising time-series data provider.
With the help of the Quandl R package* (the development version is hosted on GitHub), it is really easy to fetch a variety of time-series directly from R, so there is no need to deal with the standard file formats that the data provider currently offers (csv, XML, JSON) or to call the otherwise awesome API manually. The Quandl function can automatically "identify" (or, to be more precise, parse from the provided metadata) the frequency of the time-series, and other valuable information can also be fetched with some further hacks. I will try to show a few in this post.
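To illustrate the idea of deriving the frequency from metadata, here is a minimal sketch in base R. It does not call Quandl at all: the `meta` list, its field names and the helper `frequency_from_meta` are hypothetical stand-ins, and the set of frequency labels is an assumption rather than a documented contract.

```r
# Hypothetical helper: map a frequency label found in the metadata to the
# number of observations per year expected by stats::ts().
# The label set below is an assumption for illustration only.
frequency_from_meta <- function(label) {
  known <- c(annual = 1, quarterly = 4, monthly = 12, weekly = 52, daily = 365)
  if (!label %in% names(known)) {
    stop("unknown frequency label: ", label)
  }
  unname(known[label])
}

# Toy stand-in for the metadata and values of a downloaded series:
# 24 monthly observations starting in January 2011.
meta   <- list(frequency = "monthly", from_date = "2011-01-01")
values <- c(seq(100, 111), seq(105, 116))

start_year  <- as.integer(substr(meta$from_date, 1, 4))
start_month <- as.integer(substr(meta$from_date, 6, 7))

# Build the ts object with the frequency taken from the metadata.
series <- ts(values,
             start     = c(start_year, start_month),
             frequency = frequency_from_meta(meta$frequency))

print(frequency(series))  # observations per year
print(start(series))      # year and period of the first observation
```

Once the series carries the right frequency, functions such as `decompose` or `acf` can use it without further configuration.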

The plethora of data available at Quandl and the endless possibilities for statistical analysis provided by R made us work on a robust time-series reporting module, a so-called template, that can hopefully be applied to any data sequence found on the site.

Our main intention was to also support supersets by default. This feature is a great way of combining separate time-series with a few clicks. We now try to provide a simple way to analyse them, e.g. by computing the bivariate cross-correlation between the series at different time-lags, and also to let users click on each variable for detailed univariate statistics with a calendar heatmap, seasonal decomposition or automatically identified best ARIMA models, among others.
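As a rough illustration of cross-correlation at different time-lags, the sketch below uses `stats::ccf` on two simulated series. The data, the variable names `oil` and `gold` and the built-in delay of three periods are all made up for the example and are not taken from any Quandl dataset or from the template itself.

```r
set.seed(42)

n     <- 120
lag_k <- 3

# Simulated "oil" series: a simple autoregressive process.
oil <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))

# Simulated "gold" series: follows "oil" with a delay of lag_k periods, plus noise.
gold <- c(rep(0, lag_k), head(oil, n - lag_k)) + rnorm(n, sd = 0.3)

# Cross-correlation for lags -10..10 without drawing the plot.
cc <- ccf(oil, gold, lag.max = 10, plot = FALSE)

# Pick the lag with the strongest absolute correlation.
best     <- which.max(abs(cc$acf))
best_lag <- cc$lag[best]
best_cor <- cc$acf[best]

print(best_lag)
print(round(best_cor, 2))
```

In `ccf(x, y)` the value at lag k estimates the correlation between `x[t+k]` and `y[t]`, so a peak at a negative lag suggests that the first series leads the second.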

This may not seem sensational to seasoned R users, as the community has already developed awesome R packages for these tasks, which can be found on CRAN, GitHub, R-forge etc. But please bear in mind that we present a template here: a module that compiles these functions along with some dynamic annotations (also known as literate programming) to be run against any time-series data, on your local computer or in the cloud. Long story short:

What do we do in this template?

  • Downloading data from Quandl with the given params [L20] and
  • drawing some commentary about the meta-data found in the JSON structure [L27].
  • As we do not use Quandl's R package to interact with their servers (so that we can also use the provided meta-data), we first have to transform the data to a data.frame [L34] and also identify the potential number of variables to be analysed [at the end of L64] to choose from:
    • multivariate statistics:
      • overview of data as a line plot [L74-78],
      • cross-correlation for each pair with an additional line plot [L95-L110],
      • and a short text about the results [L112].
    • univariate statistics:
      • descriptive statistics of the data in a table [L122] and also in text [L129 and L136],
      • a histogram [L133] with base::hist (grid and all other style elements are automatically added with the pander package),
      • a line plot based on an automatically transformed ts object [L153-162] for which the frequency was identified by the original meta-data,
      • a calendar heatmap [L172-178] only for daily data,
      • autocorrelation [L199-L212],
      • seasonal decomposition only for non-annual data with enough cases [L225-L239],
      • a dummy linear model on year and optionally month, day of month and day of week [L259-L274]
      • with detailed global validation of assumptions based on gvlma [L275-L329]
      • also with check for linearity [L335] and residuals [L368],
      • computed predicted values based on the linear model [L384-L390],
      • and best-fit ARIMA models for datasets with only a few cases [L403].
  • with references.
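The univariate steps listed above can be sketched with base R alone. This is not the template's code: the data are simulated, the template relies on additional packages such as gvlma and pander that are left out here, and choosing an ARIMA order by AIC over a small grid is an assumption made purely for illustration.

```r
set.seed(1)

# Simulated monthly series: trend + yearly seasonality + noise (5 years).
n     <- 60
month <- rep(1:12, times = 5)
year  <- rep(2008:2012, each = 12)
value <- 50 + 0.5 * seq_len(n) + 5 * sin(2 * pi * month / 12) + rnorm(n)

df     <- data.frame(year = year, month = factor(month), value = value)
series <- ts(df$value, start = c(2008, 1), frequency = 12)

# Autocorrelation up to two years of lags, without drawing the plot.
ac <- acf(series, lag.max = 24, plot = FALSE)

# Seasonal decomposition into trend, seasonal and random components.
parts <- decompose(series)

# Dummy linear model on year and month, then predictions for the next year.
fit       <- lm(value ~ year + month, data = df)
next_year <- data.frame(year = 2013, month = factor(1:12, levels = 1:12))
predicted <- predict(fit, newdata = next_year)

# Assumption: pick an ARIMA order by the lowest AIC over a small grid.
candidates <- expand.grid(p = 0:2, d = 0:1, q = 0:2)
aics <- apply(candidates, 1, function(o) {
  tryCatch(AIC(arima(series, order = c(o[["p"]], o[["d"]], o[["q"]]))),
           error = function(e) NA_real_)  # skip orders that fail to fit
})
best_order <- candidates[which.min(aics), ]

print(round(coef(fit)[["year"]], 2))  # estimated yearly trend
print(best_order)
```

Orders that fail to converge are recorded as `NA` and ignored by `which.min`, so a single problematic candidate does not stop the search.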
Please see the source code of the template on GitHub for more details. Unfortunately, we cannot let Rapporter users fork this template, as we would rather not share our Quandl API key this time. But feel free to upload that file with your own Quandl API key to rapporter.net at Templates > New > Upload and start tweaking it at any time.

We would love to hear your feedback or about an updated version of the file!

Run locally

The template can be run inside of Rapporter for any user or in any local R session after loading our rapport R package. Just download the template and run:

library(rapport)
rapport('quandl.tpl')

Or apply the template to some custom data (Tammer's Oil, gold and stocks superset):

rapport('quandl.tpl', provider = 'USER_YY', dataset = 'ZH')

And even filter the results by date for only one variable of the above:

rapport('quandl.tpl', provider = 'USER_YY', dataset = 'ZH', from = '2012-01-01', to = '2012-06-30', variable = 'Oil')

And why not check the results on an HTML page instead of the R console?

rapport.html('quandl.tpl', provider = 'USER_YY', dataset = 'ZH', from = '2012-01-01', to = '2012-06-30', variable = 'Oil')

Run in the cloud

We introduced Rapplications a few weeks ago, so that potentially all of our (your) templates can be run by anyone with nothing more than an Internet connection, at our computational expense, even without registration and authentication.

We have also uploaded this template to rapporter.net and made a Rapplication for it, which serves real-time generated and/or partially cached reports based on the above examples with GET params.

You may also analyse any dataset available on Quandl: just pass the custom identifier with some optional arguments to our servers in the form of:

https://rapporter.net/api/rapplicate/?token=78d7432cba100b39818d0d2821c550e46a2745bf8b6dc6793f40c8c1f8e7439a&provider=USER_YY&dataset=ZH&variable=Oil&from=2012-01-01&to=2012-06-31&output_format=html&new_tab=true

With the following parameters:

  • token: the identifier of the Rapplication that stores the HTML/LaTeX/docx/odt stylesheet or reference document to apply to the report. Please use the token referenced above or create your own Rapplication.
  • provider: the Quandl internal Code (ID) for the data provider
  • dataset: the Quandl internal Code (ID) for the dataset
  • variable (optional): a name of the variable from the dataset to analyse with univariate methods
  • from and to (optional): filter by date in YYYY-MM-DD format
  • output_format (optional): the output format of the report: html, pdf, docx or odt. Defaults to html, so you can safely ignore this.
  • new_tab (optional): set this to true so that the HTML file is not forced to be downloaded
  • ignore_cache (optional): set this to true if you want to force the report to be generated from scratch even if we have it in the cache
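If you prefer to assemble such a URL from R, a small helper along the following lines might do. The function name `build_rapplication_url` is hypothetical and not part of the rapport package. Only the base URL and the parameter names come from the description above, and the token shown is a placeholder to be replaced with a real Rapplication token.

```r
# Hypothetical helper: build a Rapplication URL from the documented GET
# parameters. Optional parameters left as NULL are omitted from the query.
build_rapplication_url <- function(token, provider, dataset,
                                   variable = NULL, from = NULL, to = NULL,
                                   output_format = NULL, new_tab = NULL,
                                   ignore_cache = NULL) {
  params <- list(token = token, provider = provider, dataset = dataset,
                 variable = variable, from = from, to = to,
                 output_format = output_format, new_tab = new_tab,
                 ignore_cache = ignore_cache)

  # Drop the optional parameters that were not supplied.
  params <- Filter(Negate(is.null), params)

  # Percent-encode each value and join everything as key=value pairs.
  encoded <- vapply(params,
                    function(v) utils::URLencode(as.character(v), reserved = TRUE),
                    character(1))
  query <- paste0(names(params), "=", encoded, collapse = "&")

  paste0("https://rapporter.net/api/rapplicate/?", query)
}

# "YOUR_TOKEN" is a placeholder, not a working token.
url <- build_rapplication_url(token = "YOUR_TOKEN",
                              provider = "USER_YY", dataset = "ZH",
                              variable = "Oil",
                              from = "2012-01-01", to = "2012-06-30",
                              new_tab = "true")
print(url)
```

Passing `new_tab` and `ignore_cache` as the strings "true" or "false" keeps the query in the lowercase form used in the example URL.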

Run from a widget

Of course, we are aware that most R users would rather type a few commands in the R console than build a unique URL based on the above instructions. But we can definitely help you with that process, as rapporter.net automatically generates an HTML form for each Rapplication, along with some helper iframe code that lets you easily integrate the form into your home page or blog post.

Feel free to download the generated report as a pdf, docx or odt file for further editing (see the bottom of the left sidebar of the generated HTML page), and be sure to register for an account at rapporter.net to make and share similar statistical templates with friends and collaborators effortlessly.


* QuandlR would be a cool name IMHO
