I have recently read two thrilling pieces of news about Quandl, a really promising time-series data provider:
With the help of the Quandl R package* (the development version is hosted on GitHub), it is really easy to fetch a variety of time-series directly from R, so there is no need to deal with the standard file formats that the data provider currently offers (csv, XML, JSON) or to call the otherwise awesome API manually. The Quandl function can automatically “identify” (or, to be more precise, parse from the provided metadata) the frequency of the time-series, and other valuable information can also be fetched with some further hacks. I will try to show a few in this post.
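To make the idea of parsing the frequency from metadata concrete, here is a minimal base R sketch. It does not call the Quandl package or API: the meta list, its field names (frequency, from_date) and the helper frequency_to_ts are assumptions made up for illustration, and the values are simulated.

```r
# Hypothetical helper: map a frequency label found in the metadata to the
# number of observations per year that stats::ts() expects.
frequency_to_ts <- function(label) {
  lookup <- c(annual = 1, quarterly = 4, monthly = 12, weekly = 52, daily = 365)
  label <- tolower(label)
  if (!label %in% names(lookup)) {
    stop("Unknown frequency label: ", label)
  }
  unname(lookup[label])
}

# Assumed shape of the metadata; the field names are illustrative only.
meta <- list(frequency = "monthly", from_date = "2010-01-01")

# Simulated values standing in for a downloaded series.
set.seed(123)
values <- cumsum(rnorm(24))

# For monthly data the start is given as c(year, month).
start_date <- as.Date(meta$from_date)
start <- c(as.integer(format(start_date, "%Y")),
           as.integer(format(start_date, "%m")))

series <- ts(values, start = start, frequency = frequency_to_ts(meta$frequency))
print(series)
```

The start vector above only fits monthly data; other frequencies would need their own mapping from the first date to a ts start position.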
Our main intention was to also support supersets by default. This feature is a great way of combining separate time-series with a few clicks. We now try to provide a simple way to analyse them, e.g. by computing the bivariate cross-correlation between the series at different time lags, and to let users click on each variable for detailed univariate statistics such as a calendar heatmap, a seasonal decomposition or automatically identified best ARIMA models, among others.
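As a rough illustration of bivariate cross-correlation at different time lags, the sketch below runs stats::ccf over every pair of columns in a data.frame. The data is simulated (the column names leading, lagging and noise are made up), and this is not necessarily how the template itself computes or presents the results.

```r
# Simulated data: `lagging` repeats `leading` three steps later, plus noise.
set.seed(42)
n <- 200
z <- rnorm(n + 3)
series <- data.frame(
  leading = z[4:(n + 3)],
  lagging = z[1:n] + rnorm(n, sd = 0.5),
  noise   = rnorm(n)
)

# All unordered pairs of variables.
pairs <- combn(names(series), 2, simplify = FALSE)

# For each pair, find the lag with the strongest absolute cross-correlation.
# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t].
best_lags <- sapply(pairs, function(p) {
  cc <- ccf(series[[p[1]]], series[[p[2]]], lag.max = 10, plot = FALSE)
  cc$lag[which.max(abs(cc$acf))]
})
names(best_lags) <- sapply(pairs, paste, collapse = " ~ ")

# By construction the leading/lagging pair should peak at lag -3.
print(best_lags)
```

Picking the single strongest lag is a simplification; in practice one would also look at the whole correlogram and its confidence bands.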
This may not seem sensational to native R users, as the community has already developed awesome R packages for these tasks, to be found on CRAN, GitHub, R-Forge etc. But please bear in mind that we present a template here: a module that compiles these functions along with some dynamic annotations (also known as literate programming), to be run against any time-series data on your local computer or in the cloud. Long story short:
What do we do in this template?
- Downloading data from Quandl with the given params [L20] and
- drawing up some commentary about the meta-data found in the JSON structure [L27].
- As we are not using Quandl’s R package to interact with their servers (so that we can also use the provided meta-data), we first have to transform the data to a data.frame [L34] and also identify the potential number of variables to be analysed [at the end of L64] to choose from:
  - multivariate statistics:
    - overview of the data as a line plot [L74-78],
    - cross-correlation for each pair with an additional line plot [L95-L110],
    - and a short text about the results [L112].
  - univariate statistics:
    - descriptive statistics of the data in a table [L122] and also in text [L129 and L136],
    - a histogram [L133] with base::hist (the grid and all other style elements are added automatically),
    - a line plot based on an automatically transformed ts object [L153-162], for which the frequency was identified from the original meta-data,
    - a calendar heatmap [L172-178] only for daily data,
    - autocorrelation [L199-L212],
    - seasonal decomposition only for non-annual data with enough cases [L225-L239],
    - a dummy linear model on year and optionally month, day of month and day of week [L259-L274]
      - with detailed global validation of the model assumptions,
      - also with a check for linearity [L335] and residuals [L368],
      - computed predicted values based on the linear model [L384-L390],
    - and best fit ARIMA models for datasets with only few cases [L403]
      - with references.
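Several of the univariate steps listed above can be sketched in base R as follows. This is not the template's source: the data is simulated, the variable names are made up, and the small AIC grid search is only a stand-in for an automatic ARIMA selection (a dedicated tool such as forecast::auto.arima is the more usual choice; whether the template uses it is not stated here).

```r
# Simulated monthly data (trend + yearly cycle + noise); NOT real Quandl data.
set.seed(1)
n <- 60
dates <- seq(as.Date("2008-01-01"), by = "month", length.out = n)
df <- data.frame(
  date  = dates,
  value = 100 + 0.5 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n)
)

# A ts object whose frequency (12 = monthly) would come from the meta-data.
series <- ts(df$value, start = c(2008, 1), frequency = 12)

# Autocorrelation for up to two years of lags.
autocorr <- acf(series, lag.max = 24, plot = FALSE)

# Seasonal decomposition only for non-annual data with at least two full periods.
parts <- NULL
if (frequency(series) > 1 && length(series) >= 2 * frequency(series)) {
  parts <- decompose(series)
}

# Dummy linear model on year and month.
df$year  <- as.integer(format(df$date, "%Y"))
df$month <- factor(format(df$date, "%m"))
fit <- lm(value ~ year + month, data = df)

# Predicted values for the twelve months following the sample.
new_data <- data.frame(
  year  = 2013L,
  month = factor(sprintf("%02d", 1:12), levels = levels(df$month))
)
predicted <- predict(fit, newdata = new_data)

# Crude ARIMA search over p and q by AIC. The differencing order d is fixed
# at 1 because AIC values are not comparable across different values of d.
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- mapply(function(p, q) {
  tryCatch(arima(series, order = c(p, 1, q))$aic, error = function(e) NA_real_)
}, grid$p, grid$q)
best <- grid[which.min(grid$aic), ]
print(best)
```

The guard before decompose mirrors the "non-annual data with enough cases" condition; the exact thresholds used in the template may differ.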
Please see the source code of the template on GitHub for more details. Unfortunately we cannot let Rapporter users fork this template, as we would rather not share our Quandl API key this time, but feel free to upload that file with your own Quandl API key to rapporter.net at Templates > New > Upload and start tweaking it at any time.
We would love to hear your feedback or about an updated version of the file!
The template can be run by any user inside Rapporter, or in any local R session after loading our rapport R package. Just download the template and run:
Or apply the template to some custom data (Tammer’s Oil, gold and stocks superset):
rapport('quandl.tpl', provider = 'USER_YY', dataset = 'ZH')
And even filter the results by date for only one variable of the above:
rapport('quandl.tpl', provider = 'USER_YY', dataset = 'ZH', from = '2012-01-01', to = '2012-06-30', variable = 'Oil')
And why not check the results on an HTML page instead of the R console?
rapport.html('quandl.tpl', provider = 'USER_YY', dataset = 'ZH', from = '2012-01-01', to = '2012-06-30', variable = 'Oil')
Run in the cloud
We introduced Rapplications a few weeks ago, so that potentially all of our (your) templates can be run by anyone with just an Internet connection, at our computational expense, even without registration and authentication.
We have also uploaded this template to rapporter.net and made a Rapplication for it. The following links bring up real-time generated and/or partially cached reports based on the above example with GET params:
- S&P 500 Stock Index price without any filters
- Tammer’s superset about oil, gold and S&P 500 Stock Index price (multivariate)
- Oil in Tammer’s superset between 2012-01-01 and 2012-06-30
You may also analyse any dataset available on Quandl: just pass the custom identifier with some optional arguments to our servers in the form of:
With the following parameters:
- token: the identifier of the Rapplication that stores the HTML/LaTeX/docx/odt stylesheet or reference document to apply to the report. Please use the above referenced token or create your own Rapplication.
- provider: the Quandl internal Code (ID) for the data provider
- dataset: the Quandl internal Code (ID) for the dataset
- variable (optional): a name of the variable from the dataset to analyse with univariate methods
- from and to (optional): filter by date in
- output_format (optional): the output format of the report, e.g. odt. Defaults to html, so you can safely ignore this.
- new_tab (optional): set this to true so that the HTML file is not forced to be downloaded
- ignore_cache (optional): set this to true if you want to force the report to be generated from scratch even if we have it in the cache
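If you prefer to assemble such a request from R, the GET parameters listed above can be joined into a query string as sketched below. The base URL and the token value are placeholders invented for this example (the real address is not reproduced here), and build_query is a hypothetical helper, not part of rapport.

```r
# Hypothetical helper: turn a named list of parameters into a URL query string.
build_query <- function(params) {
  # Drop parameters that were left unset.
  params <- params[!vapply(params, is.null, logical(1))]
  # Percent-encode every value so that special characters survive the request.
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste(names(encoded), encoded, sep = "=", collapse = "&")
}

params <- list(
  token         = "YOUR_TOKEN",   # placeholder, not a real Rapplication token
  provider      = "USER_YY",
  dataset       = "ZH",
  variable      = "Oil",
  from          = "2012-01-01",
  to            = "2012-03-31",
  output_format = NULL,           # unset: the server default (html) applies
  new_tab       = "true"
)

query <- build_query(params)

# Placeholder base URL; replace it with the address of the Rapplication.
url <- paste0("https://example.invalid/rapplication?", query)
print(url)
```

Leaving optional parameters as NULL keeps them out of the URL, so the server-side defaults described above apply.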
Run from a widget
Of course we are aware that most R users would rather type some commands in the R console than build a unique URL based on the above instructions, but we can definitely help you with that process, as rapporter.net automatically generates an HTML form for each Rapplication, along with some helper iframe code that lets you easily integrate it in your home page or blog post:
And of course feel free to download the generated report as a pdf, docx or odt file for further editing (see the bottom of the left sidebar of the generated HTML page) and be sure to register for an account at rapporter.net to make and share similar statistical templates with friends and collaborators effortlessly.
* QuandlR would be a cool name IMHO