Quandl: A Wikipedia for Time Series Data

February 20, 2013
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

This guest post is by Tammer Kamel, Founder of Quandl

Finding and formatting numerical data for analysis in R, Excel, or indeed any application is a pain that real-world data analysts know all too well.  In aggregate I have probably spent weeks of my life trying to find data on the web, and several more weeks validating, formatting and cleaning it.  Analysis offers data scientists interesting, intellectually stimulating problems.  But data acquisition, the necessary precursor, offers only tedium and pain.  It's a time vampire.

The solution to this problem is conceptually obvious:  one site with all the world’s data, nicely formatted and documented; an omni-platform.  Platforms aspiring to this objective keep appearing and disappearing.  They appear because they are great ideas.  They disappear because they demand that publishers upload and maintain data on an external site.  Publishers don’t comply because they have enough work just maintaining the data in their own database, let alone someone else’s.

So, if the data won’t come to the platform, the only alternative is for the platform to come to the data.  What does that mean?  It means that to succeed in building a truly comprehensive data platform, you must ask nothing of data publishers.  You have to create a solution that feeds off whatever the publisher is spitting out, regardless of how absurdly the data might be published.

That’s what we're doing at Quandl.  We've built a sort of "universal data parser" which has thus far parsed about 2.8 million datasets.  We've asked nothing of any data publisher.  As long as they spit out data somehow (Excel, text file, blog post, XML, API, etc.), the "Q-bot" can slurp it up.

The result is www.quandl.com, a sort of "search engine" for numerical data.  The idea with Quandl is that you can find data fast.  And more importantly, once you find it, it is ready to use.  This is because Quandl's bot returns data in a totally standard format, which means we can then translate it to any format a user wants.
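To make the "standard format, then translate" idea concrete, here is a minimal sketch in Python.  It is not Quandl's actual code or API: the row layout (an ISO date string paired with a number), the sample values, and the helper names `to_csv` and `to_json` are all assumptions made up for illustration.  The point is only that once every series has the same shape, each output format needs just one small converter.

```python
import csv
import io
import json

# Hypothetical standard form: a list of (ISO date string, float value) rows.
# The numbers are invented for illustration, not real data.
series = [
    ("2013-01-01", 1.5),
    ("2013-02-01", 1.7),
    ("2013-03-01", 1.6),
]


def to_csv(rows):
    """Render the standard rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "value"])
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Render the same rows as a JSON array of objects."""
    return json.dumps([{"date": d, "value": v} for d, v in rows])


print(to_csv(series))
print(to_json(series))
```

Supporting another output format in this sketch would mean adding one more function over the same rows, with no per-publisher work.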

Quandl is rich in financial, economic and sociological time series data.  The data is easy to find.  Its sources are transparent.  Datasets can be easily merged with one another.  They can be visualized and shared.  It is all open.  It is all free.  There's much more about our vision on our about page.
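As a rough picture of what merging datasets means for time series, the Python sketch below joins two series on the dates they have in common.  This is an assumption about the general technique, not a description of how Quandl does it; the function name `merge_on_date` and all of the numbers are hypothetical.

```python
def merge_on_date(left, right):
    """Inner-join two {ISO date: value} series on their shared dates."""
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted as strings.
    shared = sorted(set(left) & set(right))
    return [(d, left[d], right[d]) for d in shared]


# Invented values for illustration only.
series_a = {"2010-12-31": 100.0, "2011-12-31": 103.0, "2012-12-31": 105.0}
series_b = {"2011-12-31": 8.5, "2012-12-31": 7.9, "2013-12-31": 7.0}

merged = merge_on_date(series_a, series_b)
for row in merged:
    print(row)
```

Only the dates present in both series survive the join, which is why a shared, standard date format matters so much.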

From the start, Quandl delivered data in all the standard formats (Excel, CSV, XML, JSON).  We're now moving on to deliver data to applications in the exact format those apps demand.  We're starting with R.  We've done something simple to start.  The next step for us is to complete an R package to be made available on CRAN.

In the near future we will be inviting (and indeed encouraging) Quandl users to "drive" the Quandl-bot themselves so that Quandl has the data they personally need.  We're working towards building a sort of Wikipedia of numerical data.  In the long term we hope to do to certain "closed data dinosaurs" what Jimmy Wales did to Britannica.  In the short term, I would be very pleased if we could make Quandl a valuable resource for the R community.
