This guest post is by Tammer Kamel, Founder of Quandl.
Finding and formatting numerical data for analysis in R, Excel or indeed any application is a pain that real-world data analysts know all too well. In aggregate I have probably spent weeks of my life trying to find data on the web, and several more weeks validating, formatting and cleaning it. Analysis offers data scientists interesting, intellectually stimulating problems. But data acquisition, the necessary precursor, offers only tedium and pain. It's a time vampire.
The solution to this problem is conceptually obvious: one site with all the world’s data, nicely formatted and documented; an omni-platform. Platforms aspiring to this objective keep appearing and disappearing. They appear because they are great ideas. They disappear because they demand publishers upload and maintain data on an external site. Publishers don’t comply because they have enough work just maintaining the data in their own database, let alone someone else’s.
So, if the data won’t come to the platform, the only alternative is for the platform to come to the data. What does that mean? It means that to succeed in building a truly comprehensive data platform, you must ask nothing of data publishers. You have to create a solution that feeds off whatever the publisher is spitting out, regardless of how absurdly the data might be published.
That’s what we're doing at Quandl. We've built a sort of "universal data parser" which has thus far parsed about 2.8 million datasets. We've asked nothing of any data publisher. As long as they spit out data somehow (Excel, text file, blog post, XML, API, etc.) the "Q-bot" can slurp it up.
The result is www.quandl.com, a sort of "search engine" for numerical data. The idea with Quandl is that you can find data fast. And more importantly, once you find it, it is ready to use. This is because Quandl's bot returns data in a totally standard format, which means we can then translate it to any format a user wants.
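To make the "standard format first, translate second" idea concrete, here is a minimal sketch in Python. It is not Quandl's actual parser: the function names (`normalize`, `to_csv`, `to_json`), the choice of sorted (date, value) pairs as the standard form, and the sample numbers are all assumptions made up for illustration.

```python
import csv
import io
import json
from datetime import date

# Hypothetical "standard format": a list of (date, float) pairs sorted by date.
def normalize(raw_rows, date_key, value_key):
    """Convert publisher-specific dict rows into sorted (date, float) pairs."""
    rows = []
    for r in raw_rows:
        d = date.fromisoformat(r[date_key].strip())
        # Publishers often format numbers with thousands separators.
        v = float(str(r[value_key]).replace(",", ""))
        rows.append((d, v))
    return sorted(rows)

def to_csv(rows):
    """Render the standard form as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "value"])
    for d, v in rows:
        writer.writerow([d.isoformat(), v])
    return buf.getvalue()

def to_json(rows):
    """Render the standard form as a JSON array of records."""
    return json.dumps([{"date": d.isoformat(), "value": v} for d, v in rows])

# Made-up sample rows standing in for whatever a publisher emits.
publisher_rows = [
    {"Date": "2013-02-01", "Close": "1,000.5"},
    {"Date": "2013-01-02", "Close": "990.25"},
]

standard = normalize(publisher_rows, "Date", "Close")
csv_text = to_csv(standard)
json_text = to_json(standard)
```

Once every source has been reduced to the same shape, adding a new output format means writing one more small renderer rather than one per publisher.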
Quandl is rich in financial, economic and sociological time series data. The data is easy to find. It is transparent to source. Datasets can be easily merged with one another. They can be visualized and shared. It is all open. It is all free. There's much more about our vision on our about page.
In the near future we will be inviting (and indeed encouraging) Quandl users to "drive" the Quandl-bot themselves so that Quandl has the data they personally need. We're working towards building a sort of Wikipedia of numerical data. In the long term we hope to do to certain "closed data dinosaurs" what Jimmy Wales did to Britannica. In the short term, I would be very pleased if we could make Quandl a valuable resource for the R community.