O’Reilly Media has been a big advocate of Open Data and believes that is where a lot of computing is headed. I think they are definitely on to something, but the future could be now: there are plenty of opportunities to find good data sources today. One of my favorite blogs, O’Reilly Radar, has an article by Edd Dumbill on Where To Find Data. Plenty of good data is available on the internet to download, explore, and mine for new information. Many of these sites not only offer great sources of data but also provide an API for quick, seamless access. Below is a summary of the links from the article.
An all-things graph database. The website focuses on trends of certain cultural and interest topics.
Amazon Public Data Sets
Amazon is probably considered the cloud computing mecca next to Google. Amazon Web Services offers a lot, including storage for public data sets, and a huge variety of public data is available there.
Windows Azure Data Marketplace
Surprisingly, Microsoft has a data source built on the Open Data Protocol. This data market offers quite a few data sets of interest.
Yahoo Query Language
YQL is an interesting API that is very similar to SQL. It is essentially a query language that lets you grab data from cloud services, which could be very handy for pulling data quickly and dynamically. YQL can connect to a lot of data sources as well.
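To give a feel for the SQL-like style, here is a minimal Python sketch that only builds a YQL request URL and does not send it. The endpoint shown is an assumption based on the public YQL address and may no longer be available, and the table name `weather.forecast` and the `woeid` value are purely illustrative.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the public YQL endpoint; it may no longer be reachable.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(query: str, fmt: str = "json") -> str:
    """Return a request URL carrying a YQL statement as the `q` parameter."""
    params = {"q": query, "format": fmt}
    # urlencode percent-escapes spaces, '*', '=' and other special characters.
    return YQL_ENDPOINT + "?" + urlencode(params)


# Illustrative query: the table and the woeid value are hypothetical examples.
query = "select * from weather.forecast where woeid = 2502265"
url = build_yql_url(query)
print(url)

# Round-trip check: decoding the URL gives back the original statement.
print(parse_qs(urlparse(url).query)["q"][0])
```

The statement reads like an ordinary SQL SELECT, which is the appeal: the "table" stands for a web service rather than a database table.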
Infochimps is a data marketplace and warehouse. They host, sell, and distribute data sets. Some of their data comes at a cost, but a lot of it is free as well. This is an interesting startup, and it will be very interesting to follow their growth. There is also a new Infochimps R package that uses their API to gather and process Infochimps data.
DBpedia is a Wikipedia for data sets. In fact, the data itself comes from Wikipedia.
Some other sources not from the article include the World Bank open data and the U.S. Census data.