While Microsoft rolled out its Technical Computing Initiative (promising new tools for distributed parallel computing on large data sets in the cloud) with much fanfare earlier this week, Google made a rather more understated response. In a post to the developer-focused Google Code Blog, they quietly announced two new, and potentially disruptive, products. Google BigQuery promises super-fast SQL-like queries on massive data sets (provided the data has been uploaded to a Google host). And the Google Prediction API apparently offers a machine-learning black box: pump some (again, Google-hosted) data in, and "advanced machine learning algorithms" will give you a service that predicts outcomes from new data. Presumably, it does something like using a subset of the data to train a number of machine learning algorithms (or an ensemble of them) and then choosing the best one for prediction. A little like the (now-discontinued) machine-learning service based on R, but obviously at Google scale.

The Google products are, for now, only available to invitees as part of Google Labs, but it’s certainly an interesting development: essentially making machine learning a commodity. If it works (and if the intended business users don’t mind having their private data hosted by Google), it could certainly add some turbulence to the nascent market for machine learning in the cloud. I’ll be following tomorrow’s keynote on the topic (via Google Wave) with much interest.
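To make the guess above about the black box a little more concrete, here is a minimal Python sketch of holdout-based model selection: train several candidate models on part of the data, score each on the rest, and keep the winner. It is purely illustrative and says nothing about how the Prediction API actually works. The function names (`select_best`, `majority_model`, `nearest_neighbor_model`) and the toy data are made up for this example.

```python
import random
from collections import Counter

# Hypothetical candidates: each "fit" function takes training rows of the
# form (features, label) and returns a predict(features) -> label function.

def majority_model(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor_model(train):
    """1-nearest-neighbor using squared Euclidean distance."""
    def predict(x):
        closest = min(
            train,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
        )
        return closest[1]
    return predict

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def select_best(rows, candidates, holdout_fraction=0.3, seed=0):
    """Train every candidate on a subset, score on the held-out rest,
    then refit the best-scoring candidate on all of the data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    holdout, train = shuffled[:n_holdout], shuffled[n_holdout:]
    scores = {name: accuracy(fit(train), holdout)
              for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best](rows), scores

# Toy data (assumed for illustration): two well-separated clusters.
rows = [((i * 0.1, i * 0.1), "a") for i in range(10)] + \
       [((10 + i * 0.1, 10 + i * 0.1), "b") for i in range(10)]

candidates = {
    "majority": majority_model,
    "nearest_neighbor": nearest_neighbor_model,
}
best_name, model, scores = select_best(rows, candidates)
print(best_name, scores)
print(model((0.2, 0.1)), model((10.5, 10.2)))
```

A real service would presumably use far richer models and cross-validation rather than a single split, but the shape of the idea (fit several, compare on unseen data, serve the winner) is the same.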
Google Code blog: BigQuery and Prediction API: Get more from your data with Google