An R interface to the Google Prediction API

Posted on December 10, 2010 by David Smith

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

At the New York R User Group* last night, 100 R users heard Ni Wang and Max Lin explain how “R is one of the important tools used by analysts and engineers at Google for analyzing data”. During the talk, Lin revealed that Google plans to make “R more integrated with internal machine learning algorithms and infrastructure”, and one component of that plan was announced at the meeting: a new library for R to build and score models using the Google Prediction API.

The Google Prediction API is a black-box system for building predictive models. Given a set of training data (a set of continuous and/or categorical explanatory variables and a dependent variable), Google's algorithms automatically select from several available machine learning techniques to create a model from the training data. Later, given a new set of explanatory variables, you can predict the value of the dependent variable under this model.

Now with the googlepredictionapi R package (which you can download from Google Code), you can create such models based on data stored in a local CSV file or in the Google Storage system. The model is represented as an object in R, which you can then use to make predictions using the standard predict function, as illustrated in the following code:

## Load the package (assumes it is already installed from Google Code).
library(googlepredictionapi)

## Make a training call to the Prediction API against data in Google Storage.
## Replace MYBUCKET and MYDATA with your data.
my.model <- PredictionApiTrain(data="gs://MYBUCKET/MYDATA")
 
## Alternatively, make a training call against training data stored locally as a CSV file.
## Replace MYPATH and MYFILE with your data.
my.model <- PredictionApiTrain(data="MYPATH/MYFILE.csv")
 
## Read the summary of the trained model
summary(my.model)
 
## Make a prediction call for text data using the trained model
predict(my.model, "This is a new piece of text")
 
## Similarly, predict() works for numeric features
predict(my.model, c(6, 3, 5, 2))
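
The local-file call above needs a CSV of training data. Here is a minimal sketch of writing one from a data frame using only base R. The built-in iris data stands in for real data, and the file name is hypothetical. The column layout (dependent variable first, no header row) is an assumption that this post does not confirm, so check the API documentation before relying on it.

```r
## Sketch: write a data frame to a local CSV file for use as training data.
## Assumptions (not confirmed by the post): the dependent variable goes in the
## first column and the file has no header row.
training <- data.frame(
  species      = as.character(iris$Species),  # dependent variable (categorical)
  sepal.length = iris$Sepal.Length,           # continuous explanatory variables
  sepal.width  = iris$Sepal.Width,
  petal.length = iris$Petal.Length,
  petal.width  = iris$Petal.Width
)

## Hypothetical file name inside the session's temporary directory.
csv.path <- file.path(tempdir(), "training.csv")
write.table(training, file = csv.path, sep = ",",
            row.names = FALSE, col.names = FALSE, qmethod = "double")

## Read the file back to confirm it has the expected shape.
check <- read.csv(csv.path, header = FALSE)
dim(check)
```

The resulting path could then be passed as the data argument of PredictionApiTrain, as in the listing above.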

You need to request access to the Google Prediction API to use this package (instructions on how to request access are here). Has anyone tried this out yet? Given that all the standard statistical (as distinct from machine learning) models are in R, this package would make it easy to compare the performance of the automated Prediction API with more traditional statistical techniques.
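
One way such a comparison might be set up is to hold out part of the data, fit a traditional model on the rest, and score both approaches on the held-out rows. The sketch below does this for a logistic regression in base R only. The built-in mtcars data, the choice of predictor, and the helper name accuracy are all illustrative assumptions, and the Prediction API side is left as a comment because it requires API access.

```r
## Sketch: hold-out accuracy for a traditional model, as a baseline to compare
## against Prediction API results on the same rows.
## Assumption: mtcars stands in for real data; `am` (0/1) is the outcome.
set.seed(42)                                   # reproducible split
n <- nrow(mtcars)
train.idx <- sample(n, size = round(0.7 * n))  # roughly 70% for training
train   <- mtcars[train.idx, ]
holdout <- mtcars[-train.idx, ]

## Hypothetical helper: share of predictions that match the actual labels.
accuracy <- function(predicted, actual) mean(predicted == actual)

## Fit a logistic regression on the training rows only.
fit <- glm(am ~ wt, data = train, family = binomial)

## Predict probabilities for the held-out rows and threshold at 0.5.
prob <- predict(fit, newdata = holdout, type = "response")
glm.pred <- as.integer(prob > 0.5)

glm.acc <- accuracy(glm.pred, holdout$am)
glm.acc

## api.pred would hold the labels returned by predict(my.model, ...) for the
## same held-out rows (hypothetical; not run here):
## api.acc <- accuracy(api.pred, holdout$am)
```

Scoring both sets of predictions with the same helper on the same held-out rows keeps the comparison like for like. With a data set this small the estimate is noisy, so a real comparison would want more data or repeated splits.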

[*] The New York R User Group is proudly sponsored by Revolution Analytics.

New York R User Group: R at Google (via)
