An R interface to the Google Prediction API

December 10, 2010

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

At the New York R User Group* last night, 100 R users heard Ni Wang and Max Lin explain how "R is one of the important tools used by analysts and engineers at Google for analyzing data". During the talk, Lin revealed that Google plans to make "R more integrated with internal machine learning algorithms and infrastructure", and one component of that plan was announced at the meeting: a new library for R to build and score models using the Google Prediction API.

The Google Prediction API is a black-box system for building predictive models. Given a set of training data (a set of continuous and/or categorical explanatory variables and a dependent variable), Google's algorithms automatically select from several available machine learning techniques to create a model from the training data. Later, given a new set of explanatory variables, you can predict the value of the dependent variable under this model.
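To make the shape of such a training set concrete, here is a minimal sketch in R that writes one to a local CSV file. It uses the built-in iris data (four continuous explanatory variables and one categorical dependent variable). The file layout, with the dependent variable in the first column and no header row, is an assumption about what the service expects rather than something stated in this post, and the names `training` and `training.file` are made up for illustration.

```r
## Built-in iris data: four continuous explanatory variables and one
## categorical dependent variable (Species), moved to the first column.
training <- iris[, c("Species", "Sepal.Length", "Sepal.Width",
                     "Petal.Length", "Petal.Width")]

## ASSUMPTION: dependent variable first, no header row. Check the
## Prediction API documentation before relying on this layout.
training.file <- file.path(tempdir(), "training.csv")
write.table(training, file = training.file, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = TRUE)

## Inspect the first few rows that would be used for training.
readLines(training.file, n = 3)
```

A file written this way could then serve as the local CSV input for training.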

Now with the googlepredictionapi R package (which you can download from Google Code), you can create such models based on data stored in a local CSV file or in the Google Storage system. The model is represented as an object in R, which you can then use to make predictions using the standard predict function, as illustrated in the following code:

## Load the package (assumes googlepredictionapi is already installed).
library(googlepredictionapi)

## Make a training call to the Prediction API against data in Google Storage.
## Replace MYBUCKET and MYDATA with your data.
my.model <- PredictionApiTrain(data="gs://MYBUCKET/MYDATA")
 
## Alternatively, make a training call against training data stored locally as a CSV file.
## Replace MYPATH and MYFILE with your data.
my.model <- PredictionApiTrain(data="MYPATH/MYFILE.csv")
 
## Read the summary of the trained model
summary(my.model)
 
## Make a prediction call for text data using the trained model
predict(my.model, "This is a new piece of text")
 
## Similarly, predict() works for numeric features
predict(my.model, c(6, 3, 5, 2))

You need to request access to the Google Prediction API to use this package (instructions on how to request access are here). Has anyone tried this out yet? Given that all the standard statistical (as distinct from machine learning) models are in R, this package would make it easy to compare the performance of the automated Prediction API with more traditional statistical techniques.
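As a rough sketch of what such a comparison might look like, the code below fits a traditional baseline (ordinary least squares via lm) on part of the built-in mtcars data and scores it on held-out rows. The `rmse` helper and the choice of data and predictors are illustrative assumptions, not part of the googlepredictionapi package. Only the baseline side is run here; predictions from a Prediction API model for the same hold-out rows could be scored with the same helper.

```r
## Hold out part of the built-in mtcars data for evaluation.
set.seed(1)
holdout.rows <- sample(nrow(mtcars), 10)
train   <- mtcars[-holdout.rows, ]
holdout <- mtcars[holdout.rows, ]

## Traditional baseline: least squares regression of mpg on wt and hp.
baseline <- lm(mpg ~ wt + hp, data = train)
baseline.pred <- predict(baseline, newdata = holdout)

## Hypothetical helper: root mean squared error of any prediction vector.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

## Hold-out error of the baseline model.
baseline.rmse <- rmse(holdout$mpg, baseline.pred)
baseline.rmse

## Predictions obtained from a Prediction API model for the same hold-out
## rows could be collected into a numeric vector and passed to rmse() too.
```

Scoring both approaches on the same hold-out rows with the same metric keeps the comparison fair, whichever model turns out to be better.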

[*] The New York R User Group is proudly sponsored by Revolution Analytics.

New York R User Group: R at Google (via)
