R at Google

December 10, 2010
By

(This article was first published on Josh Paulson's Blog » R, and kindly contributed to R-bloggers)

Last night, Ni Wang and Max Lin from Google gave a talk to the New York R User Group discussing how R is used inside Google. About 150 R developers attended the meeting. Ni and Max said that R is used very widely at Google and is an integral part of the analytics work they do.

One interesting application is the Google Flu Trends project, which uses R to estimate current flu activity based on Google search results. Google Trends aggregates user search queries showing how often a particular word or phrase has been searched. Correlation tests are run on the search results to obtain a manageable data set of potentially relevant variables. Then using R, they massage the data and create models with optimized weights for each search term. From this, they are able to reasonably estimate current flu activity for different regions around the world.

When Google uses R in a production environment, they often work with very large data sets. For this, Google integrates R with several internal technologies including gfs, BigTable and ProtoBuf (using the RProtoBuf package). They said their internal system for analyzing large data sets worked in a manner very analogous to the R snow package.

Google also announced an R client for the Google Prediction API (a service which accesses Google’s machine learning algorithms to analyze historic data and predict future outcomes). The R client is available here: http://code.google.com/p/google-prediction-api-r-client/ 

Final note, Google has published an R Style Guide which may be of interest for those seeking a set of standards for R coding: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html


To leave a comment for the author, please follow the link and comment on his blog: Josh Paulson's Blog » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.