R, Twitter and McDonald’s

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

Ed Chen is a data scientist at Twitter, so he's accustomed to working with big data and complex models. In an interview with MIT Technology Review, he describes his data science toolbox:

“A common pattern for me is that I'll code a MapReduce job in Scala, do some simple command-line munging on the results, pass the data into Python or R for further analysis, pull from a database to grab some extra fields, and so on, often integrating what I find into some machine learning models in the end.”

He put this toolbox to great use in a recent blog post, Infinite Mixture Models with Nonparametric Bayes and the Dirichlet Process. After using simulation in Ruby and Python to generate some test data, he used the R language to create a novel clustering model that can group “similar” members of a data set without needing to specify the number of groups in advance. He used this model to categorize the McDonald's menu into a number of “food groups” containing products with similar nutritional content.

One cluster contained all the desserts: Baked Hot Apple Pie, Snack Size McFlurry, and others including the three below: 

[Image: Desserts]
As you can see, the foods in the “dessert” group cluster together because of high trans fat content, low fiber, and other similar nutritional attributes. Other groups identified by the model include salads, burgers and other fried food, three categories of sauces, and (in a cluster all on its own) Fruit and Maple Oatmeal: the only high-fiber item on the menu.
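
To give a feel for how a model can avoid fixing the number of groups in advance, here is a minimal Python sketch of the Chinese restaurant process, one common way of describing the Dirichlet process prior. It is not the code from the blog post: the function name, the `alpha` concentration parameter, and the seed are assumptions chosen for illustration, and the sketch only simulates group assignments rather than fitting nutritional data.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate group assignments under a Chinese restaurant process.

    Hypothetical helper for illustration only. `alpha` (> 0) is the
    concentration parameter: larger values make new groups more likely.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of items already in group k
    assignments = []   # assignments[i] = group index chosen for item i

    for _ in range(n_customers):
        # An existing group k is picked with weight table_sizes[k];
        # a brand-new group is picked with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new group
        else:
            table_sizes[k] += 1        # join an existing group
        assignments.append(k)

    return assignments, table_sizes


# Example run: the number of groups is an outcome, not an input.
assignments, sizes = chinese_restaurant_process(100, alpha=1.0)
print("number of groups:", len(sizes))
print("group sizes:", sizes)
```

The number of groups is never passed in; it emerges from the process and tends to grow slowly as more items arrive. A full infinite mixture model would combine a prior like this with a likelihood over the nutritional attributes of each item.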

For an astounding amount of detail about the analysis, visit Ed Chen's blog at the link below.

Edwin Chen's Blog: Infinite Mixture Models with Nonparametric Bayes and the Dirichlet Process
