Predicting Pizza

March 26, 2010

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

What's the secret to the best pizza in New York? That's what statistical consultant and R user Jared Lander sought to find out by analyzing the rankings of NY pizza joints at MenuPages.com and building a regression model for ratings based on variables like location, price, number of reviews, and pizza-oven type (gas, coal or wood). Here's a scatterplot matrix of the data set:

[Figure: scatterplot matrix of the pizza data set]
Jared published his conclusions in a paper (PDF), "New York Pizza: How to Find The Best". He used a logit analysis in R to model the five-star rank from the various variables. His conclusions? First of all, there's a big discrepancy between critics' "Top 10" pizza rankings and those of the general public (at least as measured at MenuPages.com), with only one of MenuPages' Top 10 listed in the typical critic's list. Secondly, while an Uptown location and a coal oven are both popular draws (as measured by the number of reviews), none of the variables has a significant influence on rating:

Our findings were able to discern the factors that go into a pizzeria’s popularity but did not discover much differentiation in quality. Popularity and quality are not always equivalent. It is likely that we may have just proved the old adage about pizza: “Even when it’s bad, it’s still good.”
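For readers curious what a logit analysis looks like mechanically, here is a minimal sketch in Python. It is not Jared's model or his data: the paper used R and modeled a five-star rank, while this sketch simplifies to a binary outcome ("rated highly" or not) with a single hypothetical predictor, `coal_oven`. The twenty observations are made up purely for illustration and say nothing about real pizzerias.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, y, lr=0.5, epochs=5000):
    """Fit a binary logit by batch gradient descent on the mean
    negative log-likelihood. Returns [intercept, coef_1, ..., coef_k]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # gradient of the log-loss w.r.t. the linear predictor
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Made-up toy data (an assumption, not from the paper):
# 10 pizzerias without a coal oven, 6 of them rated highly;
# 10 pizzerias with a coal oven, 7 of them rated highly.
X = [[0] for _ in range(10)] + [[1] for _ in range(10)]
y = [1] * 6 + [0] * 4 + [1] * 7 + [0] * 3

weights = fit_logit(X, y)
intercept, coal_coef = weights
print("intercept:", round(intercept, 4))
print("coal_oven coefficient:", round(coal_coef, 4))
```

With a single yes/no predictor, the fitted coefficient is just the difference in log-odds between the two groups, here log(7/3) - log(6/4), or about 0.44. A real analysis would also report standard errors to judge whether such a difference is statistically significant, which is the question the paper's conclusion turns on. In R, a binary model like this is commonly fit with `glm(..., family = binomial)`; the post doesn't say which function Jared used.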

Slice: The 'Moneyball' of Pizza? Using Statistics to Find NYC's Best Pies and Slices
