Predicting Pizza

March 26, 2010

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

What’s the secret to the best pizza in New York? That’s what statistical consultant and R user Jared Lander sought to find out by analyzing the rankings of NY pizza joints and building a regression model for ratings based on variables like location, price, number of reviews, and pizza-oven type (gas, coal or wood).

[Figure: scatterplot matrix of the data set]

Jared published his conclusions in a paper (PDF), “New York Pizza: How to Find The Best”. He used a logit analysis in R to model the five-star rank from the various variables. His conclusions? First of all, there’s a big discrepancy between critics’ “Top 10” pizza rankings and those of the general public (at least as measured by MenuPages), with only one of MenuPages’ Top 10 appearing in the typical critic’s list. Secondly, while an Uptown location and a coal oven are both popular draws (as measured by number of reviews), none of the variables has a significant influence on rating:

Our findings were able to discern the factors that go into a pizzeria’s popularity but did not discover much differentiation in quality. Popularity and quality are not always equivalent. It is likely that we may have just proved the old adage about pizza: “Even when it’s bad, it’s still good.”

Slice: The ‘Moneyball’ of Pizza? Using Statistics to Find NYC’s Best Pies and Slices
