Clustering the world’s diets

March 10, 2010

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Cluster analysis is a useful technique for classifying the members of a group (people, events, measurements, etc.) into groups of "similar" members. How "similar" is defined depends on the application, but it generally involves comparing the members across a number of attributes. For example, we could cluster people by their skin color, hair type, facial features, and perhaps even genetic markers, and find that the resulting clusters are somehow associated with ethnicity.

Here’s a fascinating application of cluster analysis: given data on what the citizens of each country eat (in aggregate), can we cluster the countries of the world into groups with similar diets? That’s what Diego Valle did, using the pam (partitioning around medoids) function in R. He presents the six clusters he identifies as a color-coded world map:
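To make the "partitioning around medoids" idea more concrete, here is a minimal Python sketch of the two classic PAM phases: BUILD, which greedily picks k starting medoids, and SWAP, which keeps exchanging a medoid for a non-medoid while that lowers the total distance from each point to its nearest medoid. This is not the R `cluster::pam` code Diego used, and it is deliberately naive (it recomputes the full cost for every candidate swap). The Euclidean distance, the function names `pam` and `total_cost`, and the three "diet features" (hypothetical shares of calories from cereals, meat and dairy, and fish) are all assumptions made up for illustration; the numbers are toy values, not real data.

```python
import math
from itertools import product


def euclidean(a, b):
    # Straight-line distance between two feature vectors (an assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(dist, medoids):
    # Sum over all points of the distance to the nearest medoid.
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(points, k):
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # BUILD: greedily add the medoid that most reduces the total cost.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)

    # SWAP: apply the best medoid/non-medoid exchange until none helps.
    cost = total_cost(dist, medoids)
    while True:
        best_swap, best_cost = None, cost
        for mi, h in product(range(k), range(n)):
            if h in medoids:
                continue
            trial = medoids[:mi] + [h] + medoids[mi + 1:]
            c = total_cost(dist, trial)
            if c < best_cost - 1e-12:
                best_swap, best_cost = trial, c
        if best_swap is None:
            break
        medoids, cost = best_swap, best_cost

    # Assign every point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost


# Hypothetical toy "countries": (cereals, meat+dairy, fish) calorie shares.
points = [
    (0.60, 0.10, 0.05),
    (0.62, 0.08, 0.06),
    (0.58, 0.12, 0.04),
    (0.25, 0.40, 0.05),
    (0.22, 0.42, 0.06),
    (0.27, 0.38, 0.04),
]
medoids, labels, cost = pam(points, 2)
print(sorted(medoids))  # → [0, 3]
print(labels)
```

Because the medoids are actual data points (here, actual "countries"), each cluster can be described by a real representative member rather than by an averaged centroid, which is one reason PAM is a natural choice for labelling groups of countries.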

[Map: Clustering the world’s diets]

Australia is grouped with North America, much of Europe, and Russia as countries whose citizens enjoy a high-calorie diet with all kinds of foods (though not many beans). Countries in yellow have a cereal-rich diet. The diets of the south-east Asian cluster are heavy on fish and rice but light on dairy foods. See Diego’s blog for descriptions of the other clusters and the R code that produced the analysis. The code reads the data directly from a Google Spreadsheet in the cloud, so you can easily run it yourself. It also produces an interesting chart comparing the American diet to that of the rest of the world.

Diego Valle’s Food & Fishing Blog: Cluster Analysis of What The World Eats
