Where the whisky flavor profile data came from

January 14, 2014

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Our crack-shot R trainer Luba Gloukhov generated a spirited (pun intended!) discussion with her post K-means Clustering 86 Single Malt Scotch Whiskies, and her analysis was picked up at FlowingData and Reddit, among others. Other bloggers took a look at the data too, notably Christopher Ingraham, who created this beautiful infographic of the flavour profiles of the 86 whiskies from the source data. (Be sure to click through to see the full chart.)

[Figure: Whisky profiles detail]

His analysis of the data led to the question: where did the source data come from in the first place? With some crowdsourced sleuthing, Christopher discovered the data comes from the first edition of the book Whisky Classified: Choosing Single Malts by Flavour by David Wishart. The story behind the data is quite interesting, and worth checking out if you're a whisky fan.

It turns out the data file Luba used was transcribed from that first edition, and a few typos crept in along the way (for example, Bowmore had a Medicinal score of 1 in the file, but a 2 in the book). A commenter, "Florin", at the Scotch and Ice Cream blog cleaned up the data, re-ran the analysis, and obtained four slightly different clusters: peaty whiskies, ex-sherry whiskies, ex-bourbon / no peat whiskies, and whiskies with some ex-sherry blended in or with some peat. Extending the analysis to five clusters apparently succeeded in "separating the hard-core peated whiskies from the less-peated ones".
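For readers who want a feel for what "re-running the analysis" involves, here is a minimal sketch of k-means on flavour scores. It is written in plain Python purely for illustration and is not the code used in any of the posts mentioned above. The rows in `toy_rows` are made-up values, not entries from the whisky data, the four-column layout is an assumption, and the farthest-point initialization was chosen only to keep the example deterministic (the posts above do not say how their runs were initialized).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(rows, k):
    """Deterministic start: take the first row, then repeatedly add the row
    that is farthest from all centers chosen so far (an illustrative choice)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centers = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy profiles (NOT real whisky data). Assumed column order:
# smoky, medicinal, sweet, winey.
toy_rows = [
    [4, 3, 1, 0], [3, 4, 1, 1], [4, 4, 0, 0],  # "peaty"-style rows
    [0, 0, 3, 4], [1, 0, 2, 4], [0, 0, 3, 3],  # "sherried"-style rows
    [0, 0, 2, 0], [1, 0, 1, 0], [0, 0, 2, 1],  # "light"-style rows
]

labels, centers = kmeans(toy_rows, 3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Changing the second argument of `kmeans` is all it takes to ask for four or five clusters instead, which is the kind of change the five-cluster extension above describes. On real data the groupings can shift when a handful of scores are corrected, so it is worth re-running after any cleanup.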

It just goes to show that you don't always need "Big Data" to generate interesting analysis: this one rests on only 86 rows of data!
