Where the whisky flavor profile data came from

January 14, 2014

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Our crack-shot R trainer Luba Gloukhov generated a spirited (pun intended!) discussion with her post K-means Clustering 86 Single Malt Scotch Whiskies, with her analysis mentioned at FlowingData and Reddit, amongst others. Other bloggers took a look at the data too, notably Christopher Ingraham, who created this beautiful infographic of the flavour profiles of the 86 whiskies from the source data. (Be sure to click through to see the full chart.)

[Image: detail of the whisky flavour profiles infographic]

His analysis of the data led to the question: where did the source data come from in the first place? With some crowdsourced sleuthing, Christopher discovered the data comes from the first edition of the book Whisky Classified: Choosing Single Malts by Flavour by David Wishart. The story behind the data is quite interesting, and worth checking out if you're a whisky fan.

The data file Luba used comes from the first edition of the "Whisky Classified" book, and it contained a few typos to boot (for example, Bowmore had a Medicinal rating of 1 in the file, but a 2 in the book). A commenter, "Florin", at the Scotch and Ice Cream blog cleaned up the data, re-ran the analysis, and obtained four slightly different clusters: peaty whiskies, ex-sherry whiskies, ex-bourbon / no peat whiskies, and whiskies with some ex-sherry blended in or with some peat. Extending the analysis to five clusters apparently succeeded in "separating the hard-core peated whiskies from the less-peated ones".
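For readers curious what "re-running the analysis" with a given number of clusters involves, here is a minimal k-means sketch. It is written in plain Python rather than R, and it is not Luba's or Florin's code. The whisky labels, the three flavour columns and every score in it are invented for illustration; the real data has 86 rows, and only the existence of a Medicinal column is taken from the discussion above.

```python
import random

def kmeans(rows, k, iters=100, seed=0):
    """Basic Lloyd's algorithm: returns (labels, centroids) for a list of numeric rows."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [-1] * len(rows)
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centroids[c])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical scores; assumed columns: smoky, medicinal, sweet.
names = ["Whisky A", "Whisky B", "Whisky C", "Whisky D"]
rows = [
    [4, 4, 1],  # invented heavily peated profile
    [4, 3, 1],  # invented heavily peated profile
    [0, 0, 3],  # invented unpeated, sweeter profile
    [1, 0, 4],  # invented unpeated, sweeter profile
]

labels, centroids = kmeans(rows, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Changing `k` is all it takes to move from four clusters to five on the full data set, although the result then depends on the starting centroids, so the seed is worth fixing when comparing runs.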

It just goes to show: with only 86 rows of data, you don't always need "Big Data" to generate an interesting analysis!
