Liking of apples – some data to link

Posted on March 15, 2012 by Wingfeet in R bloggers | 0 Comments

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers.]

I browsed through a paper by Peneau et al. (J. Sensory Studies, 2007) that has nice data on apples: consumer evaluation, sensory evaluation and instrumental measurements. These data are interesting for examining whether such blocks of variables can be linked, which is a big topic in sensory science. This post shows that the consumers' evaluation of juiciness is the main determinant of liking (the driver of liking).

Data

The data are given in three tables, with averages over storage conditions for six cultivars at two storage times. Three cultivars were replicated. Since no data for the cultivar*storage condition combinations are available, I will ignore the storage condition. The data tables mark significant differences with letters, which I kept when entering the data. The top left part of the data table:

library(xlsReadWrite)
datain <- read.xls('condensed.xls')
datain[1:5,1:5]

     Products CLiking CFreshness CCrispness CJuiciness
1    Ariwa_W1  4.19ab      4.25a     4.39ab     4.14cd
2   Elstar_W1   4.25a     4.01ab      3.84d    4.32bcd
3 Jonagold_W1   4.31a      4.14a     4.35ab      4.56a
4     Gala_W1  4.19ab      4.08a     4.24bc   4.36abcd
5    Topaz_W1   4.35a      4.11a      4.59b    4.37abc

In this table the final part of the product name is the storage duration. The first character of each variable name indicates its source: 'C' for consumer data, 'S' for sensory data and 'A' for analytical chemical data. To prepare the data, the rows for the storage conditions (bag/net) and the letters marking significant differences are removed.

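# drop the rows for the storage conditions (bag/net);
# grepl() also behaves correctly when no row matches
datain <- datain[!grepl('bag|net',datain$Products,ignore.case=TRUE),]
# strip the significance letters and convert the strings into numbers
vars <- names(datain)[-1]
for (descriptor in vars) {
    datain[,descriptor] <- as.numeric(gsub('[[:alpha:]]','',datain[,descriptor]))
}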
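
The naming convention described above can also be used programmatically. The following is a minimal sketch, assuming every product name ends in an underscore followed by the storage duration. The variable names SCrispness and AFirmness are hypothetical stand-ins, since only consumer columns appear in the excerpt.

```r
# Product names and one column as shown in the table excerpt
products   <- c("Ariwa_W1", "Elstar_W1", "Jonagold_W1", "Gala_W1", "Topaz_W1")
juiciness  <- c("4.14cd", "4.32bcd", "4.56a", "4.36abcd", "4.37abc")

# Split "cultivar_duration" into its two parts
cultivar <- sub("_[^_]*$", "", products)
duration <- sub("^.*_", "", products)

# Same letter stripping as in the loop above, applied to one column
juiciness_num <- as.numeric(gsub("[[:alpha:]]", "", juiciness))

# Variable names: the first four come from the excerpt,
# SCrispness and AFirmness are hypothetical examples
varnames <- c("CLiking", "CFreshness", "CCrispness", "CJuiciness",
              "SCrispness", "AFirmness")

# Group the variables into blocks by their first character (A, C, S)
blocks <- split(varnames, substr(varnames, 1, 1))

print(data.frame(products, cultivar, duration, juiciness_num))
print(blocks)
```

Keeping the blocks in a named list makes it easy to select, for example, all consumer variables with blocks$C when linking one block to another.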

Main driver of liking

Random forests are my preferred way to get a quick view of the most important effects. They can handle more variables than objects and do not assume a linear relation.

library(randomForest)
# remove missing data (first row) and product names (first column)
data2 <- datain[-1,-1]
rf1 <- randomForest(CLiking ~ .,data=data2,importance=TRUE)
varImpPlot(rf1)

The plot shows that CJuiciness (the consumer score for juiciness) is the main driver of liking. Indeed, the effect is clear when plotting CLiking against CJuiciness.

plot(CLiking ~ CJuiciness,data=datain)

The plot gives rise to two questions:

Is the relation linear or slightly curved?
The variation in liking around CJuiciness is large. Are more explanatory variables needed?
So, what drives CJuiciness?
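
One possible way to approach the first question is to compare a linear fit with a quadratic one. The sketch below is not part of the original analysis. It uses only the five rows from the table excerpt, with the letters already stripped, which is far too few points for a real conclusion, so it only shows the mechanics. The names apples, fit_lin, fit_quad and cmp are chosen here for illustration.

```r
# Five rows copied from the table excerpt (significance letters removed);
# illustration only, the full data set would be needed for a real test
apples <- data.frame(
  CLiking    = c(4.19, 4.25, 4.31, 4.19, 4.35),
  CJuiciness = c(4.14, 4.32, 4.56, 4.36, 4.37)
)

# Straight-line model
fit_lin  <- lm(CLiking ~ CJuiciness, data = apples)

# Same model with an added quadratic term to allow curvature
fit_quad <- lm(CLiking ~ CJuiciness + I(CJuiciness^2), data = apples)

# F-test: does the quadratic term improve the fit?
cmp <- anova(fit_lin, fit_quad)
print(cmp)
```

With the complete data the same two calls to lm() could be run on datain. A small p-value in the second row of the anova table would then suggest curvature.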
