Predicting apple liking from instrumental data

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post I will examine if the individual models from previous posts together would make a good predictive model. Obviously, predictive capability is different from descriptive capability, predictions usually have a more simple model. As there was no test set the predictive capability will be analyzed using cross-validation. It should be noted, as one of the apples had no instrumental data, this sample cannot be used for the model.
It is found that the combination of models has a prediction error slightly worse compared to a simple PLS model using physical chemical data. This is not so bad. On the one hand, PLS is very much a predictive method, so, it should be performing better. On the other hand, PLS does not take curvature into account. This means that the PLS model has a handicap for some of the details. It would seem that these two balance out. Finally, the separate models can be interpreted better than a PLS model.

Model fit

The model uses all the separate parts which were defined previously in this blog. For ease of use this is all placed in a small function. A second function is used to generate predictions. The first figure shows the predictions using all data. In addition the ideal line (y=x) is added.
The normal predictions are not very impressive. Especially, the least liked products are not found least liked at all. Part of the problem may be in an intermediate step. CJuiciness, which is the juiciness observed by the consumers, is not well predicted. As it was earlier discovered that CJuiciness is the main explaining factor for  liking, this may explain part of the lack of prediction power. 

Model prediction capability using cross validation

Cross validation predictions are given in the second figure. The sub figures make very clear that the model does not do a good job of predicting new data. Again, Juiciness is a big part of the problem, now both the sensory observed juiciness and the consumer observed juiciness. 

Prediction error

For completeness, the prediction error is calculated. The Root Mean Squared Error of Prediction is 0.170. As a comparison, a PLS model of liking from physical chemical data has a slightly better prediction error. This makes the model defined at least a competitor to PLS. It also shows that predicting the liking poses a difficult problem.
Data: X dimension: 17 4 
Y dimension: 17 1
Fit method: kernelpls
Number of components considered: 4

VALIDATION: RMSEP
Cross-validated using 17 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps
CV           0.174   0.1626   0.1617   0.2029   0.2007
adjCV        0.174   0.1615   0.1610   0.2005   0.1984

TRAINING: % variance explained
         1 comps  2 comps  3 comps  4 comps
X          34.74    80.63    87.17   100.00
CLiking    33.80    36.71    40.89    40.96

R code

library(xlsReadWrite)
library(gam)
library(plyr)
datain <- read.xls('condensed.xls')
datain <- datain[-grep('bag|net',datain$Products,ignore.case=TRUE),]
datain$week <-  sapply(strsplit(as.character(datain$Product),'_'),
    function(x) x[[2]])
dataval <- datain
vars <- names(dataval)[-1]
for (descriptor in vars) {
  dataval[,descriptor] <- as.numeric(gsub('[[:alpha:]]','',
          dataval[,descriptor]))
}

# all fits in a model
FitCombinedModel <- function(dataval) {
# link instrument to sensory
  MJuicsSEL3 <- gam(SJuiciness ~ AInstrumental.firmness + s(AJuice.release,3) + 
          s(ATitratable.acidity,3) ,data=dataval)
  MSweetSEL <- gam(SSweetness ~ AInstrumental.firmness + s(AJuice.release,3) + 
          ASoluble.solids + ATitratable.acidity  ,data=dataval)
  MCrispSEL <- gam(SCrispness ~ s(AInstrumental.firmness,3) + AJuice.release + 
          ASoluble.solids + ATitratable.acidity ,data=dataval)
  MMealiSEL <- gam(SMealiness ~ s(AInstrumental.firmness,3) + AJuice.release + 
          ASoluble.solids + ATitratable.acidity  ,data=dataval)
# link sensory to consumer experience
  MCJuic <- lm(CJuiciness ~ SJuiciness + SSweetness,data=dataval)
  MCSweet <- lm(CSweetness ~ SSweetness + SCrispness ,data=dataval)
# link consumer experience to liking
  Mliking <- lm(CLiking ~ CJuiciness+CSweetness + I(CJuiciness^2) ,data=dataval)
 return(list(MJuicsSEL3=MJuicsSEL3,MSweetSEL=MSweetSEL,MCrispSEL=MCrispSEL,
    MMealiSEL=MMealiSEL,MCJuic=MCJuic,MCSweet=MCSweet,Mliking=Mliking ))
}
PredictCombinedModel <- function(modellist,newdata) {
  preddata <- data.frame(SJuiciness = predict(modellist$MJuicsSEL3,newdata),
      SSweetness =  predict(modellist$MSweetSEL,newdata),
      SCrispness = predict(modellist$MCrispSEL,newdata),
      SMealiness = predict(modellist$MMealiSEL,newdata))
  preddata$CJuiciness <- predict(modellist$MCJuic,preddata)
  preddata$CSweetness <- predict(modellist$MCSweet,preddata)
  preddata$CLiking <- predict(modellist$Mliking,preddata)
  preddata
}

FCM <- FitCombinedModel(dataval) 
datain.nomiss <- dataval[-1,]

predictions <- PredictCombinedModel(FCM,datain.nomiss)
par(mfrow=c(2,3))
m_ply(c(‘SJuiciness’,’SSweetness’,’SCrispness’,’CJuiciness’,
        ‘CSweetness’,’CLiking’),function(x) {
   plot(y=predictions[,x],x=datain.nomiss[,x],ylab=paste(‘predicted’,x),
      xlab=paste(‘observed’,x))
   abline(a=0,b=1)
   })

#cross validation section
lopred <- mdply(1:nrow(datain.nomiss),function(x) {
      datain.nomisslo <- datain.nomiss[-x,]
      lomodel <- FitCombinedModel(datain.nomisslo)
      PredictCombinedModel(lomodel,datain.nomiss[x,])
    })
#lopred <- do.call(rbind,lopred)
par(mfrow=c(2,3))
m_ply(c(‘SJuiciness’,’SSweetness’,’SCrispness’,’CJuiciness’,
        ‘CSweetness’,’CLiking’),function(x) {
   plot(y=lopred[,x],x=datain.nomiss[,x],
     ylab=paste(‘cv predicted’,x),xlab=paste(‘observed’,x))
   abline(a=0,b=1)
})
# root mean squared error of prediction
sqrt(mean((lopred$CLiking-datain.nomiss$CLiking)^2))

#pls as benchmark
library(pls)
plsmodel <- plsr(CLiking ~ AInstrumental.firmness + AJuice.release + ASoluble.solids + ATitratable.acidity,data=datain.nomiss,
    scale=TRUE,validation=’LOO’ )
summary(plsmodel)

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)