# Bayesian ANOVA for sensory panel profiling data

April 30, 2012
(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

This post examines whether Bayesian methods, and specifically JAGS, can be used to analyze sensory profiling data. The aim is not to obtain different results, but rather to confirm that the results are fairly similar to a classical analysis. The data are the chocolate data from SensoMineR, and the script is adapted from various online sources collected over a longer period.

## Building the model

JAGS (like WinBUGS and OpenBUGS) is a program that draws samples from posterior distributions. It is up to the user to provide the data and model to JAGS and to interpret the samples. Luckily, R provides infrastructure that helps both in setting up the model and data and in the posterior analysis of the samples. Still, some steps must be completed before the analysis can be run: setting up the data, defining the model, initializing variables, and deciding which parameters of the model are interesting. Only then can JAGS be called.

#### Setting up data

Data here means everything that goes into JAGS: the experimental design and the measurements, but also bookkeeping values such as the number of rows in the data. For this post I have added some extra data, since I want to compare differences between product means; to compare those, I need samples from those specific distributions. All the data go into one big list, which will be handed to JAGS later on.
```r
# get libraries
library(SensoMineR)
library(coda)
library(R2jags)

# infrastructure for contrasts
FullContrast <- function(n) {
  UseMethod('FullContrast', n)
}
FullContrast.default <- function(n) stop('FullContrast only takes integer and factors')
FullContrast.integer <- function(n) {
  mygrid <- expand.grid(v1 = 1:n, v2 = 1:n)
  mygrid <- mygrid[mygrid$v1 < mygrid$v2, ]
  rownames(mygrid) <- 1:nrow(mygrid)
  as.matrix(mygrid)
}
FullContrast.factor <- function(n) {
  FullContrast(nlevels(n))
}
```
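As a quick check of the contrast infrastructure, the snippet below enumerates the pairs for a three-level factor the same way `FullContrast.integer` does; for n levels this yields choose(n, 2) rows, so 15 contrasts for the 6 chocolates later on.

```r
# All unordered level pairs for n = 3, mirroring FullContrast.integer
n <- 3L
mygrid <- expand.grid(v1 = 1:n, v2 = 1:n)
mygrid <- mygrid[mygrid$v1 < mygrid$v2, ]   # keep v1 < v2 only
rownames(mygrid) <- 1:nrow(mygrid)
contrasts3 <- as.matrix(mygrid)
contrasts3                                  # rows (1,2), (1,3), (2,3)
nrow(contrasts3) == choose(n, 2)            # TRUE
```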

```r
data(chocolates)

# setting up experimental data
data_list <- with(sensochoc, list(Panelist = as.numeric(Panelist),
    nPanelist = nlevels(Panelist),
    Product = as.numeric(Product),
    nProduct = nlevels(Product),
    y = CocoaA,
    N = nrow(sensochoc)))
# contrasts
data_list$Panelistcontr <- FullContrast(sensochoc$Panelist)
data_list$nPanelistcontr <- nrow(data_list$Panelistcontr)
data_list$Productcontr <- FullContrast(sensochoc$Product)
data_list$nProductcontr <- nrow(data_list$Productcontr)
```

#### Model definition

The model can be written in 'plain' R and then given to JAGS. However, JAGS does not have vector operations, hence the many for loops, which would be unacceptable in normal R usage. Besides the additive effects in the first part of the model, there are quite some extras. All the means in the model come from hyperdistributions, in this case normal distributions. The precisions (and hence variances) of these hyperdistributions are estimated from the data. The final part of the model translates the internal parameters into something that is sensible to interpret. The model also needs to be written to a file so JAGS can use it later on; these are the last two lines.
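Note that, following BUGS/JAGS convention, normal distributions are parameterized by precision tau = 1/sigma^2 rather than by standard deviation, which is why the model converts back with expressions like `sqrt(1/tau)`. A minimal illustration of the conversion (plain R, not part of the JAGS model):

```r
# BUGS/JAGS dnorm(mu, tau) uses precision tau = 1/sigma^2.
# A 'vague' prior such as dnorm(0, 0.001) therefore has a large standard deviation:
tau <- 0.001
sigma <- sqrt(1 / tau)
sigma   # about 31.6, i.e. a very flat prior relative to a 0-10 rating scale
```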

```r
mymodel <- function() {
  # core of the model
  for (i in 1:N) {
    fit[i] <- grandmean + mPanelist[Panelist[i]] + mProduct[Product[i]] +
        mPanelistProduct[Panelist[i], Product[i]]
    y[i] ~ dnorm(fit[i], tau)
  }
  # grand mean and residual
  tau ~ dgamma(0.001, 0.001)
  gsd <- sqrt(1 / tau)
  grandmean ~ dnorm(0, .001)
  # variable Panelist distribution
  mPanelist[1] <- 0
  for (i in 2:nPanelist) {
    mPanelist[i] ~ dnorm(offsetPanelist, tauPanelist)
  }
  offsetPanelist ~ dnorm(0, .001)
  tauPanelist ~ dgamma(0.001, 0.001)
  sdPanelist <- sqrt(1 / tauPanelist)
  # Product distribution
  mProduct[1] <- 0
  for (i in 2:nProduct) {
    mProduct[i] ~ dnorm(offsetProduct, tauProduct)
  }
  offsetProduct ~ dnorm(0, 0.001)
  tauProduct ~ dgamma(0.001, 0.001)
  sdProduct <- sqrt(1 / tauProduct)
  # interaction distribution
  for (i in 1:nPanelist) {
    mPanelistProduct[i, 1] <- 0
  }
  for (i in 2:nProduct) {
    mPanelistProduct[1, i] <- 0
  }
  for (iPa in 2:nPanelist) {
    for (iPr in 2:nProduct) {
      mPanelistProduct[iPa, iPr] ~ dnorm(offsetPP, tauPP)
    }
  }
  offsetPP ~ dnorm(0, 0.001)
  tauPP ~ dgamma(0.001, 0.001)
  sdPP <- 1 / sqrt(tauPP)
  # getting the interesting data
  # true means for Panelist
  for (i in 1:nPanelist) {
    meanPanelist[i] <- grandmean + mPanelist[i] +
        mean(mPanelistProduct[i, 1:nProduct]) + mean(mProduct[1:nProduct])
  }
  # true means for Product
  for (i in 1:nProduct) {
    meanProduct[i] <- grandmean + mProduct[i] +
        mean(mPanelistProduct[1:nPanelist, i]) + mean(mPanelist[1:nPanelist])
  }
  # differences between true means
  for (i in 1:nPanelistcontr) {
    Panelistdiff[i] <- meanPanelist[Panelistcontr[i, 1]] - meanPanelist[Panelistcontr[i, 2]]
  }
  for (i in 1:nProductcontr) {
    Productdiff[i] <- meanProduct[Productcontr[i, 1]] - meanProduct[Productcontr[i, 2]]
  }
}

model.file <- file.path(tempdir(), 'mijnmodel.txt')
write.model(mymodel, con = model.file)
```

#### Initializing variables

Any values in the model that are not provided by the data need to be initialized. It is most convenient to set up a small function which generates these values, so that each chain gets its own starting point.

```r
inits <- function() list(
  grandmean = rnorm(1, 3, 1),
  mPanelist = c(0, rnorm(data_list$nPanelist - 1)),
  mProduct = c(0, rnorm(data_list$nProduct - 1)),
  mPanelistProduct = rbind(rep(0, data_list$nProduct),
      cbind(rep(0, data_list$nPanelist - 1),
          matrix(rnorm((data_list$nPanelist - 1) * (data_list$nProduct - 1)),
              nrow = data_list$nPanelist - 1,
              ncol = data_list$nProduct - 1))),
  tau = runif(1, 1, 2),
  tauPanelist = runif(1, 1, 3),
  tauProduct = runif(1, 1, 3)
)
```

#### Parameters of interest and calling JAGS

The parameters of interest are basically anything we want to learn about. The JAGS call simply passes all the parts assembled above to JAGS. In this case the model runs fairly quickly, so I decided on some extra iterations (n.iter) and an extra chain. For the moment, I decided not to calculate the DIC.

```r
parameters <- c('sdPanelist', 'sdProduct', 'gsd', 'meanPanelist',
    'meanProduct', 'Productdiff', 'sdPP')
jagsfit <- jags(data = data_list, inits = inits, model.file = model.file,
    parameters.to.save = parameters, n.chains = 4, DIC = FALSE, n.iter = 10000)
```

## Result

The first part of the results can be obtained via a simple print of jagsfit:

```r
jagsfit
```

```
Inference for Bugs model at "/tmp/Rtmp0VFW9g/mijnmodel.txt", fit using jags,
 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 5
 n.sims = 4000 iterations saved
                 mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
Productdiff[1]     0.562   0.312 -0.037  0.356  0.558  0.773  1.186 1.001  4000
Productdiff[2]     2.303   0.317  1.697  2.087  2.299  2.512  2.932 1.001  4000
Productdiff[3]     1.741   0.314  1.118  1.530  1.748  1.948  2.346 1.001  3700
Productdiff[4]     0.836   0.307  0.224  0.630  0.838  1.044  1.424 1.002  2000
Productdiff[5]     0.274   0.303 -0.322  0.069  0.273  0.478  0.870 1.001  2600
Productdiff[6]    -1.467   0.313 -2.081 -1.671 -1.465 -1.257 -0.865 1.001  2700
Productdiff[7]     0.339   0.308 -0.263  0.130  0.338  0.537  0.951 1.001  4000
Productdiff[8]    -0.223   0.299 -0.807 -0.430 -0.218 -0.021  0.349 1.001  4000
Productdiff[9]    -1.965   0.319 -2.587 -2.183 -1.966 -1.757 -1.322 1.001  4000
Productdiff[10]   -0.497   0.294 -1.068 -0.699 -0.495 -0.300  0.082 1.002  1600
Productdiff[11]    0.740   0.317  0.130  0.525  0.733  0.952  1.374 1.001  4000
Productdiff[12]    0.177   0.300 -0.412 -0.019  0.178  0.382  0.753 1.001  4000
Productdiff[13]   -1.564   0.322 -2.186 -1.776 -1.570 -1.349 -0.913 1.001  3300
Productdiff[14]   -0.097   0.300 -0.686 -0.300 -0.090  0.102  0.505 1.002  2100
Productdiff[15]    0.401   0.300 -0.180  0.197  0.405  0.597  0.997 1.001  4000
gsd                1.689   0.067  1.563  1.644  1.687  1.734  1.821 1.002  1700
meanPanelist[1]    6.667   0.492  5.680  6.334  6.659  7.002  7.601 1.001  3100
meanPanelist[2]    5.513   0.436  4.660  5.219  5.512  5.808  6.369 1.001  4000
meanPanelist[3]    6.325   0.443  5.439  6.031  6.331  6.624  7.171 1.001  4000
meanPanelist[4]    7.394   0.443  6.542  7.094  7.400  7.703  8.262 1.002  1800
meanPanelist[5]    7.104   0.445  6.227  6.807  7.104  7.405  7.982 1.001  4000
meanPanelist[6]    6.201   0.436  5.328  5.912  6.208  6.497  7.034 1.001  4000
meanPanelist[7]    6.386   0.444  5.521  6.078  6.384  6.688  7.248 1.003  1100
meanPanelist[8]    6.451   0.443  5.562  6.160  6.452  6.760  7.301 1.001  4000
meanPanelist[9]    5.184   0.441  4.334  4.882  5.184  5.480  6.050 1.002  2300
meanPanelist[10]   8.110   0.460  7.212  7.815  8.107  8.411  9.020 1.001  3700
meanPanelist[11]   5.122   0.448  4.227  4.811  5.121  5.433  5.989 1.002  2100
meanPanelist[12]   7.191   0.441  6.327  6.891  7.181  7.492  8.061 1.001  2900
meanPanelist[13]   6.719   0.438  5.893  6.421  6.720  7.013  7.563 1.001  4000
meanPanelist[14]   6.266   0.440  5.396  5.978  6.266  6.553  7.138 1.002  1700
meanPanelist[15]   6.859   0.434  6.002  6.571  6.867  7.148  7.716 1.002  2200
meanPanelist[16]   6.079   0.434  5.238  5.783  6.082  6.371  6.930 1.002  2100
meanPanelist[17]   6.054   0.433  5.213  5.769  6.051  6.345  6.893 1.002  1800
meanPanelist[18]   6.122   0.420  5.281  5.844  6.129  6.398  6.939 1.002  2200
meanPanelist[19]   6.185   0.438  5.324  5.888  6.182  6.479  7.064 1.002  1500
meanPanelist[20]   4.988   0.456  4.088  4.680  4.997  5.292  5.873 1.001  4000
meanPanelist[21]   4.994   0.455  4.064  4.689  4.996  5.305  5.884 1.001  3400
meanPanelist[22]   5.062   0.451  4.177  4.763  5.065  5.370  5.944 1.001  2600
meanPanelist[23]   7.171   0.453  6.296  6.855  7.172  7.480  8.057 1.001  4000
meanPanelist[24]   5.663   0.442  4.799  5.368  5.671  5.959  6.532 1.002  2400
meanPanelist[25]   7.514   0.455  6.627  7.210  7.518  7.815  8.425 1.001  4000
meanPanelist[26]   6.320   0.439  5.432  6.023  6.325  6.624  7.156 1.001  2900
meanPanelist[27]   6.853   0.431  6.013  6.562  6.843  7.144  7.705 1.001  4000
meanPanelist[28]   7.045   0.440  6.197  6.741  7.043  7.340  7.926 1.001  4000
meanPanelist[29]   4.789   0.456  3.915  4.472  4.789  5.096  5.687 1.001  4000
meanProduct[1]     7.084   0.223  6.653  6.937  7.081  7.230  7.539 1.001  4000
meanProduct[2]     6.522   0.215  6.097  6.375  6.524  6.669  6.928 1.001  4000
meanProduct[3]     4.781   0.227  4.332  4.626  4.778  4.927  5.227 1.001  4000
meanProduct[4]     6.248   0.213  5.838  6.101  6.248  6.394  6.660 1.002  1500
meanProduct[5]     6.745   0.217  6.317  6.605  6.748  6.885  7.171 1.001  4000
meanProduct[6]     6.344   0.221  5.910  6.199  6.347  6.494  6.772 1.001  4000
sdPP               0.140   0.123  0.024  0.050  0.096  0.193  0.461 1.010   290
sdPanelist         0.996   0.178  0.700  0.869  0.978  1.106  1.393 1.001  4000
sdProduct          0.996   0.536  0.426  0.670  0.864  1.164  2.293 1.001  4000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
```
This is a big table, and the relevant information needs to be extracted from it. A good way is to plot the results. There are two plots to start with: a quick summary and a set of extensive plots. As the second plot command makes one figure per four variables, it is omitted here. In the figure it can be seen that some of the product differences are different from 0, which means these differences are believed to be present. From meanProduct it appears product 3 scores quite a bit lower than the other products. There is also quite some variation in meanPanelist; to be specific, panelist 10 scores high, while panelists 9 and 11 score low.
The variables gsd and sdPanelist might be used to examine panel performance, but to examine this properly they should be compared with results from other descriptors.
```r
plot(jagsfit)
```

The main question obviously is: which products are different? For this we can extract some data from a summary.
```r
jagsfit.mc <- as.mcmc(jagsfit)
# plot(jagsfit.mc) # this plot gives too many figures for the blog
fitsummary <- summary(jagsfit.mc)
# extract differences
Productdiff <- fitsummary$quantiles[
    grep('Productdiff', rownames(fitsummary$quantiles)), ]
# extract differences different from 0 (95% interval excludes 0)
data_list$Productcontr[Productdiff[, 1] > 0 | Productdiff[, 5] < 0, ]
# get the product means
ProductMean <- fitsummary$statistics[
    grep('meanProduct', rownames(fitsummary$quantiles)), ]
rownames(ProductMean) <- levels(sensochoc$Product)
ProductMean
```
The result is a table of product pairs that are different; most of these involve product 3, but product 1 also differs from products 4 and 6.

```
> data_list$Productcontr[Productdiff[,1]>0 | Productdiff[,5]<0,]
   v1 v2
2   1  3
3   2  3
4   1  4
6   3  4
9   3  5
11  1  6
13  3  6
> ProductMean
          Mean        SD    Naive SE Time-series SE
choc1 7.083966 0.2225106 0.003518201    0.003233152
choc2 6.521746 0.2150476 0.003400201    0.003988800
choc3 4.780643 0.2272578 0.003593261    0.003553964
choc4 6.247723 0.2128971 0.003366198    0.003382249
choc5 6.745146 0.2165525 0.003423996    0.003230432
choc6 6.344260 0.2206372 0.003488580    0.003662731
```
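Beyond checking whether intervals exclude zero, the posterior draws can be summarized directly as probabilities, e.g. the posterior probability that a difference is positive. The sketch below uses a synthetic vector of draws as a stand-in; with R2jags the actual draws could presumably be taken from `jagsfit$BUGSoutput$sims.list$Productdiff` (an assumption about the fitted object, not shown above).

```r
# Sketch: posterior probability of a positive difference, from MCMC draws.
# 'draws' is a synthetic stand-in for one column of posterior samples,
# roughly matching Productdiff[2] above (mean 2.303, sd 0.317).
set.seed(42)
draws <- rnorm(4000, mean = 2.303, sd = 0.317)
p_positive <- mean(draws > 0)   # P(difference > 0 | data)
p_positive
```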

## Comparison with ANOVA

A few lines of R give the standard analysis. The product means are very close to the Bayesian ones. This ANOVA shows only differences involving product 3, probably because TukeyHSD protects against multiple comparisons and can be a bit conservative, while the pairwise comparisons in the Bayesian model are unprotected.

```
> library(car)
> aa <- aov(CocoaA ~ Product * Panelist, data=sensochoc)
> Anova(aa)
Anova Table (Type II tests)

Response: CocoaA
                 Sum Sq  Df F value    Pr(>F)    
Product          207.54   5 12.6045 1.876e-10 ***
Panelist         390.43  28  4.2343 1.670e-09 ***
Product:Panelist 322.29 140  0.6991    0.9861    
Residuals        573.00 174                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> TukeyHSD(aa, 'Product')
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = CocoaA ~ Product * Panelist, data = sensochoc)

$Product
                  diff        lwr        upr     p adj
choc2-choc1 -0.5344828 -1.5055716  0.4366061 0.6088175
choc3-choc1 -2.4137931 -3.3848819 -1.4427043 0.0000000
choc4-choc1 -0.8275862 -1.7986750  0.1435026 0.1433035
choc5-choc1 -0.2931034 -1.2641923  0.6779854 0.9531915
choc6-choc1 -0.7241379 -1.6952268  0.2469509 0.2674547
choc3-choc2 -1.8793103 -2.8503992 -0.9082215 0.0000014
choc4-choc2 -0.2931034 -1.2641923  0.6779854 0.9531915
choc5-choc2  0.2413793 -0.7297095  1.2124682 0.9797619
choc6-choc2 -0.1896552 -1.1607440  0.7814337 0.9932446
choc4-choc3  1.5862069  0.6151181  2.5572957 0.0000742
choc5-choc3  2.1206897  1.1496008  3.0917785 0.0000000
choc6-choc3  1.6896552  0.7185663  2.6607440 0.0000192
choc5-choc4  0.5344828 -0.4366061  1.5055716 0.6088175
choc6-choc4  0.1034483 -0.8676406  1.0745371 0.9996304
choc6-choc5 -0.4310345 -1.4021233  0.5400544 0.7960685
> model.tables(aa, type='mean', cterms='Product')
Tables of means
Grand mean
         
6.287356 

 Product 
Product
choc1 choc2 choc3 choc4 choc5 choc6 
7.086 6.552 4.672 6.259 6.793 6.362 
```
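To see the effect of the multiplicity protection, an unprotected classical analysis can be run with `pairwise.t.test` and no p-value adjustment; this tends to flag more pairs than TukeyHSD, much like the unprotected Bayesian contrasts. A self-contained sketch on synthetic data (the real call would use sensochoc$CocoaA and sensochoc$Product):

```r
# Sketch: unprotected pairwise comparisons (no p-value adjustment).
# Synthetic ratings standing in for three of the chocolates.
set.seed(7)
score <- c(rnorm(30, mean = 7.1), rnorm(30, mean = 6.5), rnorm(30, mean = 4.8))
product <- factor(rep(c('choc1', 'choc2', 'choc3'), each = 30))
# Each pair tested at the nominal level, no family-wise protection
unprotected <- pairwise.t.test(score, product, p.adjust.method = 'none')
unprotected$p.value
```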

## Conclusion

JAGS can be used to analyze sensory profiling data. However, if a simple model such as a two-way ANOVA suffices, it does not seem to be worth the trouble.