**Wiekvoet**, and kindly contributed to R-bloggers)

In this post it is examined if it is possible to use Bayesian methods and specifically JAGS to analyze sensory profiling data. The aim is not to obtain different results, but rather to confirm that the results are fairly similar. The data used is the chocolate data from SensoMineR and the script is adapted from various online sources examined over a longer period.

## Building the model

JAGS, (but also WinBugs and OpenBugs) are programs which can be used to provide samples from posterior distributions. It is up to the user to provide data and model to JAGS and interpret the samples. Luckily, R provides infrastructure to help both in setting up models and data and in posterior analysis of the samples. Still, there are some steps to be done, before the analysis can be executed; Setting up data, defining model, initializing variables and deciding which parameters of the model are interesting. Only then JAGS can be called.

#### Setting up data

#### Model definition

The model can be written in ‘plain’ R and then given to JAGS. However, JAGS does not have vector operations, hence there are a lot of for loops which would be unacceptable for normal R usage. Besides the additive effects in the first part of the model, there are quite some extras. all the means in the model are coming out of hyperdistributions. In this case these are normal distributed. The precision (and hence variance) of these hyperdistributions are determined on basis of the data. The final part of the model translates the internal parameters into something which is sensible to interpret. The model also needs to be written into a file so JAGS can use it later on, these are the last two lines.

mymodel <- function() {

# core of the model

for (i in 1:N) {

fit[i] <- grandmean + mPanelist[Panelist[i]] + mProduct[Product[i]] + mPanelistProduct[Panelist[i],Product[i]]

y[i] ~ dnorm(fit[i],tau)

}

# grand mean and residual

tau ~ dgamma(0.001,0.001)

gsd <- sqrt(1/tau)

grandmean ~ dnorm(0,.001)

# variable Panelist distribution

mPanelist[1] <- 0

for (i in 2:nPanelist) {

mPanelist[i] ~ dnorm(offsetPanelist,tauPanelist)

}

offsetPanelist ~ dnorm(0,.001)

tauPanelist ~ dgamma(0.001,0.001)

sdPanelist <- sqrt(1/tauPanelist)

# Product distribution

mProduct[1] <- 0

for (i in 2:nProduct) {

mProduct[i] ~ dnorm(offsetProduct,tauProduct)

}

offsetProduct ~ dnorm(0,0.001)

tauProduct ~ dgamma(0.001,0.001)

sdProduct <- sqrt( 1/tauProduct)

# interaction distribution

for (i in 1:nPanelist) {

mPanelistProduct[i,1] <- 0

}

for (i in 2:nProduct) {

mPanelistProduct[1,i] <- 0

}

for (iPa in 2:nPanelist) {

for (iPr in 2:nProduct) {

mPanelistProduct[iPa,iPr] ~dnorm(offsetPP,tauPP)

}

}

offsetPP ~dnorm(0,0.001)

tauPP ~dgamma(0.001,0.001)

sdPP <- 1/sqrt(tauPP)

# getting the interesting data

# true means for Panelist

for (i in 1:nPanelist) {

meanPanelist[i] <- grandmean + mPanelist[i] + mean(mPanelistProduct[i,1:nProduct]) + mean(mProduct[1:nProduct])

}

# true means for Product

for (i in 1:nProduct) {

meanProduct[i] <- grandmean + mProduct[i] + mean(mPanelistProduct[1:nPanelist,i]) + mean(mPanelist[1:nPanelist])

}

for (i in 1:nPanelistcontr) {

Panelistdiff[i] <- meanPanelist[Panelistcontr[i,1]]-meanPanelist[Panelistcontr[i,2]]

}

for (i in 1:nProductcontr) {

Productdiff[i] <- meanProduct[Productcontr[i,1]]-meanProduct[Productcontr[i,2]]

}

}

model.file <- file.path(tempdir(),’mijnmodel.txt’)

#### Initializing variables

Anything values in the model which are not provided by the data, needs to be initialized. It is most convenient to setup a little model which can be used to get these values.

inits <- function() list(

grandmean = rnorm(1,3,1),

mPanelist = c(0,rnorm(data_list$nPanelist-1)) ,

mProduct = c(0,rnorm(data_list$nProduct-1)) ,

mPanelistProduct = rbind(rep(0,data_list$nProduct),cbind(rep(0,data_list$nPanelist-1),matrix(rnorm((data_list$nPanelist-1)*(data_list$nProduct-1)),nrow=data_list$nPanelist-1,ncol=data_list$nProduct-1)))

,

tau = runif(1,1,2),

tauPanelist = runif(1,1,3),

tauProduct = runif(1,1,3)

)

#### parameters of interest and calling JAGS

The parameters of interest is basically anything which we want know anything about. The JAGS call, is just listing all the parts provided before to JAGS. In this case, the model runs fairly quick, so I decided to have some extra iterations (n.iter) and an extra chain. For this moment, I decided not to calculate DIC.

parameters <- c(‘sdPanelist’,’sdProduct’,’gsd’,’meanPanelist’,

‘meanProduct’,’Productdiff’,’sdPP’)

jagsfit <- jags(data=data_list,inits=inits,model.file=model.file,

parameters.to.save=parameters,n.chains=4,DIC=FALSE,n.iter=10000)

## Result

Inference for Bugs model at "/tmp/Rtmp0VFW9g/mijnmodel.txt", fit using jags,

4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 5

n.sims = 4000 iterations saved

mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff

Productdiff[1] 0.562 0.312 -0.037 0.356 0.558 0.773 1.186 1.001 4000

Productdiff[2] 2.303 0.317 1.697 2.087 2.299 2.512 2.932 1.001 4000

Productdiff[3] 1.741 0.314 1.118 1.530 1.748 1.948 2.346 1.001 3700

Productdiff[4] 0.836 0.307 0.224 0.630 0.838 1.044 1.424 1.002 2000

Productdiff[5] 0.274 0.303 -0.322 0.069 0.273 0.478 0.870 1.001 2600

Productdiff[6] -1.467 0.313 -2.081 -1.671 -1.465 -1.257 -0.865 1.001 2700

Productdiff[7] 0.339 0.308 -0.263 0.130 0.338 0.537 0.951 1.001 4000

Productdiff[8] -0.223 0.299 -0.807 -0.430 -0.218 -0.021 0.349 1.001 4000

Productdiff[9] -1.965 0.319 -2.587 -2.183 -1.966 -1.757 -1.322 1.001 4000

Productdiff[10] -0.497 0.294 -1.068 -0.699 -0.495 -0.300 0.082 1.002 1600

Productdiff[11] 0.740 0.317 0.130 0.525 0.733 0.952 1.374 1.001 4000

Productdiff[12] 0.177 0.300 -0.412 -0.019 0.178 0.382 0.753 1.001 4000

Productdiff[13] -1.564 0.322 -2.186 -1.776 -1.570 -1.349 -0.913 1.001 3300

Productdiff[14] -0.097 0.300 -0.686 -0.300 -0.090 0.102 0.505 1.002 2100

Productdiff[15] 0.401 0.300 -0.180 0.197 0.405 0.597 0.997 1.001 4000

gsd 1.689 0.067 1.563 1.644 1.687 1.734 1.821 1.002 1700

meanPanelist[1] 6.667 0.492 5.680 6.334 6.659 7.002 7.601 1.001 3100

meanPanelist[2] 5.513 0.436 4.660 5.219 5.512 5.808 6.369 1.001 4000

meanPanelist[3] 6.325 0.443 5.439 6.031 6.331 6.624 7.171 1.001 4000

meanPanelist[4] 7.394 0.443 6.542 7.094 7.400 7.703 8.262 1.002 1800

meanPanelist[5] 7.104 0.445 6.227 6.807 7.104 7.405 7.982 1.001 4000

meanPanelist[6] 6.201 0.436 5.328 5.912 6.208 6.497 7.034 1.001 4000

meanPanelist[7] 6.386 0.444 5.521 6.078 6.384 6.688 7.248 1.003 1100

meanPanelist[8] 6.451 0.443 5.562 6.160 6.452 6.760 7.301 1.001 4000

meanPanelist[9] 5.184 0.441 4.334 4.882 5.184 5.480 6.050 1.002 2300

meanPanelist[10] 8.110 0.460 7.212 7.815 8.107 8.411 9.020 1.001 3700

meanPanelist[11] 5.122 0.448 4.227 4.811 5.121 5.433 5.989 1.002 2100

meanPanelist[12] 7.191 0.441 6.327 6.891 7.181 7.492 8.061 1.001 2900

meanPanelist[13] 6.719 0.438 5.893 6.421 6.720 7.013 7.563 1.001 4000

meanPanelist[14] 6.266 0.440 5.396 5.978 6.266 6.553 7.138 1.002 1700

meanPanelist[15] 6.859 0.434 6.002 6.571 6.867 7.148 7.716 1.002 2200

meanPanelist[16] 6.079 0.434 5.238 5.783 6.082 6.371 6.930 1.002 2100

meanPanelist[17] 6.054 0.433 5.213 5.769 6.051 6.345 6.893 1.002 1800

meanPanelist[18] 6.122 0.420 5.281 5.844 6.129 6.398 6.939 1.002 2200

meanPanelist[19] 6.185 0.438 5.324 5.888 6.182 6.479 7.064 1.002 1500

meanPanelist[20] 4.988 0.456 4.088 4.680 4.997 5.292 5.873 1.001 4000

meanPanelist[21] 4.994 0.455 4.064 4.689 4.996 5.305 5.884 1.001 3400

meanPanelist[22] 5.062 0.451 4.177 4.763 5.065 5.370 5.944 1.001 2600

meanPanelist[23] 7.171 0.453 6.296 6.855 7.172 7.480 8.057 1.001 4000

meanPanelist[24] 5.663 0.442 4.799 5.368 5.671 5.959 6.532 1.002 2400

meanPanelist[25] 7.514 0.455 6.627 7.210 7.518 7.815 8.425 1.001 4000

meanPanelist[26] 6.320 0.439 5.432 6.023 6.325 6.624 7.156 1.001 2900

meanPanelist[27] 6.853 0.431 6.013 6.562 6.843 7.144 7.705 1.001 4000

meanPanelist[28] 7.045 0.440 6.197 6.741 7.043 7.340 7.926 1.001 4000

meanPanelist[29] 4.789 0.456 3.915 4.472 4.789 5.096 5.687 1.001 4000

meanProduct[1] 7.084 0.223 6.653 6.937 7.081 7.230 7.539 1.001 4000

meanProduct[2] 6.522 0.215 6.097 6.375 6.524 6.669 6.928 1.001 4000

meanProduct[3] 4.781 0.227 4.332 4.626 4.778 4.927 5.227 1.001 4000

meanProduct[4] 6.248 0.213 5.838 6.101 6.248 6.394 6.660 1.002 1500

meanProduct[5] 6.745 0.217 6.317 6.605 6.748 6.885 7.171 1.001 4000

meanProduct[6] 6.344 0.221 5.910 6.199 6.347 6.494 6.772 1.001 4000

sdPP 0.140 0.123 0.024 0.050 0.096 0.193 0.461 1.010 290

sdPanelist 0.996 0.178 0.700 0.869 0.978 1.106 1.393 1.001 4000

sdProduct 0.996 0.536 0.426 0.670 0.864 1.164 2.293 1.001 4000

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

It is a big table, and it is needed to extract the required data from it. A good way is to plot the results. There are two plots to start, a quick summary and extensive plots. As the second plot command makes one figure per four variables, it is omitted. In the figure, it is observed that some of the product differences are different from 0, this means that it is believed these differences are present. From meanProducts it seems product 3 is quite lower than the other products. There is also quite some variation in meanPanelist. To be specific, panelist 10 scores high, while 9 and 11 score low.

Variables gsd and sdPanelist might be used to examine panel performance, but to examine this better, they should be compared with results from other descriptors.

plot(jagsfit)

#### Details about products

A main question if obviously, which products are different? For this we can extract some data from a summary

jagsfit.mc <- as.mcmc(jagsfit)

# plot(jagsfit.mc) # this plot give too many figures for the blog

fitsummary <- summary(jagsfit.mc)

# extract differences

Productdiff <- fitsummary$quantiles[ grep(‘Productdiff’,rownames(fitsummary$quantiles)),]

# extract differences different from 0

data_list$Productcontr[Productdiff[,1]>0 | Productdiff[,5]<0,]

# get the product means

ProductMean <- fitsummary$statistics[ grep(‘meanProduct’,rownames(fitsummary$quantiles)),]

rownames(ProductMean) <- levels(sensochoc$Product)

ProductMean

`> jagsfit.mc <- as.mcmc(jagsfit)`

> # plot(jagsfit.mc)

> fitsummary <- summary(jagsfit.mc)

> # extract differences

> Productdiff <- fitsummary$quantiles[ grep('Productdiff',rownames(fitsummary$quantiles)),]

> # extract differences different from 0

> data_list$Productcontr[Productdiff[,1]>0 | Productdiff[,5]<0,]

v1 v2

2 1 3

3 2 3

4 1 4

6 3 4

9 3 5

11 1 6

13 3 6

> # get the product means > ProductMean <- fitsummary$statistics[ grep('meanProduct',rownames(fitsummary$quantiles)),] > rownames(ProductMean) <- levels(sensochoc$Product) > ProductMean

Mean SD Naive SE Time-series SE

choc1 7.083966 0.2225106 0.003518201 0.003233152

choc2 6.521746 0.2150476 0.003400201 0.003988800

choc3 4.780643 0.2272578 0.003593261 0.003553964

choc4 6.247723 0.2128971 0.003366198 0.003382249

choc5 6.745146 0.2165525 0.003423996 0.003230432

choc6 6.344260 0.2206372 0.003488580 0.003662731

## Comparison with ANOVA

`> library(car)`

> aa <- aov(CocoaA ~ Product * Panelist,data=sensochoc)

> Anova(aa)

Anova Table (Type II tests)

Response: CocoaA

Sum Sq Df F value Pr(>F)

Product 207.54 5 12.6045 1.876e-10 ***

Panelist 390.43 28 4.2343 1.670e-09 ***

Product:Panelist 322.29 140 0.6991 0.9861

Residuals 573.00 174

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> TukeyHSD(aa,'Product')

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = CocoaA ~ Product * Panelist, data = sensochoc)

$Product

diff lwr upr p adj

choc2-choc1 -0.5344828 -1.5055716 0.4366061 0.6088175

choc3-choc1 -2.4137931 -3.3848819 -1.4427043 0.0000000

choc4-choc1 -0.8275862 -1.7986750 0.1435026 0.1433035

choc5-choc1 -0.2931034 -1.2641923 0.6779854 0.9531915

choc6-choc1 -0.7241379 -1.6952268 0.2469509 0.2674547

choc3-choc2 -1.8793103 -2.8503992 -0.9082215 0.0000014

choc4-choc2 -0.2931034 -1.2641923 0.6779854 0.9531915

choc5-choc2 0.2413793 -0.7297095 1.2124682 0.9797619

choc6-choc2 -0.1896552 -1.1607440 0.7814337 0.9932446

choc4-choc3 1.5862069 0.6151181 2.5572957 0.0000742

choc5-choc3 2.1206897 1.1496008 3.0917785 0.0000000

choc6-choc3 1.6896552 0.7185663 2.6607440 0.0000192

choc5-choc4 0.5344828 -0.4366061 1.5055716 0.6088175

choc6-choc4 0.1034483 -0.8676406 1.0745371 0.9996304

choc6-choc5 -0.4310345 -1.4021233 0.5400544 0.7960685

> model.tables(aa,type='mean',cterms='Product')

Tables of means

Grand mean

6.287356

Product

Product

choc1 choc2 choc3 choc4 choc5 choc6

7.086 6.552 4.672 6.259 6.793 6.362

## Conclusion

**leave a comment**for the author, please follow the link and comment on their blog:

**Wiekvoet**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...