# Ordinal data, models with observers

April 21, 2013
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

I recently made three posts regarding analysis of ordinal data. A post looking at all methods I could find in R, a post with an additional method and a post using JAGS. Common in all three was using the cheese data, a data set where only product data was available, observers were not there. The real world is different, there are observers and they are quite different. So, in this post I have added observers and look at two models again; Simple ANOVA and, complex, an ordinal model with random observers.

### Data

To spare myself looking for data, and keeping in mind I may want to run some simulations, I have created a data generator. In this generator I entered some of the things I could imagine regarding the observers. First of all they all have different sensitivities. In other words, what is very salt for some, is not at all salt for others. This is expressed in an additive and a multiplicative effect. Second, they have observation error. The same product does not always give the same response. Third is the categories, observers use them slightly different. I am absolutely aware these are much more properties than I might estimate. After all, a typical consumer takes two or three products in a test, while I have many more parameters. Finally, the outer limits (categories 1 and 9) are placed a bit outward. This is because there is an aversion to using these categories.
num2scale <- function(x,scale) {
cut(x,c(-Inf,scale,Inf),1:(length(scale)+1))
}
pop.limits2ind.limits <- function(scale,sd=.5) {
newscale <- scale+rnorm(length(scale),sd=sd)
newscale[order(newscale)]
}

obs.score <- function(obs,pop.score,pop.limits,scale.sd=0.5,
sensitivity.sd=.1, precision.sd=1,
labels=LETTERS[1:length(pop.score)]) {
# individual sensitivity (multiplicative)
obs.errorfreeintensity <- center.scale +
(pop.score-center.scale)*rlnorm(1,sd=sensitivity.sd)
obs.errorfreeintensity <- obs.errorfreeintensity +
# individual observation error
obs.intensity <- obs.errorfreeintensity+
rnorm(length(pop.score),sd=precision.sd)
# individual cut offs between categories
obs.limits <- pop.limits2ind.limits(pop.limits)
obs.score <- num2scale(obs.intensity,obs.limits)
data.frame(obs=obs,
score = obs.score,
product=labels
)
}

panel.score <- function(n=100,pop.score,pop.limits,scale.sd=0.5,
sensitivity.sd=.1,precision.sd=1,
labels=LETTERS[1:length(pop.score)]) {
la <- lapply(1:n,function(x) {
obs.score(x,pop.score,pop.limits,scale.sd,sensitivity.sd,
precision.sd,labels=labels)
})
dc <- do.call(rbind,la)
dc\$obs <- factor(dc\$obs)
dc
}

pop.limits <- c(1,3:8,10)
prodmeans <- c(7,7,8)
scores <- panel.score(40,prodmeans,pop.limits)
scores\$numresponse <- as.numeric(levels(scores\$score))[scores\$score]

### Analysis – ANOVA

The first analysis method is ANOVA. Plain ANOVA. Not because I dislike variance components or such, but because this is what is done most commonly. On top of that, I cannot imagine somebody taking these data, pulling it through a mixed model, while forgetting it is not from a continuous scale.
library(multcomp)
Res.aov <- aov( numresponse ~ obs + product , data=scores)
anova(Res.aov)

Analysis of Variance Table

Response: numresponse
Df  Sum Sq Mean Sq F value    Pr(>F)
obs       39 239.300  6.1359  6.3686 2.321e-12 ***
product    2   9.517  4.7583  4.9388   0.00956 **
Residuals 78  75.150  0.9635

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

mt <- model.tables(Res.aov,type=’means’,cterms=’product’)

data.frame(dimnames(mt\$tables\$product)[1],
Response=as.numeric(mt\$tables\$product),
LSD=cld(glht(Res.aov,linfct=mcp(product=’Tukey’))
)\$mcletters\$monospacedLetters
)
product Response LSD
A       A    6.725  a
B       B    6.850  a
C       C    7.375   b

### Analysis – cumulative link mixed model (clmm)

The alternative, is the other extreme. Do it as correct as available, using both a cumulative link and making observers random. This model is standard available in the ordinal package. This means two intermediate models are not shown. Both of these models lack the simplicity of ANOVA while being less correct than clmm, hence inferior to both clmm and ANOVA.
library(ordinal)
Res.clmm <- clmm(score  ~ product + (1|obs),data=scores)
summary(Res.clmm)

Cumulative Link Mixed Model fitted with the Laplace approximation

formula: score ~ product + (1 | obs)
data:    scores

logit flexible  120  -179.57 379.14 37(5046) 8.77e-06 Inf

Random effects:
Var Std.Dev
obs 8.425   2.903
Number of groups:  obs 40

Coefficients:
Estimate Std. Error z value Pr(>|z|)
productB   0.1309     0.4225   0.310  0.75680
productC   1.4640     0.4718   3.103  0.00192 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

Threshold coefficients:
Estimate Std. Error z value
2|3  -6.0078         NA      NA
3|4  -5.6340         NA      NA
4|5  -4.1059     0.4528  -9.069
5|6  -3.0521     0.4769  -6.400
6|7  -1.0248     0.4970  -2.062
7|8   0.5499     0.5303   1.037
8|9   4.0583     0.7429   5.463

Res.clmm2 <- clmm(score ~ 1 + (1|obs),data=scores)

anova(Res.clmm2,Res.clmm)

Likelihood ratio tests of cumulative link models:

Res.clmm2 score ~ 1 + (1 | obs)       logit flexible
Res.clmm  score ~ product + (1 | obs) logit flexible

no.par    AIC  logLik LR.stat df Pr(>Chisq)
Res.clmm2      8 387.41 -185.70
Res.clmm      10 379.14 -179.57  12.266  2    0.00217 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

co <- coef(Res.clmm)

vc <- vcov(Res.clmm)
names(co) <- levels(scores\$product)
sd <- sqrt(c(vc[1,1],diag(vc)[-1]+vc[1,1]-2*vc[1,-1]))
data.frame(
`top 2 box`=arm::invlogit(c(-co[1],-co[1]+co[-1])),
`2.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.025)*sd),
`97.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.975)*sd),
check.names=FALSE
)

top 2 box 2.5% limit 97.5% limit
A 0.3658831  0.1694873   0.6199713
B 0.3967401  0.1753347   0.6704337
C 0.7138311  0.4498042   0.8838691

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...