Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R index is developed in interpreting signal detection data for human perception. In sensory research it is used to interpret ranking data. The value one gets out of an R-index calculation is interpreted as a confusion between samples tested. It has been popularized among others by Bi and O’Mahony.
The value of R-index ranges between 0% and 100%, with 50% representing equality.
The computation of the R index is not difficult and can be done using a few short functions.

Simulated data of ranking for 3 persons and 4 products is:

person prod rank
1       1    1    1
2       1    2    3
3       1    3    2
4       1    4    4
5       2    1    1
6       2    2    4
7       2    3    3
8       2    4    2
9       3    1    1
10      3    2    4
11      3    3    3
12      3    4    2

These data has as R indices:
prod1 prod2    Rindex
1     1     2 100.00000
2     1     3 100.00000
3     1     4 100.00000
4     2     3  11.11111
5     2     4  22.22222
6     3     4  44.44444

It is not obvious at first sight, but the R index is in all actuality not a continuous function. There are only so many permutations of rankings. To demonstrate this, we repeat a simulation 1000 times. With 25 simulated persons with each 4 products the frequency of R-indices is shown in the figure.

It is clear that R index exactly 50% is the most probable outcome. This represents that the products are all equal. It is also clear that some R-indices which are really close to 50 are much less probable. Finally, values under 20 and over 80 hardly occur. This is well know. Critical values of R-indices are given by the red and blue lines (Bi and O’Mahony 1995 respectively 2007, Journal of Sensory Studies)

R code for these results

# A simple Ranking
makeRanksNodiff <- function(nprod,nrep) {
inList <- lapply(1:nrep,function(x) sample(1:nprod,nprod)   )
data.frame(person=factor(rep(1:nrep,each=nprod)),
prod=factor(rep(1:nprod,times=nrep)),
rank=unlist(inList))
}

#  R index for two products out of a cross table
tab2Rindex <- function(t1,t2) {
Rindex <- crossprod(rev(t1)[-1],cumsum(rev(t2[-1]))) + 0.5*crossprod(t1,t2)
100*Rindex/(sum(t1)*sum(t2))
}

allRindex <- function(rankExperiment) {
crst <- xtabs(~ prod + rank,data=rankExperiment)
nprod <- nlevels(rankExperiment\$prod)
Rindices <- lapply(1:(nprod-1),function(p1) {
lap <- lapply((p1+1):nprod,function(p2) {
index <- tab2Rindex(crst[p1,],crst[p2,])
prod1 <- levels(rankExperiment\$prod)[p1]
prod2 <- levels(rankExperiment\$prod)[p2]
data.frame(prod1=prod1,prod2=prod2,Rindex=index)
})
return(do.call(rbind,lap))
})
do.call(rbind,Rindices)
}

# small data set
set.seed(12)
rankExperiment <- makeRanksNodiff(nprod=4,nrep=3)
rankExperiment
allRindex(rankExperiment)

# larger simulation
set.seed(12)
last <- lapply(1:1000,function(x) {
re <- makeRanksNodiff(nprod=4,nrep=25)
ar <- allRindex(re)
ar\$Rindex
} )

rankmatrix <- do.call(rbind,last)

RindicesTable <- as.data.frame(table(rankmatrix))
RindicesTable\$Rindex <- as.numeric(as.character(RindicesTable\$rankmatrix))

library(ggplot2)

g1 <- ggplot(RindicesTable, aes( Rindex, Freq))  + geom_point(alpha=.5)
g1 <- g1 + geom_vline(xintercept=50 + 18.57*c(-1,1),colour='red')
g1 <- g1 + geom_vline(xintercept=50 + 15.21*c(-1,1),colour='blue')
g1