What is R-index

March 2, 2012
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

R index is developed in interpreting signal detection data for human perception. In sensory research it is used to interpret ranking data. The value one gets out of an R-index calculation is interpreted as a confusion between samples tested. It has been popularized among others by Bi and O’Mahony.
The value of R-index ranges between 0% and 100%, with 50% representing equality.
The computation of the R index is not difficult and can be done using a few short functions.

Simulated data of ranking for 3 persons and 4 products is:

person prod rank
1       1    1    1
2       1    2    3
3       1    3    2
4       1    4    4
5       2    1    1
6       2    2    4
7       2    3    3
8       2    4    2
9       3    1    1
10      3    2    4
11      3    3    3
12      3    4    2

These data has as R indices:
prod1 prod2    Rindex
1     1     2 100.00000
2     1     3 100.00000
3     1     4 100.00000
4     2     3  11.11111
5     2     4  22.22222
6     3     4  44.44444

It is not obvious at first sight, but the R index is in all actuality not a continuous function. There are only so many permutations of rankings. To demonstrate this, we repeat a simulation 1000 times. With 25 simulated persons with each 4 products the frequency of R-indices is shown in the figure.

It is clear that R index exactly 50% is the most probable outcome. This represents that the products are all equal. It is also clear that some R-indices which are really close to 50 are much less probable. Finally, values under 20 and over 80 hardly occur. This is well know. Critical values of R-indices are given by the red and blue lines (Bi and O’Mahony 1995 respectively 2007, Journal of Sensory Studies)

R code for these results

# A simple Ranking
makeRanksNodiff <- function(nprod,nrep) {
inList <- lapply(1:nrep,function(x) sample(1:nprod,nprod)   )
data.frame(person=factor(rep(1:nrep,each=nprod)),
prod=factor(rep(1:nprod,times=nrep)),
rank=unlist(inList))
}

#  R index for two products out of a cross table
tab2Rindex <- function(t1,t2) {
Rindex <- crossprod(rev(t1)[-1],cumsum(rev(t2[-1]))) + 0.5*crossprod(t1,t2)
100*Rindex/(sum(t1)*sum(t2))
}

# R index for all products out of an experiment
allRindex <- function(rankExperiment) {
crst <- xtabs(~ prod + rank,data=rankExperiment)
nprod <- nlevels(rankExperiment\$prod)
Rindices <- lapply(1:(nprod-1),function(p1) {
lap <- lapply((p1+1):nprod,function(p2) {
index <- tab2Rindex(crst[p1,],crst[p2,])
prod1 <- levels(rankExperiment\$prod)[p1]
prod2 <- levels(rankExperiment\$prod)[p2]
data.frame(prod1=prod1,prod2=prod2,Rindex=index)
})
return(do.call(rbind,lap))
})
do.call(rbind,Rindices)
}

# small data set
set.seed(12)
rankExperiment <- makeRanksNodiff(nprod=4,nrep=3)
rankExperiment
allRindex(rankExperiment)

# larger simulation
set.seed(12)
last <- lapply(1:1000,function(x) {
re <- makeRanksNodiff(nprod=4,nrep=25)
ar <- allRindex(re)
ar\$Rindex
} )

rankmatrix <- do.call(rbind,last)

RindicesTable <- as.data.frame(table(rankmatrix))
RindicesTable\$Rindex <- as.numeric(as.character(RindicesTable\$rankmatrix))

library(ggplot2)

g1 <- ggplot(RindicesTable, aes( Rindex, Freq))  + geom_point(alpha=.5)
g1 <- g1 + geom_vline(xintercept=50 + 18.57*c(-1,1),colour=’red’)
g1 <- g1 + geom_vline(xintercept=50 + 15.21*c(-1,1),colour=’blue’)
g1

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...