The recommendation system gives you a long list of alternatives, but the consumer clicks on only a handful: the most appealing first, then the second best, and so on until they stop, with all the remaining alternatives receiving the same implicit rating of not interesting enough to learn more. As a result, we know the preference order for only the most preferred items. Survey research often duplicates this process by presenting many choices and asking only for your top three selections – your first, second, and third picks. This post will demonstrate that by identifying clusters with different ranking schemes (mixtures of rankings), we can learn a good deal about consumer preferences across all the alternatives from observing only a limited ordering of the most desired (partial top-K rankings).
However, we need to remain open to the possibility that our sample is not homogeneous but contains mixtures of differing ranking schemes. To be clear, the reason for focusing on top-K rankings is that we have included so many alternatives that no one person will be able to respond to all of them. For example, the individual is shown a screen filled with movies, songs, products, or attributes and asked to pick the best from the list in order of preference. Awareness and familiarity will focus attention on some subset, but not the same subset for everyone. We should recall that N − K of the possible options will not be selected and will thus be given zeros. Consequently, with individuals in rows and alternatives as columns, no one should be surprised to discover that the data matrix has a block-cluster appearance (as in the R package blockcluster of the same name).
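To make the block pattern concrete, here is a minimal base-R sketch with made-up group sizes and item counts (this is not the simulation used later in the post): two groups of five respondents each award their top-5 scores, with 5 as the best, to a different half of ten alternatives.

```r
# Hypothetical sketch: two groups of 5 respondents, each concentrating
# its top-5 scores (5 = best) on a different half of 10 alternatives
set.seed(1)
group1 <- t(replicate(5, c(sample(5:1), rep(0, 5))))  # favors items 1-5
group2 <- t(replicate(5, c(rep(0, 5), sample(5:1))))  # favors items 6-10
V <- rbind(group1, group2)
# the nonzero entries form two blocks, one per group
round(colMeans(V), 1)
```

Stacking the two groups produces a matrix whose nonzero cells cluster into two rectangular blocks, which is exactly the appearance described above.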
To see how all this works in practice, we begin by generating complete ranking data using the simulISR() function from the R package Rankcluster. The above graphic, borrowed from Wikipedia, illustrates the Insertion Sort Ranking (ISR) process that Rankcluster employs to simulate rankings. We start with eight objects in random order and sort them one at a time through a series of paired comparisons. The simulation function from Rankcluster also allows us to introduce heterogeneity by setting a dispersion parameter called pi. That is, we can generate a sample of individuals sharing a common ranking scheme, yet with somewhat different observed rankings produced by the addition of an error component.
As an example, everyone intends to move #7 to sit between #6 and #8, but some proportion of the sample makes “mistakes,” with that proportion controlled by pi. Of course, the error could represent an overlap in the values associated with #6 and #7, or #7 and #8, so that sometimes one looks better and other times the reverse (sensory discrimination). Regardless, we do not generate a set of duplicate rankings. Instead, we obtain a group of rankings distributed about a true ranking. The details can be found in the authors' technical paper.
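As a rough intuition for pi (this deliberately simplifies the full insertion-sort model in the Rankcluster paper), think of each paired comparison as being resolved the "right" way with probability pi, so a small error rate scatters observed ranks around the true rank:

```r
# Hedged sketch, not Rankcluster's actual algorithm: each paired
# comparison is resolved correctly with probability p = 0.90
# (p here plays the role of the dispersion parameter pi)
set.seed(2)
p <- 0.90
correct <- runif(1000) < p
mean(correct)  # proportion of correct comparisons, close to 0.90
```

With p = 1 every comparison is correct and all simulated rankings match the central ranking exactly; lowering p spreads the observed rankings around it.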
You will need to install the Rankcluster and NMF packages in order to run the following R code.
# Rankcluster needed to simulate rankings
library(Rankcluster)
# 100 respondents with pi = 0.90
# who rank 20 objects from 1 to 20
rank1 <- simulISR(100, 0.90, 1:20)
# 100 respondents with pi = 0.90
# who rank 20 objects in reverse order
rank2 <- simulISR(100, 0.90, 20:1)
# check the mean rankings
apply(rank1, 2, mean)
apply(rank2, 2, mean)
# row bind the two ranking schemes
rank <- rbind(rank1, rank2)
# set ranks 6 to 20 to be 0s
top_rank <- rank
top_rank[rank > 5] <- 0
# reverse-score so that the
# scores now represent intensity
focus <- 6 - top_rank
focus[focus == 6] <- 0
# use R package NMF to uncover
# mixtures with different rankings
library(NMF)
fit <- nmf(focus, 2, method = "lee", nrun = 20)
# the columns of h transposed
# represent the ranking schemes
h <- coef(fit)
round(t(h))
# w contains the membership weights
w <- basis(fit)
# hard clustering
type <- max.col(w)
# validates the mixture model
table(type, c(rep(1, 100), rep(2, 100)))
We begin with the simulISR() function simulating two sets of 100 rankings each. The function takes three arguments: the number of rankings to be generated, the value of pi, and the central ranking of the objects. The sequence 1:20 in the first ranking scheme indicates that there will be 20 objects ordered from first to last. Similarly, the sequence 20:1 in the second ranking scheme inputs 20 objects ranked in reverse, from last to first. We concatenate the data produced by the two ranking schemes and set three-quarters of the rankings to 0, as if only the top-5 rankings were provided. Finally, the scale is reversed so that the nonnegative values indicate greater intensity, with five as the highest score.
The R package NMF performs the nonnegative matrix factorization with the number of latent features set to two, the number of ranking schemes generating the data. I ask that you read an earlier post for the specifics of how to use the R package NMF to factor partial top-K rankings. More generally, though, we are inputting a sparse data matrix with zeros filling 75% of the space. We are trying to reproduce that data (the matrix usually labeled V) by multiplying two matrices. One has a row for every respondent (w in the R code), and the other has a column for every object that was ranked (h in the R code). What links these two matrices is the number of latent features, which in this case happens also to be two because we simulated and concatenated two ranking schemes.
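The multiplication itself is ordinary matrix algebra. A toy sketch with invented numbers (three respondents, four objects, two latent features) shows how V is rebuilt from the two factors:

```r
# Toy example with made-up numbers: V is approximated by W %*% H
W <- matrix(c(1.0, 0.0,
              0.0, 1.0,
              0.5, 0.5), nrow = 3, byrow = TRUE)   # respondent weights (w)
H <- matrix(c(5, 4, 0, 0,
              0, 0, 4, 5), nrow = 2, byrow = TRUE) # feature loadings (h)
V <- W %*% H  # reproduced data matrix: 3 respondents x 4 objects
V
```

The third respondent, with weight split evenly across both latent features, receives a reproduced score that is the average of the two pure profiles.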
Let us say that we placed 20 bottles of wine along a shelf so that the cheapest was in the first position on the left and the most expensive was last on the far right. These are actual wines, so most would agree that the higher-priced bottles tend to be of higher quality. Then, our two ranking schemes could be called “price sensitivity” and “demanding palate” (feel free to substitute less positive labels if you prefer). If one could only be Price Sensitive or Demanding Palate and nothing in between, then you would expect precisely 1-to-20 and 20-to-1 rankings for everyone in each segment, respectively, assuming perfect knowledge and execution. In practice, some of our drinkers may be unaware that #16 received a higher rating than #17, or may simply give it the wrong rank. This is encoded in our pi parameter (pi = 0.90 in this simulation). Still, if I knew your group membership and the bottle's position, I could predict your ranking with a degree of accuracy that varies with pi.
Nonnegative matrix factorization (NMF) seeks to recover the latent features separating the wines and the latent feature membership for each drinker from the data matrix, which, you will recall, does not contain complete rankings but only the partial top-K. Since I did not set the seed, your results will be similar, but not identical, to the following decomposition.
Columns of h (transposed):

      Demanding Palate   Price Sensitivity
C1                   0                 368
C2                   0                 258
C3                   0                 145
C4                   4                 111
C5                  18                  68
C6                  49                  80
C7                  33                  59
C8                  49                  61
C9                  45                  50
C10                112                  31
C11                 81                  30
C12                 63                   9
C13                 79                  25
C14                 67                  18
C15                 65                  28
C16                 79                  28
C17                 85                  14
C18                 93                   5
C19                215                   0
C20                376                   0

Rows of w (membership weights):

      Demanding Palate   Price Sensitivity
R1             0.00000             0.01317
R2             0.00100             0.00881
R3             0.00040             0.00980
R4             0.00105             0.00541
R5             0.00000             0.01322
R6             0.00000             0.01207
R7             0.00291             0.00541
R8             0.00361             0.00416
R9             0.00242             0.01001
...
R193           0.01256             0.00000
R194           0.00366             0.00205
R195           0.01001             0.00030
R196           0.00980             0.00000
R197           0.00711             0.00028
R198           0.00928             0.00065
R199           0.01087             0.00000
R200           0.01043             0.00000
The 20 columns from transposed h are presented first, followed by the first few and the last few rows from w. These coefficients will reproduce the data matrix, which contains numbers from 0 to 5. For instance, the reproduced score for the first respondent on the first object is 0 × 0.00000 + 368 × 0.01317 = 4.84656, or almost 5, suggesting that they most prefer the cheapest wine. In a similar fashion, the last row, R200, gives greater weight to the first latent feature, which loads most heavily on the higher end of the wine continuum.
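That arithmetic is just the inner product of a row of w with a column of h; using the values printed above:

```r
# Reproduce respondent R1's score for wine C1 from the decomposition above
w_r1 <- c(0.00000, 0.01317)  # R1's membership weights
h_c1 <- c(0, 368)            # C1's loadings on the two latent features
sum(w_r1 * h_c1)             # 4.84656, nearly the maximum score of 5
```

The same calculation for every respondent-object pair is exactly what w %*% h (or fitted(fit) in the NMF package) computes in one step.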
Clearly, there are some discrepancies toward the middle of the wine rankings, yet the ends are anchored. This makes sense given that we have data only on the top-5 rankings. Our knowledge of the ten objects in the middle comes solely from the misclassification in pairwise comparisons set by pi = 0.90. In the aggregate, we still seem able to see some differentiation even though we gathered no individual data past the Kth position. Hopefully, C1 represents wine in a box and C20 is a famous vintage from an old village with a long wine history, making our interpretation of the latent features easier.
When I run this type of analysis with actual marketing data, I typically uncover many more latent features and find respondents with sizable membership weightings spread across several of those latent features. Preference for wine is based on more than a price-quality tradeoff, so we would expect to see other latent features accounting for the top-5 rankings (e.g., the reds versus the whites). The likelihood that an object makes it into the top-5 selection is a decreasing function of its rank order across the entire range of options, so we might anticipate some differentiation even when the measurement is as coarse as a partial ranking. NMF will discover that decomposition and reproduce the original rankings, as I have shown with the above example. It seems that there is much we can learn from partial rankings.