Adding features to a product can be costly, so brands have an incentive to include only those features most likely to increase demand. In the last two posts (first link and second link), I have recommended what could be called a “features stress test” that included both a data collection procedure and some suggestions for how to analyze that data.
Although the proposed analysis will work with any rating scale, one should consider replacing the traditional importance measure with behaviorally-anchored categories. That is, we discontinue the importance rating, with its hard-to-interpret endpoints of very important and very unimportant, and we substitute a sequence of increasingly demanding actions that require consumers to decide how much they are willing to do or sacrifice in order to learn about or obtain a product with the additional feature. For example, as outlined in the first link, respondents choose 1=not interested, 2=nice to have, 3=tie-breaker, or 4=pay more for (an ordinal scale suitable for scaling with item response theory). The modifier “stress” suggests that the possible actions can be made more and more extreme until all but the most desired features fail to pass the test (e.g., “would pay considerably more for” instead of “pay more for”). The resulting data enable us to compare the impact of different features across consumers, provided that the same feature prioritization holds for everyone.
To be clear, our focus is on the features, and consumers are simply the measuring instrument. What is the impact of Feature A on purchase interest? We ask Consumer #1, and then Consumer #2 and so on. Since every consumer rates every feature, we can allow our consumers to hold varying standards of comparison, as long as all the features rise or fall together. Thus, it does not concern us if some of our consumers are more involved in the product category and report uniformly greater interest in all the features. Interestingly, although our focus was the feature, we learn something about consumer heterogeneity, which could be useful in future targeting.
Mixture of Customer Types with Different Feature Priorities
Our problem, of course, is that feature impact depends on consumer needs and desires. We cannot simply assume that there is one common feature prioritization that is shared by all. We may not wish to go so far as to suggest a unique feature customization for every customer, but certainly we are likely to find a few segments wanting different feature configurations. As a result, my sample of respondents is not uniform but consists of a mixture or composite of two or more customer types with different feature priorities.
The last two posts, with links given above, have provided some detail outlining why I believe that rating scales are ordinal and why polytomous item response theory (IRT) provides useful models of the response generation process. I have tried in those posts to provide a gentler introduction to finite mixtures of IRT models, encouraging a more exploratory two-step process: k-means on person-centered scores, followed by graded response modeling for each cluster uncovered.
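The first step of that two-step process can be sketched in base R. This is only an illustration under stated assumptions, not the code from the earlier posts: the toy matrix ratings_toy, the two segment profiles, the noise settings and the seed are all invented here.

```r
# Toy data (an assumption for illustration, not the data from the earlier posts):
# 40 respondents rate 6 features on a 1-4 scale.
set.seed(42)
n_per_group <- 20
profile_a <- c(1, 1, 2, 2, 4, 4)   # hypothetical segment preferring the last features
profile_b <- c(1, 1, 4, 4, 2, 2)   # hypothetical segment preferring the middle features

make_group <- function(profile, n) {
  # each respondent gets a personal elevation shift plus item-level noise
  elevation <- sample(c(-1, 0, 1), n, replace = TRUE)
  noise <- matrix(sample(c(-1, 0, 1), n * length(profile), replace = TRUE,
                         prob = c(0.15, 0.70, 0.15)), nrow = n)
  raw <- outer(elevation, profile, "+") + noise
  pmin(pmax(raw, 1), 4)            # clamp to the 1-4 rating scale
}

ratings_toy <- rbind(make_group(profile_a, n_per_group),
                     make_group(profile_b, n_per_group))

# Person-centering: subtract each respondent's own mean rating so that
# uniformly high or low raters no longer differ in elevation.
centered <- sweep(ratings_toy, 1, rowMeans(ratings_toy), "-")

# k-means on the centered scores, two clusters, several random starts.
km <- kmeans(centered, centers = 2, nstart = 25)

# Cross-tabulate the known toy segments against the recovered clusters.
true_segment <- rep(1:2, each = n_per_group)
print(table(true_segment, km$cluster))
```

Centering by row is what lets the clustering respond to the shape of each respondent's feature profile rather than to how high or low that respondent rates everything.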
The claim made by a mixture model is that every respondent belongs to a latent class with its own feature prioritization. Yet, we observe only the feature ratings. However, as I showed in the last post, those ratings contain enough information to identify the respondent’s latent class and the profile of feature impact for each of the latent classes. Now for the caveat: the ratings contain sufficient information only if we assume that our data are generated as a finite mixture of some small number of unidimensional polytomous IRT models. Fortunately, we know quite a bit about consumer judgment and decision making, so we have some justification for our assumptions other than that the models seem to fit.
The R package mixRasch Does It Simultaneously
Yes, R can recover the latent class as well as provide person and item estimates with one function called mixRasch (the same name is used for both the function and the package). If my ratings were a binary yes/no or agree/disagree, I would have many more R packages available for the analysis (see Section 2.4 for an overview of mixture IRT in R).
The mixRasch() function is straightforward. You tell it the data, the maximum number of iterations, the number of steps or thresholds, the IRT model, and the number of latent classes. For the analysis here, the last three are steps=3, model="PCM", and n.c=2.
The R code to generate the data mixing two groups of respondents with different feature priorities can be found in the last post. The appendix at the end of this post lists the additional R code needed to run mixRasch. The number of steps or thresholds is one less than the number of categories. We will be using the partial credit model (PCM), which behaves in practice much like the graded response model, although the category contrasts are not the same and there is that constant slope common to Rasch models. Of course, there is a lot more to be concerned about when using mixRasch and joint maximum likelihood estimation, and perhaps there will be time in a later post to discuss all that can go wrong. For now, we will look at the output to discover if we have learned anything about different types of feature prioritization and varying levels of intensity with which consumers want those features.
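To make the partial credit model concrete, here is a minimal sketch of its category probabilities for a single feature, using the standard textbook form of the model. The function name pcm_probs and the threshold values are assumptions made for illustration; they are not taken from mixRasch, whose internal parameterization may differ in detail.

```r
# Category probabilities under the partial credit model for one item.
# theta:  person location (how intensely the respondent wants features)
# deltas: step (threshold) parameters; a 4-category item has 3 of them
pcm_probs <- function(theta, deltas) {
  # cumulative sums of (theta - delta_j); the lowest category has an empty sum of 0
  numer <- exp(c(0, cumsum(theta - deltas)))
  numer / sum(numer)
}

deltas <- c(-1, 0, 1)             # hypothetical thresholds for a 4-point scale
p_low  <- pcm_probs(-2, deltas)   # a less interested respondent
p_high <- pcm_probs( 2, deltas)   # a more interested respondent

# One row per respondent, one column per rating category (1 through 4).
print(round(rbind(p_low, p_high), 3))
```

Three thresholds yield four category probabilities, which is why steps is set to one less than the number of rating categories. Raising theta shifts probability toward the higher categories for every feature at once, which is the sense in which consumer intensity is a single dimension within a latent class.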
My example uses nine undefined features divided into three sets of three features each. The first three features have little appeal to anyone in the sample. Consumer heterogeneity is confined to the last six features. The 200 respondents belong to one of two groups: the first 100 whose preferences follow the ranking of the features from one to nine and the second 100 who preferred most the middle three features. The details can be found in that much-referenced previous post.
Although I deliberately wanted to keep the example abstract, you can personalize it with any product category of your choice. For example, banking customers can be split into online and in-branch types. The in-branch customer wants nearby branches with lots of helpful personnel. The online customer wants free bill-paying and mobile apps. Both types vary in their usage intensity and their product involvement, so that we expect to see differences within each type reflected in the size of the average rating across all the features. If you don’t like the banking example, you can substitute gaming or dining or sports or kitchen appliances or just about any product category.
The output from the mixRasch function is a long list whose elements need to be extracted. First, we want to know the latent class membership for our 200 respondents. Membership is not all-or-none; each respondent receives class membership probabilities that sum to one. For example, the first respondent has a 0.80 probability of belonging to the first latent class and a 0.20 probability of being from the second latent class. This information can be found in the list element $class for every respondent except those giving all the features either the highest or lowest scores (e.g., our respondent in row 155 rated every feature with a four and row 159 contained all ones). If we use the maximum probability to classify respondents into mutually exclusive latent classes, the mixRasch function correctly identifies 84% of the respondents (we know this only because we simulated the data and so know each respondent’s true segment). I should mention that the classification from the mixture Rasch model is not identical to the row-centered k-means from the last post, but there is 92% agreement for this particular example.
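The step from membership probabilities to a hard classification, and then to a hit rate, can be sketched in base R. The probability matrix below is hypothetical: its first row echoes the 0.80/0.20 example in the text and the remaining rows are invented, so none of it is output from the analysis. The names class_prob, assign_class and hit_rate are likewise assumptions for this sketch.

```r
# Hypothetical membership probabilities for six respondents and two latent
# classes (invented for illustration; not output from the analysis above).
class_prob <- rbind(
  c(0.80, 0.20),
  c(0.35, 0.65),
  c(0.92, 0.08),
  c(0.10, 0.90),
  c(NA,   NA),     # stands in for an all-ones or all-fours respondent
  c(0.55, 0.45)
)
true_segment <- c(1, 2, 1, 2, 1, 2)   # known only because data are simulated

# Assign each respondent to the class with the highest probability,
# leaving rows without estimates as NA.
assign_class <- function(p) {
  apply(p, 1, function(row) if (anyNA(row)) NA_integer_ else which.max(row))
}
cluster <- assign_class(class_prob)

# Latent class labels are arbitrary, so score both labelings and keep the better.
classified <- !is.na(cluster)
hit_rate <- function(pred) mean(pred[classified] == true_segment[classified])
accuracy <- max(hit_rate(cluster), hit_rate(3 - cluster))

print(table(true_segment, cluster, useNA = "ifany"))
print(accuracy)
```

Scoring both labelings matters because nothing forces the first latent class to line up with the first simulated segment. The same cross-tabulation works for comparing two classifications with each other, such as the mixture model against k-means.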
Finally, were we successful at recovering the feature prioritizations used to simulate the ratings? In the table below, the D1 column contains the difficulty parameters for the 100 respondents in the first segment. The adjacent column LC1 shows the recovered parameter estimates from the 94 respondents in the first latent class. Similar results are shown for the second segment and latent class in the D2 and LC2 columns. As you may recall, two respondents giving all ones or all fours could not be classified by the mixRasch function.
What have we learned from this and the last two posts?
A single screening question will tell me if I should include you in my survey of the wine market. Determining if you are a wine enthusiast will require many more questions, and it is likely that you will need to match a pattern of responses before classification is final. Yet, typing alone will not be adequate since systematic variation remains after your classification as a wine enthusiast. It’s a matter of degree as one moves from novice to expert, from refrigerator to cellar, and from tasting to wine club host. Our clusters are no longer spherical or elliptical clumps or even regions but elongated networks of ever increasing commitment. As noted in one of my first posts on Archetypal Analysis, the admonition that “one size does not fit all” can be applied to both the need for segmentation and the segmentation process itself. Customer heterogeneity may be more complex than can be represented by either a latent class or a latent trait alone.
The post was titled “Latent Variable Mixture Models” in an attempt to accurately describe the approach being advanced. The book Advances in Latent Variable Mixture Models was published in 2007, so clearly my title is not original. In addition, a paper with the same name from nursing research provides a readable introduction (e.g., depression is identified by a symptom pattern but differs in intensity from mild to severe). Much of this work uses Mplus instead of R. However, we relied on the R package mixRasch in this post, and R has flexmix, psychomix, mixtools and more that all run some form of mixture modeling. Pursuing this topic would take us some time. So, I am including these references more as a postscript because I wanted to place this post in a broader context without having to explain that broader context.
Appendix with R Code
In order to create the data in ratings, you will need to return to the last post and run portions of the R code listed at the end of that post.
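library(mixRasch)
# need to set the seed only if
# we want the same result each
# time we run mixRasch
set.seed(1234)  # arbitrary value; the original seed is not shown
# max.iter=500 and the name mR2 are assumptions; only the second line of this call survives
mR2 <- mixRasch(ratings, max.iter=500,
                steps=3, model="PCM", n.c=2)
# shows the list structure
# containing the output
str(mR2)
# latent class membership
# probability and max classification
round(mR2$class, 2)
cluster <- max.col(mR2$class)
# comparison with simulated data
# (first 100 rows are segment 1, last 100 are segment 2)
table(c(rep(1,100), rep(2,100)), cluster)
# comparison with row-centered
# kmeans from last post
# (kcl_rc is a placeholder name for that kmeans object)
table(cluster, kcl_rc$cluster)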