# Does It Make Sense to Segment Using Individual Estimates from a Hierarchical Bayes Choice Model?

March 24, 2013
By

(This article was first published on Engaging Market Research, and kindly contributed to R-bloggers)

I raise this question because we see calls for running segmentation with individual estimates from hierarchical Bayes choice models without any mention of the possible complications that might accompany such an approach.  Actually, all the calls seem to be from those using MaxDiff to analyze the data from incomplete block designs.  For example, if one were to run a search for "MaxDiff segmentation," you would find many in the marketing research community arguing that one should run a two-stage clustering starting with hierarchical Bayes estimation of individual-level MaxDiff parameters and then using those estimates as observations in some type of cluster analysis.  However, there is no reason to limit the discussion to MaxDiff, since the same issues that one might have concerning segmentation with MaxDiff apply to all hierarchical Bayes estimates.﻿

So, where's the problem?  There would be no question had we given each respondent a sufficient number of "choice sets" to estimate a complete set of parameters separately for each individual.  To refresh our memories, we are trying to measure the individual-level importance of an array of attributes or features (e.g., the attributes of a fast food restaurant).   Let us say that we had 16 attributes, and we wanted to form choice sets with four attributes in each block (e.g., clean bathrooms, reasonable prices, tastes good, short wait).  Each block is a choice set where the respondent would be shown the four attributes and asked which they wanted most and which they wanted least.  A balanced incomplete block design could be formed with 20 blocks.

I have written a function that uses the r package AlgDesign to generate an incomplete block design.  As you can see below, it is called as IBD(nalt, nperblock, nblock, niter) and takes four arguments:  number of total alternatives, number of alternatives per block, number of blocks, and number of iterations.  I have given a default value of 1000 for the number of iterations; however, you may need to increase that number depending on the complexity of your design (e.g., niter=5000).  The command IBD(16, 4, 20) will create an incomplete block design for 16 attributes, presented with four attributes per block, over 20 blocks.  You should note that the set.seed() function would be needed in order to obtain the same design each time and that you might need to try more than one set.seed() if the first does not generate a balanced design.

 library(AlgDesign)   IBD<-function(nalt, nperblock, nblock, niter=1000) {  BIB<-optBlock(~., withinData=factor(1:nalt), blocksizes=rep(nperblock,nblock), nRepeats=niter)  design<-matrix(BIB$rows,nrow=nblock, ncol=nperblock, byrow=TRUE) counts<-crossprod(table(c(rep(1:nblock, rep(nperblock,nblock))), BIB$rows))  list(design,counts) }   IBD(16,4,20)

> IBD(16,4,20)[[1]]      [,1] [,2] [,3] [,4] [1,]    6   12   13   14 [2,]    3    6   10   16 [3,]    1    3    9   13 [4,]    2    8   13   15 [5,]    4    6    8   11 [6,]    7   10   11   13 [7,]    8    9   14   16 [8,]    2    3    4   12 [9,]    7   12   15   16[10,]    1    8   10   12[11,]    3   11   14   15[12,]    5    9   11   12[13,]    4    5   13   16[14,]    2    6    7    9[15,]    3    5    7    8[16,]    2    5   10   14[17,]    1    5    6   15[18,]    1    2   11   16[19,]    1    4    7   14[20,]    4    9   10   15[[2]]         1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  1  5 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  2  1 5 1 1 1 1 1 1 1  1  1  1  1  1  1  1  3  1 1 5 1 1 1 1 1 1  1  1  1  1  1  1  1  4  1 1 1 5 1 1 1 1 1  1  1  1  1  1  1  1  5  1 1 1 1 5 1 1 1 1  1  1  1  1  1  1  1  6  1 1 1 1 1 5 1 1 1  1  1  1  1  1  1  1  7  1 1 1 1 1 1 5 1 1  1  1  1  1  1  1  1  8  1 1 1 1 1 1 1 5 1  1  1  1  1  1  1  1  9  1 1 1 1 1 1 1 1 5  1  1  1  1  1  1  1  10 1 1 1 1 1 1 1 1 1  5  1  1  1  1  1  1  11 1 1 1 1 1 1 1 1 1  1  5  1  1  1  1  1  12 1 1 1 1 1 1 1 1 1  1  1  5  1  1  1  1  13 1 1 1 1 1 1 1 1 1  1  1  1  5  1  1  1  14 1 1 1 1 1 1 1 1 1  1  1  1  1  5  1  1  15 1 1 1 1 1 1 1 1 1  1  1  1  1  1  5  1  16 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  5

The first element [[1]] of the above list shows the attributes in each of the 20 blocks.  The second element [[2]] displays the number of times each of the 16 attributes occur together in the same block.  That is, all 16 attributes appear in five blocks (diagonal values), and each pair of attributes appears together in only one block (off-diagonal values).  I wrote this function in order to generate quickly incomplete block designs and check their "balance."   The idea came from the surveyanalysis.org website, which is full of interesting r code and well-written discussions of advanced statistical procedures.  You should try this IBD function with other well-known incomplete block designs, such as, IBD(6, 3, 10) and IBD(9, 5, 18).

Had every respondent answered all 20 blocks, we could provide MaxDiff counts for each respondent independent of all the other respondents.  To see how this works, suppose that the 16 attributes are in order so that the attribute labeled "1" is the most preferred and the attribute labeled "16" is the least preferred.  When we look at the 20 blocks in [[1]], "best" is always the attribute in the first column and "worst" is always the attribute in the last column.  It should be clear why we needed to ask both best and worst.   If we had not asked for the selection of the worst attribute from each choice set, the last eight attributes (numbered 9 - 16) would have been tied for last place.

Although this configuration does not perfectly rank order all 16 attributes, it comes very close to achieving its goal.  For example, the most preferred attribute #1 is selected five times, followed by attribute #2 that is selected 4 times, and so on.  Had we asked for second best or equivalently second worst, we would have reproduced our rankings of all the attributes.  Of course, we could have saved ourselves a lot of time and effort by simply asking for a rank ordering of all 16 attributes in the first place (i.e., select best of 16, best of remaining 15, etc.).

Nevertheless, 20 blocks might be somewhat demanding for a respondent to complete in a single interview, so instead, each respondent would be given only a handful of blocks; too few to obtain anything like the stable rank ordering of the attributes achieved with a set of 20 balanced blocks.  As a result, we need hierarchical Bayes estimation to "borrow strength" from the other respondents in order to provide individual-level estimates.  Explaining exactly how this is accomplished requires some time, but in the end we would discover that hierarchical Bayes needs to make some type of simplifying assumption concerning the distribution of attribute weights across all the respondents, most often, that each respondent comes from the same multivariate normal distribution of attribute weights.  That is, one way to "borrow" from the other respondents is to assume that all the respondents are alike in that they all belong to the same population of respondents with a common set of mean attribute values.

We are not claiming that every individual assigns the same value or weight to every attribute, but that the weights given by everyone in the group can be described by a multivariate normal distribution with a common vector of means.  If we let individual respondents be represented by the subscript h, then we might use the following notation to indicate that the group of individual attribute weights came from the same multivariate normal distribution:   βh ~ Normal(β , Σβ).  Clearly, this would not be the case if our respondents came from different segments.  Thus, in order to run the hierarchical Bayes we assume that all the respondents come from a common multivariate normal distribution, and then in order to run a segmentation, we claim that the respondents actually belong to different segments from different distributions.  Obviously, those advocating segmentation using the estimates from a hierarchical Bayes model seem to have forgotten the underlying assumptions that they made in order to obtain the estimates in the first place.

Even more troubling, the approach is self-defeating.  Individual estimates from a hierarchical Bayes are pooled or weighted averages of individual and group effects.  We call this Bayesian shrinkage.  It might help to think of it as a form of regression to the mean where predictions for extremes scores are "pulled" toward the regression line or moving mean.  If segmentation was your ultimate objective, you would be better served by incorporating multiple groups in your hierarchical model.  But this would not be a two-stage clustering strategy.

In a previous post, I showed how to use the rhierMnlRwMixture function from the R package bayesm.  The "Mixture" in the function name refers to the treatment of heterogeneity in the hierarchical model.  In the prior R code specifying the function for that example, the number of components used in the normal mixture was set to one (ncomp=1), meaning that we were assuming a single multivariate normal distribution.  Had we believed that there were segments, we could have set ncomp equal to the number of segments.  The book, Bayesian Statistics and Marketing, provides a formal overview of this approach.  Google Books provides online much of the relevant text covering "mixtures of normals" from Chapters 3.9 and 5.5.