The new consumer is the old consumer with more options and fewer prohibitions. Douglas Holt calls it the postmodern market defined by differentiation: “consumer identities are being fragmented, proliferated, recombined, and turned into salable goods.” It is not simply that more choices are available for learning about products, for sharing information with others and for making purchases. All that is true thanks to the internet. In addition, however, we have seen what Grant McCracken names plenitude, “an ever-increasing variety of observable ways of living and being that are continually coming into existence.” Much more is available, and much more is acceptable.
For instance, the new digital consumer is no longer limited to choosing among the three major networks. Not only are other channels available, but consumers now “watch” while connected through other devices. The family can gather in front of the TV with everyone doing their own thing. Shouldn’t such consumer empowerment have some impact on how we segment the market?
Although we believe that the market is becoming more fragmented, our segmentation solutions still look the same. In fact, the most common segmentation of the digital consumer remains lifestyle-based. For example, Experian’s Fast Track Couple is defined by age and income, together with having kids or being likely to start a family soon. Of course, one’s life stage is important, and empty nesters do not behave like unmarried youths. But where is the fragmentation? Which digital devices are used, when, where, and for what purposes? Moreover, who else is involved? We get no answers, just more of the same. Likewise, IBM responds to increasing diversity with a two-dimensional map based on usage type and intensity, with a segment in every quadrant.
The key is to return to our new digital consumer, who is doing what they want with the resources available to them. Everything may be possible, but wants and means impose a structure. Not everyone owns every device, nor does everyone use every feature. Instead, we discover recurrent patterns of specific device usage on different occasions with a limited group of others. As we have seen, the new digital consumer may own a high-definition TV, an internet-connected computer or tablet, a smartphone, a handheld or gaming console, a DVD/Blu-Ray player or recorder, a digital-media receiver for streaming, and then there is music. These devices can be for individual use or shared with others, at home or somewhere else, one at a time or multitasking, for planned activities or spontaneously, every day or for special occasions, with an owned library or online content, and the list could go on.
What can we learn from usage-intensity data across such an array of devices, occasions and contexts? After all, topic modeling and sentiment analysis can be done with a “bag of words” listing the frequencies with which words occur in a text. Both can be framed as generative models that assume the writer or speaker has something to say and picks the words to express it. If all I had were counts of which words were used, could I infer the topic or the sentiment? If all I had were measures of usage intensity across devices, occasions and contexts, could I infer something about consumer segments that would help me design or upsell products and services?
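To make the analogy concrete, here is a small Python sketch that puts a bag of words next to a "bag of behaviors." The behavior names and counts are hypothetical and made up for illustration. The only point is that both reduce to a vector of frequencies, mostly zeros.

```python
from collections import Counter

# A "bag of words" keeps only how often each word occurs; word order is discarded.
review = "great game great graphics but the story was thin"
bag_of_words = Counter(review.split())

# The analogous "bag of behaviors": how often one consumer engages in each
# device-occasion-context combination. Column names and counts are hypothetical.
columns = [
    "console_gaming_home_alone",
    "console_gaming_home_friends",
    "tv_live_sports_home_family",
    "smartphone_streaming_in_transit",
    "tablet_social_media_while_watching_tv",
]
weekly_counts = {
    "console_gaming_home_alone": 5,
    "console_gaming_home_friends": 2,
    "smartphone_streaming_in_transit": 3,
}

# One row of the data matrix: zero for every behavior this person never reports.
row = [weekly_counts.get(c, 0) for c in columns]

print(bag_of_words["great"])  # 2
print(row)                    # [5, 2, 0, 3, 0]
```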
Replacing Similarity as the Basis for Clustering
Similarity, often expressed as distance, dominates cluster analysis, using either pairwise distances between observations or distances between each observation and the segment centroids. Clusters are groupings in which observations in the same cluster are more similar to each other than to observations in other clusters. A few well-separated clouds of points on a two-dimensional plane illustrate the concept. However, we need many dimensions to describe our new digital consumer, and any one individual is likely to sit close to the origin of zero intensity on all but a small subset of those dimensions. Similarity or distance loses its appeal as the number of dimensions increases and the space becomes sparser (the curse of dimensionality).
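The loss of contrast is easy to demonstrate. The sketch below uses uniformly distributed random points, not real usage data. It measures how much farther the farthest point is than the nearest one from a random query point as the number of dimensions grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(n_points: int, n_dims: int) -> float:
    """(max - min) / min of distances from a random query point to n_points random points."""
    points = rng.random((n_points, n_dims))   # uniform on the unit hypercube (an assumption)
    query = rng.random(n_dims)
    d = np.linalg.norm(points - query, axis=1)
    return float((d.max() - d.min()) / d.min())

for dims in (2, 10, 100, 1000):
    print(dims, round(relative_contrast(500, dims), 3))
```

In two dimensions, the nearest neighbor is typically many times closer than the farthest point. In a thousand dimensions the two distances are nearly the same, so "closest" carries little information.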
Borrowing from topic modeling, we can use non-negative matrix factorization (NMF) without ever needing to calculate similarity. What are the topics or thematic structures underlying the usage patterns of our new digital consumer? What about personal versus shared experiences? Would we not expect a different pattern of usage behavior from those wanting their own space and those wanting to bring people together? Similarly, those seeking the “ultimate experience” within their budgets might be those with high-quality speakers, a home theater, or the latest gaming console and newest games. The social networker multitasks and always stays in contact. The collector builds a library. Some need to be mobile and have access while in transit. I could continue, but hopefully it is clear that one expects to see recurring patterns in the data.
NMF uncovers those patterns by decomposing the data matrix, with individuals as the rows and usage intensities as the columns. As I have shown before and show again below, the data matrix V is factored into a set of latent features forming the rows of H and individual scores on those same latent features in the rows of W. We can see the handiwork of the latent features in the repeating patterns of usage intensities. Who does A, B, C, and D with such frequency? It must be a person of this type engaging in this kind of behavior.
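The factorization and one common fit criterion, the squared Frobenius norm, can be written as follows. Here n is the number of individuals, m the number of usage-intensity measures, and r the number of latent features. The letters n and m are my own labels, and r is the rank discussed below. Other fit criteria, such as the Kullback-Leibler divergence, are also used.

```latex
\begin{aligned}
V &\approx W H, \qquad V \in \mathbb{R}_{\ge 0}^{n \times m},\; W \in \mathbb{R}_{\ge 0}^{n \times r},\; H \in \mathbb{R}_{\ge 0}^{r \times m} \\
v_{ij} &\approx \sum_{k=1}^{r} w_{ik}\, h_{kj} \\
&\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2
\end{aligned}
```

Every w_ik and h_kj is non-negative. As a result, each latent feature can only add to a person's usage intensities and never subtract from them.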
You can make this easier by thinking of H as a set of factor loadings for behaviors (turned on its side) and W as the corresponding individual factor scores. For example, it is reasonable to believe that at least some of our new digital consumers will be gamers, so we expect to see one row of H with high weights or loadings on all the game-related behaviors in the columns of H. If that is the first row, then the first column of W tells us how much each consumer engages in gaming activities. The higher the score in the first column of W, the more intense the gamer. People who never game get a score of zero.
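Here is a minimal sketch of the gamer example in Python with NumPy. It uses the Lee-Seung multiplicative updates for the squared-error objective. This is one standard NMF algorithm, not necessarily the default in the R NMF package. The six behavior columns, the two archetypes and the simulated scores are all hypothetical.

```python
import numpy as np

def nmf(V, r, n_iter=1000, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update loadings, stays non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update scores, stays non-negative
    return W, H

# Hypothetical usage-intensity columns: three gaming and three streaming behaviors.
columns = ["console_solo", "console_online", "buys_new_games",
           "stream_tv", "stream_movies", "stream_on_phone"]
archetypes = np.array([[5.0, 4.0, 3.0, 0.0, 0.0, 0.0],   # hypothetical gamer pattern
                       [0.0, 0.0, 0.0, 4.0, 5.0, 3.0]])  # hypothetical streamer pattern

rng = np.random.default_rng(42)
scores = rng.gamma(shape=2.0, scale=0.5, size=(200, 2))
scores[:50, 0] = 0.0            # the first 50 simulated people never game
V = scores @ archetypes         # 200 people x 6 behaviors

W, H = nmf(V, r=2)

# The gaming feature is the row of H that loads on the first three columns.
gaming_row = int(np.argmax(H[:, :3].sum(axis=1)))
print(dict(zip(columns, np.round(H[gaming_row], 2))))
print("non-gamers:", round(float(W[:50, gaming_row].mean()), 3))
print("gamers:    ", round(float(W[50:, gaming_row].mean()), 3))
```

The gaming row of H should load only on the three gaming columns. The 50 simulated non-gamers should have gaming scores in W close to zero.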
In the above figure there are only two latent features. We are trying to reproduce the data matrix with as many latent features as we can interpret. To be clear, we are not trying to reproduce all the data as closely as possible because some of that data will be noise. Still, if I look at the rows of H and can quickly visualize and name all the latent features, I am a happy data analyst and will retain them all.
The number of latent features will depend on the underlying data structure and the diversity of the intensity measures. I have reported 22 latent features for a 218-item adjective rating scale. Unlike the singular value decomposition (SVD) associated with factor analysis, NMF does not attempt to capture as much variation as possible. Instead, NMF identifies additive components, and consequently we tend to see something more like micro-genres or micro-segments.
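The contrast with SVD can be seen directly in this NumPy sketch on a hypothetical, strictly positive data matrix. The first right singular vector has loadings of a single sign. The second must mix positive and negative loadings because it is orthogonal to the first. Components like that describe contrasts (more of this, less of that) rather than additive parts, and NMF's non-negativity constraint rules them out.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical strictly positive usage-intensity matrix: 100 people x 8 behaviors.
V = rng.random((100, 8)) + 0.1

# Rows of Vt are the right singular vectors (the behavior "loadings").
U, s, Vt = np.linalg.svd(V, full_matrices=False)
first, second = Vt[0], Vt[1]

# The first vector is all one sign; the second, being orthogonal to it,
# must contain both positive and negative loadings.
print(np.sign(first))
print(np.sign(second))
```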
So far, I have only identified the latent features. Sometimes that is sufficient, and individuals can be classified by looking at their row in W and assigning them to the latent feature with the largest score. But what if a few of our gamers also watched live sports on TV? Recall that latent features are shared patterns. If everyone who did one also did the other, we would not extract separate latent features for gaming and for live TV sports. Instead, there would be only one latent feature, with both sets of intensity measures loading on it in H.
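Below is a sketch of hard classification by largest score, plus a simple flag for people whose scores are split across features. The W values, the feature names and the 0.3 cutoff are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical W: rows are consumers, columns are latent features.
W = np.array([
    [2.1, 0.0, 0.1],   # gamer
    [0.0, 1.5, 0.2],   # sports fan
    [1.8, 1.6, 0.0],   # gamer who also watches live sports
    [0.1, 0.0, 0.9],   # streamer
])
features = np.array(["gaming", "live_sports", "streaming"])

# Hard assignment: the latent feature with the largest score in each row.
hard_label = features[W.argmax(axis=1)]

# Share of each person's total score carried by each feature. A large second
# share (hypothetical cutoff 0.3) flags people who follow more than one pattern.
shares = W / W.sum(axis=1, keepdims=True)
mixed = np.sort(shares, axis=1)[:, -2] > 0.3

print(hard_label)  # ['gaming' 'live_sports' 'gaming' 'streaming']
print(mixed)       # [False False  True False]
```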
The latent feature scores in W can be treated like any other derived score and can enter into any other analysis as data points. Thus, we can cluster the rows of W, now that we have reduced the dimensionality from the columns of V to the columns of W and similarity has become a more meaningful metric (though care must be taken if W is sparse). The heat maps produced by the NMF package attach a dendrogram at the side displaying the results of a hierarchical cluster analysis. Given that we have the individual latent feature scores, we are free to create a biplot or cluster with any method we choose.
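As one example of clustering the rows of W, here is plain k-means in NumPy. It is swapped in for the hierarchical clustering behind the NMF package's heat map dendrogram. The W matrix is simulated from three hypothetical score profiles. The farthest-point initialization is one simple choice among many.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Next center: the point farthest from every center chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids, keeping the old center if a cluster empties out.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(7)
profiles = np.array([[3.0, 0.0, 0.0],   # hypothetical: mostly gaming
                     [0.0, 3.0, 0.0],   # hypothetical: mostly live sports
                     [3.0, 3.0, 0.0]])  # hypothetical: both
W = np.vstack([p + rng.random((30, 3)) * 0.5 for p in profiles])

labels, centers = kmeans(W, k=3)
print(np.round(centers, 2))
```

Because W has only a few columns, Euclidean distance is meaningful again. With a sparse W, though, a different distance or a rescaling of the rows may be worth considering.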
R Makes It So Easy with the NMF Package
Much of what you know about k-means and factor analysis generalizes to NMF. That is, as in factor analysis, one needs to specify the number of latent features (rank r) and interpret the factor loadings contained in H (after transposing it or turning it sideways). You can find all the R code and all the output explained in a previous post. Just as factor analysis has the scree plot, NMF has several plots that some believe will help solve the number-of-factors problem. The NMF vignette outlines the process under the heading “Estimating the factorization rank” in Section 2.6. Personally, I find such aids to be of limited value, relying instead on interpretability as the criterion for keeping or discarding latent features.
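Here is a rough version of such a rank survey: fit NMF at several ranks, keep the best of a few random starts, and record the residual sum of squares. This is a Python sketch on simulated data with a hypothetical three-block structure. It is not the NMF package's own rank-estimation code, and residual error is only one of the quantities such plots can show.

```python
import numpy as np

def nmf(V, r, n_iter=1000, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for the squared-error NMF objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def rss(V, W, H):
    """Residual sum of squares of the reconstruction WH."""
    return float(((V - W @ H) ** 2).sum())

rng = np.random.default_rng(3)
# Simulated data: three hypothetical, non-overlapping blocks of four behaviors each.
H_true = np.kron(np.eye(3), np.ones((1, 4)))            # 3 x 12
W_true = rng.gamma(1.0, 1.0, size=(150, 3))
V = W_true @ H_true + rng.random((150, 12)) * 0.05      # small noise

survey = {}
for r in range(1, 6):
    # Best of three random starts at each rank.
    survey[r] = min(rss(V, *nmf(V, r, seed=s)) for s in range(3))
    print(r, round(survey[r], 2))
```

The residual should drop sharply up to the simulated rank of three and then level off, much like the elbow one looks for in a scree plot. With real data the elbow is rarely this clean, which is one reason to lean on interpretability instead.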
Finally, NMF runs into all the problems experienced with k-means, the most serious being local minima. You can recognize a local minimum when the solution seems odd or when repeating the same analysis yields a very different solution. As with k-means, one can redo the analysis many times with different random starting values. If needed, one can also specify the seeding method so that a different initialization starts the iterative process (see Section 2.3 of the vignette). Adjusting the number of random starts until the solutions become consistent seems to work quite well with marketing data that contain separable groupings of rows and columns. That is, factorization works best when actual factors generate the data matrix. Here those factors are types of consumers and kinds of activities that are distinguishable (e.g., some game and some do not, some only stream and others rent or own DVDs, some only shop online and others search online but buy locally).
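The restart strategy can be sketched the same way. The sketch runs NMF from several random seeds and keeps the run with the lowest reconstruction error. It then checks how often the other runs assign each person to the same latent feature, allowing for the arbitrary ordering of features. The data, the number of restarts and the agreement measure are hypothetical choices for illustration.

```python
import itertools
import numpy as np

def nmf(V, r, n_iter=300, seed=0, eps=1e-10):
    """Multiplicative-update NMF; returns W, H and the Frobenius reconstruction error."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H, float(np.linalg.norm(V - W @ H))

def agreement(a, b, r):
    """Share of people given the same label by two runs, maximized over relabelings of b."""
    return max(float(np.mean(a == np.array(p)[b])) for p in itertools.permutations(range(r)))

rng = np.random.default_rng(11)
H_true = np.kron(np.eye(3), np.ones((1, 4)))            # hypothetical 3 x 12 structure
V = rng.gamma(1.0, 1.0, size=(120, 3)) @ H_true + rng.random((120, 12)) * 0.1

runs = [nmf(V, r=3, seed=s) for s in range(10)]         # ten random starts
errors = [err for _, _, err in runs]
best = int(np.argmin(errors))
best_labels = runs[best][0].argmax(axis=1)

# How consistently do the other restarts assign people to latent features?
consistency = [agreement(best_labels, W.argmax(axis=1), 3) for W, _, _ in runs]
print("best run:", best, "error:", round(errors[best], 3))
print("agreement with best run:", np.round(consistency, 2))
```

Low agreement among runs with similar errors is the warning sign described above. The remedy is more starts, or a second look at whether the data really contain separable groupings.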