We already know how to interpret this heatmap because we have had practice with the coefficients. These colors indicate the values of the mixing weights for each respondent. Thus, in the middle of the heatmap you can find a dark red rectangle for latent feature #3, which we have already determined to represent country/bluegrass. These individuals give the lowest possible rating to everything except for the genre loading on this latent feature. We do not observe that much yellow or lighter colors in this heatmap because less than 13% of the responses fell into the lowest box labeled “dislike very much.” However, most of the lighter regions are where you might expect them to be, for example, heavy metal/rap (#2), although we do uncover a heavy metal segment at the bottom of the figure.
Measuring Attraction and Ignoring Repulsion
We often think of liking as a bipolar scale, although what determines attraction can be different from what drives repulsion. Music is one of those product categories where satisfiers and dissatisfiers tend to be different. Negative responses can become extreme so that preference is defined by what one dislikes rather than what one likes. In fact, it is being forced to listen to music that we do not like that may be responsible for the lowest scores (e.g., being dragged to the opera or loud music from a nearby car). So, what would we find if we collapsed the bottom three categories and measured only attraction on a 3-point scale with 0=neutral, dislike or dislike very much, 1=like, and 2=like very much?
NMF thrives on sparsity, so increasing the number of zeros in the data matrix does not stress the computational algorithm. Indeed, the latent features become more separated as we can see in the coefficient heatmap. Gospel stands alone as its own latent feature. Country and bluegrass remain, as does opera/classical, blues/jazz, and rock. When we “remove” dislike for heavy metal and rap, heavy metal moves into rock and rap floats with reggae between jazz and rock. The same is true for folk and easy mood music, only now both are attractive to country and classical listeners.
More importantly, we can now interpret the mixture weights for individual respondents as additive attractors so that the first few rows are the those with interest in all the musical genre. In addition, we can easily identify listeners with specific interests. As we continue to work our way down the heatmap, we find jazz/blues(#4), followed by rock(#5) and a combination of jazz and rock. Continuing, we see country(#2) plus rock and country alone, after which is a variety of gospel (#1) plus some other genre. We end with opera and classical music, by itself and in combination with jazz.
Comparison with the Cultural Omnivore Hypothesis
As mentioned earlier, we can compare our findings to a published study testing whether inclusiveness rules tastes in music (the eclectic omnivore) or whether cultural distinctions between highbrow and lowbrow still dominate. Interestingly, the cluster analysis is approached as a graph-partitioning problem where the affinity matrix is defined as similarity in the score pattern regardless of mean level. All do not agree with this calculation, and we have a pair of dueling R packages using different definitions of similarity (the RCA vs. the CCA).
None of this is news for those of us who perform cluster analysis using the affinity propagation R package apcluster, which enables several different similarity measures including correlations (signed and unsigned). If you wish to learn more, I would suggest starting with the Orange County R User webinar for apcluster. The quality and breadth of the documentation will flatten your learning curve.
Both of the dueling R packages argue that preference similarity ought to be defined by the highs and lows in the score profiles ignoring the mean ratings for different individuals. This is a problem for marketing since consumers who do not like anything ought to be treated differently from consumers who like everything. One is a prime target and the other is probably not much of a user at all.
Actually, if I were interesting in testing the cultural omnivore hypothesis, I would be better served by collecting familiarity data on a broader range of more specific music genre, perhaps not as detailed as the above map but more revealing than the current broad categories. The earliest signs of preference can be seen in what draws our attention. Recognition tends to be a less obtrusive measure than preference, and we can learn a great deal knowing who visits each region in the music genre map and how long they stayed.
NMF identifies a sizable audience who are familiar with the same subset of music genre. These are the latent features, the building blocks as we have seen in the coefficient heatmaps. The lowbrow and the highbrow each confine themselves to separate latent features, residing in gated communities within the music genre map and knowing little of the other’s world. The omnivore travels freely across these borders. Such class distinctions may be even more established in the cosmetics product category (e.g., women’s makeup). Replacing genre with brand, you can read how this was handled in a prior post using NMF to analyze brand involvement.
R code to perform all the analyses reported in this post
# keep only the 17 genre used
# in the AMJ Paper (see post)
# calculate number of missing values for each
# respondent and keep only those with no more
# than 6 missing values
# run frequency tables for all the variables
# recode missing to the middle of the 5-point scale
# reverse the scale so that larger values are
# associated with more liking and zero is
# the lowest value
# longer names are easier to interpret
fit<-nmf(prefer, 5, "lee", nrun=30)
# recode bottom three boxes to zero
# and rerun NMF
# need to remove respondents with all zeros
fit<-nmf(prefer2, 5, "lee", nrun=30)