Microgenres exist because markets are fragmenting and marketers need names to attract emerging customer segments with increasingly specific preferences. The cost of producing and delivery music now supports a plenitude of joint pairings of recordings and customers. The coevolution of music and its audience binds together listening preferences and available alternatives.
I already know a good deal about your preferences by simply knowing that you listen to German Hip Hop or New Orleans Jazz (see the website Every Noise at Once). Those microgenres are not accidental but were named in order to broadcast the information that customers need in order to find what they want to buy and at the same time that artists require to market their goods. Matchmaking demands its own vocabulary. Over time, the language adapts and only the “fittest” categories survive.
R Makes It Easy
The R package NMF simplifies the analysis as I demonstrated in my post on Modeling Plenitude and Speciation. Unfortunately, the data in that post were limited to only 17 broad music categories, but the R code would have been the same had there been several hundred microgenres or several thousand songs.
The output is straightforward once you understand what nonnegative matrix factorization (NMF) is trying to accomplish. All matrix factorizations, as the name implies, attempt to identify “simpler” matrices or factors that will reproduce approximately the original data matrix when multiplied together. Simpler, in this case, means that we replace the many observed variables with a much smaller number of latent variables. The belief is that these latent variables will simultaneously account for both row cliques and column microgenres as they coevolve.
This is the matrix factorization diagram that I borrowed from Wikipedia to illustrate the process.
The original data matrix V holds the non-negative numbers indicating the listening intensities for every microgenre is approximated by multiplying a customer graded membership matrix W times a matrix of factor loadings for the observed variables H. The columns of W and the rows of H reflect the number of latent variables. It can be argued that V contains noise so that reproducing it exactly would be overfitting. Thus, nothing of “true” value is lost by replacing the original observations in V with the customer latent scores in W. As you can see, W has fewer columns, but I can always approximate V using the factor loadings in H as a “decoder” to reconstitute the observations without noise (a form of data compression).
Interpreting the Output
We start our interpretation of the output with H containing the factor loadings for the observed variables. For example, we might expect to see one of the latent variables reflecting interest in classical music with sizable factor loadings for the microgenres associated with such music. One of these microgenre could serve as an “anchor” that defines this latent feature if only the most ardent classical music fans listened to this specific microgenre (e.g., Baroque). Hard rock or jazz might be anchors for other latent variables. If we have been careful to select observed variables with high imagery (e.g., is it easy to describe the hard rock listener?), we will be able to name the latent variable based only on its anchors and other high loadings.
The respondent matrix W serves a similar function for customers. Each column is the same latent variable we just described in our interpretation of H. For every row, W gives us a set of graded membership indices telling us the degree to which each latent variable contributes to that respondent’s pattern of observed scores. The opera lover, for instance, would have a high membership index for the latent variable associated with opera. We would expect to find a similar pairing between the jazz enthusiasts and a jazz-anchored latent variable, assuming that this is the structure underlying the coevolving network of customer and product.
Once again, we need to remember that nonnegative matrix factorization attempts to reproduce the observed data matrix as closely as possible given the constraints placed on it by limiting the number of latent variables and requiring that all the entries be non-negative. The constraint on the number of latent variables should seem familiar to those who have some experience with principal component or factor analysis. However, unlike principal component analysis, NMF adds the restriction that all the elements of W and H must be non-negative. The result is a series of additive effects with latent variables defined as nonnegative linear combinations of observed variables (H) and respondents represented as convex combinations of those same latent variables (W). Moreover, latent variables can only add to the reproduced intensities as they attempt to approximate the original data matrix V. Lee and Seung explain this as “learning the parts of objects.”
Why NMF Works with Marketing Data
All this works quite nicely in fragmented markets where products evolve along with customers desires to generate separate communities. In such coevolving networks, the observed data matrix tends to be sparse and requires a simultaneous clustering of rows and columns. The structure of heterogeneity or individual differences is a mixture of categories and graded category memberships. Customers are free to check as many boxes as they wish and order as much as they like. As a result, respondents can be “pure types” with only one large entry in their row of W and all the other values near zero, or they can be “hybrids” with several substantial membership entries in their row.
Markets will continue to fragment with customers in control and the internet providing more and more customized products. We can see this throughout the entertainment industry (e.g., music, film, television, books, and gaming) but also in almost everything online or in retail warehouses. Increasingly, the analyst will be forced to confront the high-dimensional challenge of dealing with subspaces or local regions populated with non-overlapping customers and products. Block clustering of row and columns is the first step when the data matrix is sparse because these variables over here are relevant only for those respondent over there. Fortunately, R provides the link to several innovative methods for dealing with such case (see my post The Ecology of Data Matrices).