Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Herbert Simon’s “ant on the beach” does not search for food in a straight line because the environment is not uniform with pebbles, pools and rough terrain. At least the ant’s decision making is confined to the 3-dimensional space defining the beach. Consumers, on the other hand, roam around a much higher dimensional space in their search for just the right product to buy.

Do you search online or shop retail? Do you go directly to the manufacturer’s website or do you seek out professional reviews or user ratings? Does YouTube or social media hold the key? Similar decisions must be made for physical searches of local retailers and superstores?  Of course, embedded within each of these decision points are more choices concerning features, servicing and price.

Yet, we do not observe all possible paths in the consumer purchase journey. Like the terrain of the beach, the marketplace makes some types of searches easier than others. In addition, like the ant, the first consumers leave trails that later consumers can follow. This can be direct word of mouth or indirect effects such as internet searches where the websites shown first depend on the number of previous visits. But it can also be marketing messaging and expert reviews, that is, markers along the trail telling us what to look for and where to look. We are social creatures, and it is fascinating to see how quickly all the possible paths through the product offerings are narrowed down to several well-worn trails that we all follow. Culture impacts what and how we buy, and statistical modeling that incorporates what others are doing may be our best hope of discovering those pathways.

In order to capture everyone in the product market and all possible sources of information, we require a wide net with fine webbing. Our data matrix will contain heterogeneous rows of consumers with distinctive needs who are seeking very different benefits. Moreover, our columns must be equally diverse to span everywhere that a consumer can search for product information. As a result, we can expect our data matrix to be sparse for we have included many more columns of information sources than any one consumer would access.

To make sense of such a data matrix, we will require a statistical model or algorithm that reflects this construction process, by which I mean the social and cultural grouping of consumers who share a common understanding of what is important to know and where one should seek such information. For example, someone looking for a new credit card could search and apply solely online, but not any consumer, for some do not shop on the internet or feel insecure without the presence of a physical building close to home. Those wanting to apply in-person may wait for a credit card offer to be inserted in their monthly bank statement or they may see an advertisement in the local newspaper.

Modeling the Joint Separation of Consumers and Their Information Sources

Nonnegative matrix factorization (NMF) decomposes the nonnegative data matrix into the product of two other nonnegative matrices, one for consumers and the other for information sources. The goal is dimension reduction. Before NMF, we needed all p columns of the data matrix to describe the consumer. Now, we can get by with only the r latent features, where r is much smaller than p. What are these latent features? They are defined in the same manner as the factors in factor analysis. Our second matrix from the nonnegative factorization contains coefficients that can be interpreted as one would factor loadings. We look for the information sources with the largest weights to name the latent feature.

Returning to our credit card example, the data matrix includes rows for consumers banking online and in-person plus columns for online search along with columns for direct mail and newspaper ads. Online banking customers use online information sources, while in-person banking customers can be found looking for information in a different cluster of columns. We have separation with online row and columns forming one block and in-person rows and columns coming together in a separate block.

The nonnegativity of the two product matrices enables such a “parts-based” representation with the simultaneous clustering of both rows and columns. We start with the observed data matrix. It is nonnegative so that zero indicates none and a larger positive value suggest more of whatever is being measured. Counts or frequencies of occurrence would work. Actually, the data matrix can contain any intensity measure. Hopefully, you can visualize that the data matrix will be more sparse (more zeros) with greater separation between the row-column blocks, and in turn, this sparsity will be associated with corresponding sparsity in the two product matrices.

A toy example might help with this explanation.

 V1 V2 V3 V4 S1 6 3 0 0 S2 4 2 0 0 S3 2 1 0 0 S4 0 0 6 3 S5 0 0 4 2 S6 0 0 2 1

The above data matrix shows the intensity of search scores from 0 (no search) to 6 (intense search) for six consumers across four different information sources. What might have produced such a pattern? The following could be responsible:
• Online sources in the first two columns with V1 more popular than V2,
• Offline sources in the last two columns with V3 more popular than V4,
• Online customers in the first three rows with individual search intensity S1 > S2 > S3, and
• Offline customers in the last three rows with individual search intensity S4 > S5 > S6.
The pattern might seem familiar as row and column effects from an analysis of variance. The columns form a two-level repeated measures factor with V1 and V2 nested in the first level (online) and V3 and V4 in the second level (offline). Similarly, the rows fall into two levels of a between-subject factor with the first three rows nested in level one (online) and the last three rows in level two (offline). Biclustering algorithms approach the problem in this manner (e.g., the R package biclust). Matrix factorization achieves a similar outcome by factoring the data matrix into the product of two new matrices with one representing row effects and the other column effects.

The NMF R package decomposes the data matrix into the two components that are believed to have generated the data in the first place. In fact, I created the data matrix as a matrix product and then use NMF to retrieve the generating matrices. The R code is given at the end of this post. The matrices W and H, below, reflect the above four bullet points. When these two matrices are multiplied, their product W x H is the above data matrix (e.g., the first entry in the data matrix is 3×2+0x0=6).

 W R1 R2 H V1 V2 V3 V4 S1 3 0 R1 2 1 0 0 S2 2 0 R2 0 0 2 1 S3 1 0 S4 0 3 S5 0 2 S6 0 1

As expected, when we run the nmf() function with rank r=2 on this data matrix, we get these two matrices back again with W as the basis and H as the coefficient matrix. Actually, because W and H are multiplied, we might find that every element in W is divided by 2 and every element in H is multiplied by 2, which would yield the same product. Looking at the weights in H, one concludes that R1 taps online information sources, leaving R2 as the offline latent feature. If you wished to standardize the weights, all the coefficients in a row could be transformed to range from 0 to 1 by dividing by the maximum value in that row.

Decompositions such as NMF are common in statistical modeling. Regression analysis in R using the lm() function is performed as a QR decomposition. The singular value decomposition (SVD) underlies much of principal component analysis. Nothing usual here, except for the ability of NMF to thrive when the data are sparse.

To be clear, sparsity is achieved when we ask about the details of consumer information search. Such details enable management to make precise changes in their marketing efforts. As important, detailed probes are more likely to retrieve episodic memories of specific experiences. It is better to ask about the details of price comparison (e.g., visit competitor website or side-by-side price comparison on Amazon or some similar site) than just inquire if they considered price during the purchase process.

Although we are not tracking ants, we have spread sensors out all over the beach, a wide network of fine mesh. Our beach, of course, is the high-dimensional space defined by all possible information sources. This space can be huge, over a billion combinations when we have only 30 information sources measured as yes or no. Still, as long as consumers confine their searches to low-dimensional subspaces, the data matrix will have the sparsity needed by the decompositional algorithm. That is, NMF will be successful as long as consumers adopt one of several established search pathways clearly marked by repeated consumer usage and marketing signage.

R code to create the V=WH data matrix and run the NMF package:

W=matrix(c(3,2,1,0,0,0,0,0,0,3,2,1), nrow=6)
H=matrix(c(2,0,1,0,0,2,0,1), nrow=2)
V=W%*%H
W; H; V

library(NMF)
fit<-nmf(V, 2, method="lee", nrun=20)
fit
round(basis(fit),3)
round(coef(fit))
round(basis(fit)%*%coef(fit))