In survey research, it makes a difference how the question is asked. “How would you rate the service you received at that restaurant?” is not the same as “Did you have to wait to be seated, to order your meal, to be served your food, or to pay your bill?” Questions about specific occurrences can be answered only by recollection, that is, by replaying the experience in memory. More general evaluative questions, like those on most customer satisfaction surveys, require less effort and can be answered without recalling any of the details.
Dual-processing models of perception, memory and reasoning help explain how survey questions are answered. The “dual” refers to the two endpoints of a continuum that may well have additional levels of processing in between. Using the terminology of fuzzy-trace theory, at one end are verbatim recollections of specific experiences; at the other end is the gist, the meaning extracted from the same experience. Both memory traces are formed in parallel, stored separately, and can be retrieved independently. Readers more comfortable with machine learning and pattern recognition will find a similar perspective in the work on scene understanding.
“Remembering the gist” simplifies our lives. We have already learned about restaurant types, either through direct experience or as part of the purchase process that got us to the restaurant in the first place. Zagat.com lists almost 300 different cuisines in its guides: pubs, sports bars, buffets, delis, coffee shops, cafes, bistros, fine dining, seafood, tapas, French, Italian, American, Chinese, steakhouses, salad bars, pizza, burgers, and much more. It is a rich and growing taxonomy that is shared among customers and used both to make purchase decisions and to remember consumption experiences.
How many restaurant types are there? Goal-directed categories are constructed as the need arises. For instance, when there is variation among fast food restaurants, we can add a “tag” to the fast food schema. Thus, McDonald’s adds the tag “for kids” and Carl’s Jr. does not. Subway adds “fresh” to its fast food label. However, some might find it more difficult to assimilate a restaurant like Panera Bread within the fast food schema. Do we need a fast casual restaurant type?
We simply reuse those knowledge structures as storage devices so that we are not required to retrieve all the details each time we need to form a judgment or make a choice. Consequently, when we fill out the satisfaction questionnaire, we are not reporting what we experienced but what we remember, and what we remember has been fit into the appropriate restaurant schema. Thus, although I can remember a considerable amount about my lunch yesterday, I never take the time or make the effort to “relive my restaurant experience” when I fill out a satisfaction questionnaire. Instead, I categorize the restaurant and remember an overall affect or feeling as my evaluative summary of the consumption experience. If asked for satisfaction ratings, I simply retrieve that evaluative affect and use the appropriate restaurant schema to complete the questionnaire. Of course, nothing prevents the customer from “reliving” the restaurant experience. However, we see no evidence, either from self-reflection or from think-aloud research, that respondents take the time or make the effort to recall specific memories when answering general satisfaction questions.
None of this would be an issue for statistical modeling, except that most product category schemata tend to generate ratings that fall along a single dimension representing the product’s strengths and weaknesses. Product schemata hold the expectations in the phrase “exceeding customer expectations.” In fact, as with much stereotypical thinking and behavior, we may become aware of our product schema only when an expectation is violated. Customer satisfaction follows expectations: what is expected receives the higher ratings because it is what is usually delivered. One may or may not appreciate the all-you-can-eat buffet, which is reflected in overall higher or lower scores, but everyone rates “amount” higher than “quality.”
Customers are able to provide detailed verbatim recollections. Was the beef tender? Was the fish overcooked? Were the table and seating clean? Did the waiter or waitress return after the food was served to ask if you needed anything? Although there is interference in all recall, such questions at least provide the opportunity to collect somewhat independent information from each item. That is, we would not expect to find the same degree of multicollinearity that we see in most customer satisfaction data.
The gist, on the other hand, imposes associative coherence because it is the more automatic representation, the one accessed first in judgment and decision making. The goal is not accuracy but a memory trace that can be used in future situations. If the food is great, the service is remembered as better than it would have been had the food not been good. Rudeness tends to be overlooked or forgotten, and false memories may be created. When the food is awful, on the other hand, all the small inconveniences and missteps are more likely to be amplified and thus bring all the ratings down. We may ask respondents to rate their satisfaction with different components of the product or service, but what we get is a single dimension with the items rank-ordered according to the product schema. All the ratings are adjusted up or down so that the easiest to deliver still get the highest ratings, and the lowest ratings are reserved for the most difficult to provide.
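This response-generation story can be sketched in a small R simulation. Everything here is an illustrative assumption, not an estimate from real data: a single latent evaluative affect drives every rating, item "difficulty" fixes the rank order of the items, and independent noise is small relative to the shared affect.

```r
set.seed(42)

n <- 500                                   # respondents
item_difficulty <- c(amount  = -1.0,       # easiest to deliver -> highest ratings
                     taste   = -0.5,
                     service =  0.0,
                     quality =  0.5)       # hardest to deliver -> lowest ratings

theta <- rnorm(n)                          # single latent evaluative affect (the "gist")

# Each rating = shared affect minus item difficulty plus small independent
# noise, then cut into a 1-to-5 scale.
latent  <- outer(theta, item_difficulty, function(t, d) t - d) +
           matrix(rnorm(n * 4, sd = 0.4), n, 4)
ratings <- apply(latent, 2,
                 function(x) as.integer(cut(x, breaks = c(-Inf, -1, 0, 1, 2, Inf))))

# The item means preserve the schema's rank order ...
round(colMeans(ratings), 2)

# ... and the first principal component dominates, as in real satisfaction data.
ev <- eigen(cor(ratings))$values
round(ev / sum(ev), 2)
```

With the shared affect contributing far more variance than the item-specific noise, the simulated ratings reproduce both signatures described above: the easiest-to-deliver item always scores highest, and one dimension accounts for most of the variation.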
Implications for Statistical Modeling in R: Item Response Theory
In a previous post, I showed how one might use the graded response model (GRM) from item response theory (IRT) as a model of satisfaction ratings. The R package ltm provides a comprehensive grm() function along with a complete set of plotting options. In my own research I have successfully fit the graded response model to satisfaction ratings many times over a full range of product categories. I repeatedly find what other market researchers report, including a strong first principal component and a simplex pattern of correlations that decrease as one moves away from the principal diagonal. In addition, one sees the Guttman scale pattern in the R heatmaps that was illustrated in a previous post.
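For readers who want to try this themselves, here is a minimal sketch of the grm() workflow. Since satisfaction ratings are typically proprietary, the four Science attitude items that ship with ltm (the subset used in the package's own documentation) stand in for rating-scale data.

```r
# Minimal sketch of the graded response model workflow in ltm.
library(ltm)

data(Science)
fit <- grm(Science[c(1, 3, 4, 7)])   # four ordinal items as a stand-in

summary(fit)                 # discrimination and extremity parameters per item
plot(fit, type = "ICC")      # item characteristic curves
factor.scores(fit)           # respondent locations on the latent dimension
```

The coefficient output gives each item's location parameters (the "difficulty" side of the analogy below), while factor.scores() places respondents on the same latent scale.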
Borrowing the achievement testing analogy from the field where IRT was developed, we can say that satisfaction ratings are a test of a brand’s ability to deliver the benefits that its customers seek. High satisfaction ratings indicate features and services that are easier to deliver, while those aspects that are harder to provide receive lower scores. Ability to satisfy customers is the latent variable, and each product category has its own definition. The graded response model extracts that latent variable and locates both the items and the respondents along that same dimension. Thus, we learn both the relative strengths and weaknesses for each brand and where each respondent falls along the same scale.
I recognize that IRT modeling has not been the traditional approach when analyzing rating data. It is more common to see some type of factor or path analysis. For example, I used the omega function from the R package psych to estimate a bifactor model from the correlations among airline satisfaction ratings. To be clear, IRT and factor analysis of categorical item responses are two different parameterizations of the same statistical model. Now, you can understand why I spent so much time explaining the response generation process in the first section of this post. Data alone will not resolve the number of factors problem or rotational indeterminacy. However, if you disagree with my theoretical foundation for a one-dimensional representation, R provides many alternatives including a fine multidimensional IRT package (mirt) and a complete battery of structural modeling packages.
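Both routes mentioned above can be sketched on public data. The airline ratings are not available here, so this example substitutes ten of the bfi personality items that ship with psych; the choice of items and of two group factors is purely illustrative.

```r
# Two parameterizations of the same underlying model, on stand-in data.
library(psych)

data(bfi)
items <- na.omit(bfi[, 1:10])        # ten ordinal items as a stand-in

# Bifactor model via the omega function: a general factor plus group factors.
omega(items, nfactors = 2)

# The IRT route with mirt: a unidimensional graded response model.
library(mirt)
fit2 <- mirt(items, model = 1, itemtype = "graded")
summary(fit2)
```

Comparing the general-factor loadings from omega() with the discrimination parameters from mirt() makes the equivalence of the two parameterizations concrete.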
Finally, one could decide to travel down the other path of the dual-processing divide and ask only for recollection. However, you will need to get very detailed and can expect considerable missing data. I cannot ask if you had to wait to be seated when there is self-seating or take-out. Much of the recollection will be coded as “not applicable” (NA) for individual respondents. Moreover, we will be forced to replace our rating scales with behaviorally anchored alternatives. Recollection requires that the respondent relive their experience. Rating scales, on the other hand, tend to pull the respondent out of the original experience and induce more abstract comparative thinking. Fortunately, we can turn to several R packages from machine learning for help analyzing such incomplete data, and IRT can assist with the scaling of categorical alternatives.
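One such option is regularized matrix factorization with the softImpute package. The sketch below simulates recollection-style data riddled with "not applicable" entries; the column names and the 30% missingness rate are illustrative assumptions, not recommendations.

```r
# Sketch: recovering latent structure from incomplete recollection data
# using softImpute (one of several R options for this task).
library(softImpute)

set.seed(7)
n <- 200
theta <- rnorm(n)                                   # latent dimension to recover
X <- sapply(c(-1, -0.5, 0, 0.5, 1),
            function(d) theta - d + rnorm(n, sd = 0.3))
colnames(X) <- c("wait_seated", "order_taken", "food_served",
                 "staff_checked", "bill_paid")      # hypothetical recollection items

X[sample(length(X), length(X) * 0.3)] <- NA         # 30% "not applicable"

fit   <- softImpute(X, rank.max = 2, lambda = 1)    # low-rank fit on observed cells
X_hat <- complete(X, fit)                           # matrix with NAs imputed
```

The low-rank fit uses only the observed cells, so respondents can skip inapplicable items without being dropped from the analysis.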
Summary and Conclusions
We begin with the recognition that the data we obtain from survey research are not a complete recording of events as experienced. Humans may well have memories of specific incidents that they can recall if the probe demands recollection. However, reliving verbatim memories takes time and effort, so we do not rely on detailed recollection in everyday decision making. Instead, much of the time, we engage in a form of data compression that extracts only the information we will need to make judgments and decisions quickly and with as little effort as possible. The gist compresses product and service interactions into a schema and an associated evaluative affect. It is a form of “chunking” that enables us to remember by imposing an organization on our experience. The affect determines approach and avoidance. The schema unpacks the compressed data.
The gist is a one-dimensional representation that is learned early in the purchase process, as it is needed to understand the product category and make sense of all the different offerings. We will reuse this product schema over and over again to keep track of our experiences. We will reuse it to understand product reviews, advertising, and word of mouth. And we will reuse it when asked to complete customer satisfaction surveys.
In the end, our model specification should match the response generation process. If our data are recollections of specific experiences, then we will require some type of incomplete matrix factorization to uncover the latent dimensions. However, when we ask for ratings at a more abstract level, we ought not be surprised if the resulting data are one-dimensional.