While preparing a presentation on analyzing influential data in mixed effects models, I came across an article in which important claims about racial prejudice were refuted. An important part of the criticism of existing work is that, in one article, the main correlation was entirely due to a single observation. Based solely on this single observation, the study’s outcomes showed the Implicit Association Test (IAT) to predict the overall quality of interactions between White and Black people. Removing that single observation (out of 41) from the data removed the effect completely.
With survey research showing declines in “American’s endorsement of prejudice sentiments” (Blanton et al., 2009, p. 568), the question arose whether such declines actually took place, or whether they are an artifact of social desirability shaping respondents’ answers to survey questions. Naturally, tests like the IAT gained considerable attention, because their attractive claim is that they can reveal levels of prejudice that people themselves are unaware of and that do not show up when people are asked about them explicitly (e.g. in a survey).
Blanton et al. (2009) decided to re-examine several of the articles on which the strong claims for the predictive validity of the IAT were based. They re-analyzed the (partial) data of two articles. One of these, published in 2001, reported as one of its main findings that high scores on the IAT were associated with worse interaction quality with Black experimenters than with White experimenters. I’m not completely sure what this interaction quality entails, but based on the re-analysis I would say that it is a combination of aspects such as ‘forward leaning’, ‘facing the experimenter’, ‘expressiveness’, ‘smiling’ and making ‘eye contact’.
How can just a single observation dominate the outcomes of a statistical analysis? Unfortunately, the answer to this question is: quite easily, especially when the analysis is based on a small number of observations. In this case, the refuted correlation was between the participants’ IAT scores and the way the participants interacted with either Black or White experimenters. IAT scores are known to depend on a participant’s age. One participant was exceptionally old compared to the rest of the test group, and indeed that participant scored very high on the IAT. The quality of that participant’s interaction with the Black experimenter was also rated very low. The point is that the rest of the observations showed no association between IAT score and interaction quality, so this single observation, with extreme scores on both variables, completely dominated the outcomes of the study on this aspect. Deleting this single observation brought the significant correlation of .32 down to non-significance: no association could be inferred between participants’ IAT scores and how they interacted with the Black experimenter.
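To make this mechanism concrete, here is a minimal sketch in Python. The numbers are synthetic and are not the data from the study: 40 observations are arranged so that the two variables are exactly uncorrelated, and one hypothetical 41st observation that is extreme on both variables is then added. The helper `pearson_r` and all variable names are my own, chosen for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic data (NOT the study's data): 40 observations arranged so that
# x and y are exactly uncorrelated.
x = [-1, -1, 1, 1] * 10
y = [-1, 1, -1, 1] * 10

# One hypothetical extra observation that is extreme on both variables.
x_all = x + [5]
y_all = y + [5]

r_without = pearson_r(x, y)
r_with = pearson_r(x_all, y_all)

print(f"r without the extreme case (n=40): {r_without:.2f}")
print(f"r with the extreme case (n=41):    {r_with:.2f}")
```

In this constructed example the correlation is zero for the 40 ordinary cases and rises to about .38 once the single extreme case is included, which is the same kind of pattern as the one described above.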
Of course, Blanton et al. didn’t write a full article based on just this point; it is not their only criticism of the original article. Other issues include poor timing of the measures (respondents were probably aware of being tested for discriminatory behavior before the ‘actual’ test took place), inter-rater reliability, and improper statistical analysis due to recoding of the data (as a result of which the coding of a single rater influenced the findings). Nevertheless, from my current point of view, I’m especially interested in the bias caused by the influential observation.
Of course, McConnell and Leibold (2009), whose work was criticized, were given the opportunity to respond. Regarding the influence exerted by the outlier, they respond with two arguments. First, they state that Blanton et al. did not study the correct outlier: although this outlier did have an extreme IAT score, another participant had an even higher score. Second, they state that Blanton et al. focused on only one of the outcome measures, and not on all the various measures used in the original study. In their response, they show that deleting the outlier found by Blanton et al. does not influence the outcomes of the analyses on the other outcome measures.
I find this response curious on two accounts. First, influential data and outliers are not the same thing. McConnell’s response that the wrong outlier was selected is not necessarily true, for just having an extreme score on one variable is not enough to make an observation influential. Generally, an observation has to combine leverage (an extreme score on the predictor) with a score on the outcome that deviates from the pattern in the rest of the data, so that it changes the slope of the regression line. If the other outlier (mentioned by McConnell) did have a more extreme score on the IAT variable, but an average score on the behavior-quality variable, it may very well prove not to (overly) influence the outcomes of the study. Secondly, an observation is only influential relative to the specification of the analysis and the variables used in it. So simply deleting this single observation to show that it does not influence other analyses (on other outcome measures) is not much of a defense against the initial argument that the observation influenced the outcomes of one specific analysis.
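The difference between having an extreme score and being influential can be sketched with the same kind of synthetic data (again, not the data from the study). Two hypothetical cases are each added to 40 uncorrelated observations: case A is the most extreme on the predictor but average on the outcome, while case B is less extreme on the predictor but extreme on the outcome as well. The helpers `ols_slope` and `leverage` are my own names; leverage is computed with the standard hat-value formula for a regression with a single predictor.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def leverage(xs, i):
    """Hat value of observation i in a regression with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return 1 / n + (xs[i] - mean_x) ** 2 / sxx


# Synthetic base data (NOT the study's data): no association between x and y.
base_x = [-1, -1, 1, 1] * 10
base_y = [-1, 1, -1, 1] * 10

# Hypothetical case A: most extreme on x, but an average score on y.
xa, ya = base_x + [6], base_y + [0]
# Hypothetical case B: less extreme on x, but extreme on y as well.
xb, yb = base_x + [5], base_y + [5]

slope_base = ols_slope(base_x, base_y)
slope_a = ols_slope(xa, ya)
slope_b = ols_slope(xb, yb)
lev_a = leverage(xa, len(xa) - 1)
lev_b = leverage(xb, len(xb) - 1)

print(f"slope, 40 base cases:  {slope_base:.2f}")
print(f"slope with case A:     {slope_a:.2f} (leverage {lev_a:.2f})")
print(f"slope with case B:     {slope_b:.2f} (leverage {lev_b:.2f})")
```

In this constructed example, case A has the higher leverage yet leaves the slope at zero, whereas case B moves it to about 0.38. Which case has the most extreme predictor score therefore says little about which case drives the result.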
All in all, an interesting debate. There is much more to it in both the articles by Blanton et al. (2009) and McConnell and Leibold (2009). But still, I find it especially striking to see how careful one should be when analyzing data and drawing inferences from it. And, of course, I can add a nice example of the impact of influential data to my collection.
Blanton, H., Jaccard, J., Klick, J., Mellers, B., Mitchell, G., & Tetlock, P. (2009). Strong claims and weak evidence: Reassessing the predictive validity of the IAT. Journal of Applied Psychology, 94(3), 567-582. DOI: 10.1037/a0014665
McConnell, A., & Leibold, J. (2009). Weak criticisms and selective evidence: Reply to Blanton et al. (2009). Journal of Applied Psychology, 94(3), 583-589. DOI: 10.1037/a0014649