**Psychological Statistics**, and kindly contributed to R-bloggers)

Neuroskeptic has just blogged on a new paper by Judd, Westfall and Kenny on *Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem*. I can’t access the original paper (which is supposed to be available via my University but hasn’t appeared yet …) but I know a little bit about the topic and thought I’d write a few words.

*The language-as-fixed-effect fallacy*), but was originally raised by Coleman (1964). Clark noted that running separate ANOVAs, one treating subjects as the unit of analysis and one treating items as the unit of analysis (by-subject and by-item analyses), did not solve the problem. If either analysis is statistically non-significant, the effect fails to generalize; but even if both are statistically significant, the correct analysis (which combines variability across subjects and items) might still be statistically non-significant. His solution was to estimate the correct ANOVA test statistic (quasi *F*, or *F*′) with a simple-to-calculate minimum value (min *F*′). This is known to be conservative (i.e., it produces *p* values that are slightly too large) but not unreasonably so in practice (see Raaijmakers et al., 1999). Raaijmakers et al. (1999) show that until recently most psycholinguistic researchers still got it wrong (e.g., by reporting separate by-item and by-subject analyses).
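For reference, min *F*′ can be computed from the two separate analyses. The sketch below uses the formula as it is usually stated: min *F*′ = *F*1·*F*2 / (*F*1 + *F*2), with denominator df (*F*1 + *F*2)² / (*F*1²/*n*2 + *F*2²/*n*1), where *n*1 and *n*2 are the error df of the by-subject and by-item analyses. The function name `min_f_prime` is my own and not taken from any package.

```python
def min_f_prime(f1, f2, df_err1, df_err2):
    """Clark's min F' from a by-subject (F1) and a by-item (F2) analysis.

    f1, f2: F ratios from the by-subject and by-item ANOVAs.
    df_err1, df_err2: their error (denominator) degrees of freedom.
    Returns (min F', denominator df). The numerator df is the same as
    for F1 and F2.
    """
    if f1 <= 0 or f2 <= 0:
        raise ValueError("F ratios must be positive")
    # Lower bound on the quasi F ratio
    min_f = f1 * f2 / (f1 + f2)
    # Approximate denominator df: note F1 is paired with the item df and vice versa
    df_den = (f1 + f2) ** 2 / (f1 ** 2 / df_err2 + f2 ** 2 / df_err1)
    return min_f, df_den


if __name__ == "__main__":
    print(min_f_prime(5.0, 5.0, 19, 19))
```

For example, `min_f_prime(5.0, 5.0, 19, 19)` gives min *F*′ = 2.5 on about 38 denominator df. Assuming a single-df effect (two conditions), *F*1 and *F*2 of 5 on 19 error df would each be significant at .05 (the critical value of *F*(1, 19) is about 4.38), yet 2.5 falls short of the roughly 4.10 needed for *F*(1, 38). That is exactly the "both significant, correct test not significant" pattern described above.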

*X* is always the case). Alternatively, it might be reasonable to assume that the stimuli are, for the purposes of the study, very similar to others in the population (i.e., that population variability is negligible). This might be the case for certain mass-produced products (e.g., brands of chocolate bar) or precision-engineered equipment. However, a lot of the time you do want to generalize beyond your sample of stimuli …

*F* ratio of a conventional ANOVA would be correct. The principle here is quite simple: *all relevant sources of variability need to be represented in the analysis*. By varying the stimuli between participants, the stimulus variability is present in the data and ends up being incorporated into the between-subjects error term.* This is quite a neat method and can be easy to set up in some studies (e.g., if you have a *very* large pool of words to sample from by computer). Raaijmakers et al. (1999) also note that you get the correct *F* ratios from certain other designs. This, in my view, is only partly true. Any design that restricts the population sampled from (of participants or stimuli) restricts its variability and therefore restricts generalizability to the pool of participants or stimuli actually being sampled from.
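The principle can be checked with a small simulation. The setup below is a hypothetical one of my own (not taken from Raaijmakers et al. or Judd et al.): 20 participants, two conditions with 10 stimuli each, random stimulus effects and *no* true condition effect. It runs a by-subject paired *t* test (equivalent to the by-subject *F*, since *F* = *t*² for a two-level within-subject factor) either when everyone sees the same stimuli or when each participant gets a fresh random sample of stimuli. The names `false_positive_rate` and `draw_items`, the variance settings and the hard-coded critical value are all assumptions for illustration.

```python
import random
import statistics

N_SUBJ = 20     # participants
N_ITEMS = 10    # stimuli per condition
T_CRIT = 2.093  # two-tailed 5% critical t for N_SUBJ - 1 = 19 df (assumes N_SUBJ = 20)


def draw_items(rng, item_sd):
    """Sample stimulus effects for two conditions (no true condition effect)."""
    return [[rng.gauss(0.0, item_sd) for _ in range(N_ITEMS)] for _ in range(2)]


def false_positive_rate(shared_items, reps=1000, item_sd=1.0, noise_sd=1.0, seed=1):
    """Share of simulated studies in which the by-subject paired t test is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        items = draw_items(rng, item_sd)  # one stimulus sample for the whole study
        diffs = []
        for _ in range(N_SUBJ):
            if not shared_items:
                items = draw_items(rng, item_sd)  # fresh stimuli for this participant
            subj = rng.gauss(0.0, 1.0)  # participant intercept; cancels in the difference
            # By-subject analysis: average each condition over its stimuli
            means = [
                sum(subj + item + rng.gauss(0.0, noise_sd) for item in cond) / N_ITEMS
                for cond in items
            ]
            diffs.append(means[0] - means[1])
        t = statistics.mean(diffs) / (statistics.stdev(diffs) / N_SUBJ ** 0.5)
        if abs(t) > T_CRIT:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("same stimuli for everyone:", false_positive_rate(shared_items=True))
    print("fresh stimuli per person: ", false_positive_rate(shared_items=False))
```

With shared stimuli, every participant's difference score contains the same draw of stimulus effects, so the by-subject error term misses that source of variability and the false-positive rate should come out far above the nominal 5%. With fresh stimuli per participant the stimulus variability lands in the between-subjects error term and the rate should sit close to 5%.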

*language-as-fixed-effect fallacy* or, more properly, the *stimuli-as-fixed-effect fallacy* back to prominence. In principle it is possible to use a multilevel (or linear mixed) model to deal with the problem of multiple random effects (and this has all sorts of other advantages). However, the usual default model is a nested model that implicitly assumes that the stimuli presented to each person are different.

*F* tests, etc. Thus Clark’s assertion that a design with stimuli nested within participants produces the correct *F* ratios is confirmed.

*lme4* package in R is particularly useful because it fits these models fairly effortlessly.
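Whether stimuli end up crossed or nested often comes down to how they are labelled in the data. In *lme4*, crossed random effects are usually written `(1 | subject) + (1 | item)`, with each stimulus carrying one label shared by everyone who saw it, whereas `(1 | subject/item)` expands to `(1 | subject) + (1 | subject:item)` and treats every participant-stimulus pairing as a separate item. The Python sketch below does not fit any model; it only illustrates that labelling difference on a hypothetical fully crossed design, and its function names are my own.

```python
from itertools import product


def build_trials(n_subj, n_items):
    """Fully crossed design: every participant sees every stimulus once."""
    return [
        {"subject": f"s{s}", "item": f"i{i}"}
        for s, i in product(range(n_subj), range(n_items))
    ]


def item_key(trial, nested):
    """Label a model would use for the stimulus on this trial.

    nested=True mimics a subject:item term, so the same stimulus seen by
    two participants is treated as two unrelated levels.
    """
    return (trial["subject"], trial["item"]) if nested else trial["item"]


def subjects_per_item_level(trials, nested):
    """Count how many participants contribute to each item level."""
    seen = {}
    for trial in trials:
        seen.setdefault(item_key(trial, nested), set()).add(trial["subject"])
    return {level: len(subjects) for level, subjects in seen.items()}


if __name__ == "__main__":
    trials = build_trials(n_subj=6, n_items=10)
    crossed = subjects_per_item_level(trials, nested=False)
    nested = subjects_per_item_level(trials, nested=True)
    print(len(crossed), "item levels, participants per level:", set(crossed.values()))
    print(len(nested), "item levels, participants per level:", set(nested.values()))
```

With 6 participants and 10 stimuli, crossed labelling gives 10 item levels, each shared by all 6 participants, so a stimulus effect can be estimated across people. Nested labelling gives 60 levels, each seen by a single participant, so nothing ties one person's response to a stimulus to anyone else's.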

* Note that a by-item analysis or by-subject analysis violates this principle because each analysis uses the average response (averaged over the levels of the other random factor), and the variability around this average is unavailable to the analysis.

** UPDATE: Jake Westfall kindly sent me a copy of the paper. I have not read it properly yet, but it looks extremely good. He points out that recent versions of SPSS can run cross-classified models (I’m still on an older version). Their paper includes SPSS, R and SAS code. I would still recommend R over SPSS. One highlight is that they show how to compute the Kenward-Roger approximation in R. Complex multilevel models make it difficult to assess the correct *df* for effects, and the Kenward-Roger approximation is one of the better solutions. In my book I used parametric bootstrapping or HPD intervals to get round this problem, but this is potentially a very useful addition.

*References*

*Journal of Memory & Language, 59*, 390-412.

*Journal of Verbal Learning and Verbal Behavior, 12*, 335-359.

*Psychological Reports, 14*, 219-226.

*Journal of Memory & Language, 41*, 416-426.
