# Plotting model fits

August 29, 2012
By

(This article was first published on Minding the Brain, and kindly contributed to R-bloggers)

We all know that it is important to plot your data and explore the data visually to make sure you understand it. The same is true for your model fits. First, you want to make sure that the model is fitting the data relatively well, without any substantial systematic deviations. This is often evaluated by plotting residual errors, but I like to start with plotting the actual model fit.

Second, and this is particularly important when using orthogonal polynomials, you want to make sure that the statistically significant effects in the model truly correspond to the “interesting” (i.e., meaningful) effects in your data. For example, if your model had a significant effects on higher-order terms like the cubic and quartic, you might want to conclude that this corresponds to a difference between early and late competition. Plotting the model fits with and without that term can help confirm that interpretation.

The first step to plotting model fits is getting those model-predicted values. If you use lmer, these values are stored in the eta slot of the model object. It can be extracted using [email protected], where m is the model object. Let's look at an example based on eye-tracking data from Kalenine, Mirman, Middleton, & Buxbaum (2012).

summary(data.ex)
##       Time           fixS           cond     obj          subj
## Min. : 500 Min. :0.0000 Late :765 T:510 21 :102
## 1st Qu.: 700 1st Qu.:0.0625 Early:765 C:510 24 :102
## Median : 900 Median :0.1333 U:510 25 :102
## Mean : 900 Mean :0.2278 27 :102
## 3rd Qu.:1100 3rd Qu.:0.3113 28 :102
## Max. :1300 Max. :1.0000 40 :102
## (Other):918
## timeBin ot1 ot2 ot3
## Min. : 1 Min. :-0.396 Min. :-0.2726 Min. :-0.450
## 1st Qu.: 5 1st Qu.:-0.198 1st Qu.:-0.2272 1st Qu.:-0.209
## Median : 9 Median : 0.000 Median :-0.0909 Median : 0.000
## Mean : 9 Mean : 0.000 Mean : 0.0000 Mean : 0.000
## 3rd Qu.:13 3rd Qu.: 0.198 3rd Qu.: 0.1363 3rd Qu.: 0.209
## Max. :17 Max. : 0.396 Max. : 0.4543 Max. : 0.450
##
## ot4
## Min. :-0.3009
## 1st Qu.:-0.1852
## Median :-0.0231
## Mean : 0.0000
## 3rd Qu.: 0.2392
## Max. : 0.4012
##
ggplot(data.ex, aes(Time, fixS, color = obj)) + facet_wrap(~cond) +
stat_summary(fun.y = mean, geom = "line", size = 2)

I've renamed the conditions "Late" and "Early" based on the timing of their competition effect: looking at fixation proportions for the related Competitor (green lines) relative to the Unrelated distractor, it looks like the “Late” condition had a later competition effect than the “Early” condition. We start by fitting the full model and plotting the model fit. For convenience, we'll make a new data frame that has the modeled observed data and the model fit:

library(lme4)
m.ex <- lmer(fixS ~ (ot1 + ot2 + ot3 + ot4) * obj * cond + (1 + ot1 + ot2 + ot3 + ot4 | subj) + (1 + ot1 + ot2 | subj:obj:cond), data = subset(data.ex, obj != "T"), REML = F)
data.ex.fits <- data.frame(subset(data.ex, obj != "T"), GCA_Full = [email protected])
ggplot(data.ex.fits, aes(Time, fixS, color = obj)) + facet_wrap(~cond) + stat_summary(fun.data = mean_se, geom = "pointrange", size = 1) + stat_summary(aes(y = GCA_Full), fun.y = mean, geom = "line", size = 2) + labs(x = "Time Since Word Onset (ms)", y = "Fixation Proportion")

The fit looks pretty good and the model seems to capture the early-vs.-late competition difference, so now we can use the normal approximation to get p-values for the object-by-condition interaction:

coefs.ex <- as.data.frame(summary(m.ex)@coefs)
coefs.ex\$p <- format.pval(2 * (1 - pnorm(abs(coefs.ex[, "t value"]))))
coefs.ex[grep("*objU:cond*", rownames(coefs.ex), value = T), ]
##                     Estimate Std. Error t value       p
## objU:condEarly -0.004164 0.01701 -0.2448 0.80664
## ot1:objU:condEarly 0.065878 0.07586 0.8685 0.38514
## ot2:objU:condEarly -0.047568 0.04184 -1.1370 0.25554
## ot3:objU:condEarly -0.156184 0.02327 -6.7119 1.9e-11
## ot4:objU:condEarly 0.075709 0.02327 3.2535 0.00114

There are significant object-by-condition interaction effects on the cubic and quartic terms, so that's where competition in the two conditions differed, but does that correspond to the early-vs.-late difference? To answer this question we can fit a model that does not have those cubic and quartic terms and visually compare it to the full model. We'll plot the data with pointrange, the full model with thick lines, and the smaller model with thinner lines.

m.exSub <- lmer(fixS ~ (ot1 + ot2 + ot3 + ot4) * obj + (ot1 + ot2 + ot3 + ot4) * cond + (ot1 + ot2) * obj * cond + (1 + ot1 + ot2 + ot3 + ot4 | subj) + (1 + ot1 + ot2 | subj:obj:cond), data = subset(data.ex, obj != "T"), REML = F)
data.ex.fits\$GCA_Sub <- [email protected]
ggplot(data.ex.fits, aes(Time, fixS, color = obj)) + facet_wrap(~cond) + stat_summary(fun.data = mean_se, geom = "pointrange", size = 1) + stat_summary(aes(y = GCA_Full), fun.y = mean, geom = "line", size = 2) + stat_summary(aes(y = GCA_Sub), fun.y = mean, geom = "line", size = 1) + labs(x = "Time Since Word Onset (ms)", y = "Fixation Proportion")

Well, it sort of looks like the thinner lines have less early-late difference, but it is hard to see. It will be easier if we look directly at the competition effect size (that is, the difference between the competitor and unrelated fixation curves):

ES <- ddply(data.ex.fits, .(subj, Time, cond), summarize, Competition = fixS[obj == "C"] - fixS[obj == "U"], GCA_Full = GCA_Full[obj == "C"] - GCA_Full[obj == "U"], GCA_Sub = GCA_Sub[obj == "C"] - GCA_Sub[obj == "U"])
ES <- rename(ES, c(cond = "Condition"))
ggplot(ES, aes(Time, Competition, color = Condition)) + stat_summary(fun.y = mean, geom = "point", size = 4) + stat_summary(aes(y = GCA_Full), fun.y = mean, geom = "line", size = 2) + labs(x = "Time Since Word Onset (ms)", y = "Competition") + stat_summary(aes(y = GCA_Sub), fun.y = mean, geom = "line", size = 1)

Now we can clearly see that the full model (thick lines) captures the early-vs.-late difference, but when we remove the cubic and quartic terms (thinner lines), that difference almost completely disappears. So that shows that those higher-order terms really were capturing the timing of the competition effect.

P.S.: For those that care about behind-the-scenes/under-the-hood things, this post was created (mostly) using knitr in RStudio.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...