My datasets tend to have random factors. I try to stick to general linear models whenever I can, to avoid dealing with both random factors and complex error distributions at the same time (not always possible). I am compiling some notes here to avoid visiting the same R forums every time I start a new project. These are notes to self, but maybe they are useful to someone else. Note that they are oversimplified: they assume you know your stats and your R and only want a cheat sheet. Also, they may contain errors or discrepancies with your philosophy (please notify me if you find errors, so I can update them!)

I’ll designate:

Response variable – what we want to model, as y.

Predictors – what we think is affecting the response variable, as x and z when continuous, and A and B when discrete, with levels denoted A1, A2, etc…

Models abbreviated as m.

## 1) Let’s start assuming normal error distributions

    m <- lm(y ~ x)

then check residuals. You want to see no pattern in the following plot:

    scatter.smooth(residuals(m)~fitted(m))

and a straight line here

    qqnorm(residuals(m))

    qqline(residuals(m)) 

more diagnostics here: http://www.statmethods.net/stats/rdiagnostics.html

## 2) Let’s say you have also a Covariate z

    m <- lm(y ~ z + x)

    summary(m)

    coef(m)

    confint(m)

If you want the anova table

    anova(m)

But the covariate should be added first, because order matters: the result will be the variance explained by x after accounting for the variance due to z (these are called Type I, i.e. sequential, sums of squares).

## 3) Otherwise what you actually have is not a covariate, but two predictors:

    m <- lm(y ~ z + x)

    summary(m) #same here

    library(car)

    Anova(m, type = "II")

If there is indeed no interaction, then type II is statistically more powerful than type III.

If the interaction is significant, type III is more appropriate:

    m <- lm(y ~ z * x)

    summary(m)

    library(car)

    Anova(m, type = "III")

note: when your design is a classic balanced ANOVA design, it doesn’t matter, as all types give you the same answer. More info here: http://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/
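One gotcha I want to remember: car’s type III tests are only interpretable with sum-to-zero contrasts, and R defaults to treatment contrasts. A minimal sketch (the mtcars model is purely illustrative):

```r
library(car)  # for Anova()

## type III tests need sum-to-zero contrasts to be interpretable;
## with R's default treatment contrasts they depend on the reference level
options(contrasts = c("contr.sum", "contr.poly"))

## illustrative model on the built-in mtcars data
m <- lm(mpg ~ factor(am) * wt, data = mtcars)
Anova(m, type = "III")
```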

To plot interactions:

    interaction.plot(A, B, y)     #only works with A and B factors, so it rarely works for me

Or customize it by discretizing one of the variables (e.g. cut(z)) and fitting a line for each of the two or three resulting levels of z using the model coefficients.
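A minimal sketch of that idea (simulated data; variable names and bin count are illustrative):

```r
## simulated data, purely illustrative
set.seed(1)
x <- runif(100)
z <- runif(100)
y <- 1 + 2 * x + 3 * z + 4 * x * z + rnorm(100, sd = 0.5)
m <- lm(y ~ z * x)

## discretize z, then draw the fitted line at the mean z of each bin
z_bin <- cut(z, breaks = 3)
plot(y ~ x, col = z_bin)
for (i in seq_along(levels(z_bin))) {
  zi <- mean(z[z_bin == levels(z_bin)[i]])
  abline(a = coef(m)["(Intercept)"] + coef(m)["z"] * zi,
         b = coef(m)["x"] + coef(m)["z:x"] * zi,
         col = i)
}
```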

## 4) How? By understanding the output, which is tricky, especially when the covariate is a factor.

    m <- lm(y ~ A * x)

    summary(m)

• (Intercept) value is the intercept for A1 (v1)
• A2 value (v2) is the difference in intercept between A2 and A1 (the intercept for A2 is calculated as v1 + v2)
• x value is the slope for x given A1 (v3)
• x:A2 value (v4) is the difference in the slope for x between A2 and A1 (the slope of x for A2 is calculated as v3 + v4)
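A quick simulated check of these rules (treatment contrasts, A1 as reference level; the true values 1, 2, 4 and 7 are made up for the example):

```r
## simulate two groups with known intercepts (1, 4) and slopes (2, 7)
set.seed(1)
A <- factor(rep(c("A1", "A2"), each = 50))
x <- runif(100)
y <- ifelse(A == "A1", 1 + 2 * x, 4 + 7 * x) + rnorm(100, sd = 0.1)

m  <- lm(y ~ A * x)
cf <- coef(m)  # names: "(Intercept)", "AA2", "x", "AA2:x"

cf["(Intercept)"] + cf["AA2"]  # intercept for A2 = v1 + v2 (about 4)
cf["x"] + cf["AA2:x"]          # slope for A2     = v3 + v4 (about 7)
```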

## 5) Select models (by AIC, or by comparing two models with anova; I know, confusing name)

    AIC(m)

    library(MuMIn)

    AICc(m) #for small sample size

    anova(m1,m2)

But see below with random factors

## 6) Until now we wanted to see if there were differences among levels of A.

A included as a fixed factor -> “the expectation of the distribution of effects”

    lm(y ~ A + x)

But if we want to see whether there are general trends regardless of A, while controlling for the variability due to A (“average effect in the population”), we use:

#### a) random intercept model (we expect the trend to be equal, but some plants will differ in their intercept)

    library(nlme)

    lme(y ~ x, random = ~1 | A)

#### b) random slope model (we expect the trend to be dependent on species identity)

    lme(y ~ x, random = ~1 + x | A)

In addition, now we can extract the random effects

    m$coefficients

contains both fixed and random coefficients, so we can see the random intercepts (and slopes), as well as

    intervals(m)

## 7) To compare models in nlme

To compare models with different random structures, use method = "REML". To compare models with different fixed-effects structures using anova, use method = "ML", but report the final p-values from the REML fit.

## 8) With complex models you can dredge them to find the best ones

    library(MuMIn)

    m_set <- dredge(m_full)

    (top_m <- get.models(m_set, subset = delta < 3))

and compute averages the regular way, without shrinkage:

    model.avg(top_m)$avg.model

See also names() and str() applied to model.avg(top_m).

Or get estimates with shrinkage (the zero method). This makes the most difference for the less important predictors, since they will be averaged with a bunch of zeros:

    model.avg(top_m)$coef.shrinkage
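A runnable sketch of the whole dredging workflow with simulated data (predictor names are made up; note that dredge() requires the global model to be fit with na.action = "na.fail"):

```r
library(MuMIn)  # for dredge(), get.models(), model.avg()

## simulated data: only x1 truly matters, x2 and x3 are noise
set.seed(1)
d <- data.frame(x1 = runif(50), x2 = runif(50), x3 = runif(50))
d$y <- 1 + 2 * d$x1 + rnorm(50)

## dredge() refuses models fitted with the default na.omit
m_full <- lm(y ~ x1 + x2 + x3, data = d, na.action = "na.fail")
m_set  <- dredge(m_full)
top_m  <- get.models(m_set, subset = delta < 3)
summary(model.avg(top_m))
```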

## 9) But what if residuals are bad?

Don’t panic yet. Sometimes it is because the variance differs across groups or values; you can investigate that by:

    boxplot(residuals(m, type = "normalized") ~ A)

    plot(A, residuals(m, type = "normalized"))

and fix it by including a variance structure in the model: weights = varIdent(form = ~ 1 | A) for factors, or weights = varPower(form = ~ x) for continuous variables (you can also try varExp or varConstPower).
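A sketch of what that looks like with nlme::gls, on simulated data where the two levels of A genuinely have different variances (all names illustrative):

```r
library(nlme)  # for gls() and varIdent()

## simulated data with unequal residual variance between levels of A
set.seed(1)
d <- data.frame(A = factor(rep(c("A1", "A2"), each = 50)),
                x = runif(100))
d$y <- 1 + 2 * d$x + rnorm(100, sd = ifelse(d$A == "A1", 0.2, 1))

m1 <- gls(y ~ x, data = d)                                      # homogeneous variance
m2 <- gls(y ~ x, data = d, weights = varIdent(form = ~ 1 | A))  # one variance per level of A
anova(m1, m2)  # likelihood-ratio test for the variance structure
```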

And to check if it worked:

    plot(residuals(m, type = "normalized") ~ fitted(m))

The ref is Cleasby IR, Nakagawa S. 2011. Neglected biological patterns in the residuals. Behavioral Ecology and Sociobiology, 65(12), 2361-2372.

Note that lm cannot use these variance structures in weights (I think), but you can use gls instead.

## 10) It is also recommended to center (scale(x, scale = FALSE)) or standardize (scale(x)) your predictors.

Reference: Schielzeth, H. (2010), Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1: 103–113.
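A tiny simulated demonstration of what centering does: the slope is untouched, but the intercept becomes the expected response at the average x (numbers are made up):

```r
## simulated data, purely illustrative
set.seed(1)
x <- runif(100, 10, 20)
y <- 5 + 0.3 * x + rnorm(100)

m_raw <- lm(y ~ x)
m_cen <- lm(y ~ scale(x, scale = FALSE))  # centered predictor

coef(m_raw)[2]  # slope, identical in both models
coef(m_cen)[1]  # intercept now equals mean(y): the response at average x
```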

## 11) You should also worry about co-linearity

    library(car)

    vif(m)

It should be lower than 2.5, and you should be very concerned when it is over 10.

or for lme’s load this little function from here (https://github.com/aufrank/R-hacks/blob/master/mer-utils.R)

    vif.mer <- function(fit) {
        ## adapted from rms::vif
        v <- vcov(fit)
        nam <- names(fixef(fit))
        ## exclude intercepts
        ns <- sum(1 * (nam == "Intercept" | nam == "(Intercept)"))
        if (ns > 0) {
            v <- v[-(1:ns), -(1:ns), drop = FALSE]
            nam <- nam[-(1:ns)]
        }
        d <- diag(v)^0.5
        v <- diag(solve(v / (d %o% d)))
        names(v) <- nam
        v
    }

    vif.mer(m)

## 12) And you may want to get an R2 for your (G)LMMs too…

Based on Nakagawa, S. and Schielzeth, H. (2013), A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4: 133–142.

    library(MuMIn)

    r.squaredGLMM(m)

——————————————————————————————————

More details and plotting here: https://www.zoology.ubc.ca/~schluter/R/fit-model/

For GLMMs (generalized) there is a whole new set of tricky things; maybe that will be another gist.