# Centering in Moderation Analysis: A Guide

**Stat's What It's All About**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This blog post will cover what centering is, what a sum-to-zero contrast is, why and when you should use them, and how you can do them in ** R**. Specifically, we will focus on their use and how they affect coefficient interpretation in regression tables and affect interpretation in ANOVA tables.

I will be using the following packages and versions:

library(datawizard) # 0.9.1 library(parameters) # 0.21.3

We will be using the following toy data set:

## Generate Toy Dataset

data <- data.frame( Y = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2), G = factor(c("g2", "g2", "g1", "g2", "g3", "g2", "g3", "g1", "g1", "g2")), X = c(160, 160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6), Q = c(3.9, 3.9, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), A = factor(c("a1", "a1", "a2", "a2", "a1", "a2", "a1", "a2", "a2", "a2")) )

(This is really a re-labeled version of `mtcars[1:10, c(1:3, 5, 8)]`

.)

str(data) #> 'data.frame': 10 obs. of 5 variables: #> $ Y: num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 #> $ G: Factor w/ 3 levels "g1","g2","g3": 2 2 1 2 3 2 3 1 1 2 #> $ X: num 160 160 108 258 360 ... #> $ Q: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 #> $ A: Factor w/ 2 levels "a1","a2": 1 1 2 2 1 2 1 2 2 2

# Simple Regression

We will start with not centering anything.

## Continuous Predictor

model1 <- lm(Y ~ X, data = data)

Something that is *always* true - the `(Intercept)`

represents the predicted values when all the predictors are set to 0.

model_parameters(model1) #> Parameter | Coefficient | SE | 95% CI | t(8) | p #> ---------------------------------------------------------------------- #> (Intercept) | 25.56 | 1.62 | [21.82, 29.31] | 15.74 | < .001 #> X | -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009 # Predict for X=0 predict(model1, newdata = data.frame(X = 0)) #> 1 #> 25.5638

But say we wanted the intercept to represent the predicted value when - how would we do that? **By centering!**

So what is centering?

Don’t just be a zero…

Centering is the processes of assigning *meaning* to the value of 0 of some variable. We do this by:

- defining a meaningful value
*Subtracting*that value from all values of the variable.

For example, I can center *X* so that 0 is 258.

rbind( X = data$X, "X - 258" = data$X - 258 ) #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] #> X 160 160 108 258 360 225 360 146.7 140.8 167.6 #> X - 258 -98 -98 -150 0 102 -33 102 -111.3 -117.2 -90.4

We can see that by subtracting 258 from *X*, values that were above 258 are positive, values that were below 258 are negative, and the values that were 258 are now *0*. Thus, for this new variable, when we speak of *0*, we actually mean “258”.

model2 <- lm(Y ~ I(X - 258), data = data) model_parameters(model2) #> Parameter | Coefficient | SE | 95% CI | t(8) | p #> ---------------------------------------------------------------------- #> (Intercept) | 19.14 | 0.71 | [17.50, 20.78] | 26.86 | < .001 #> X - 258 | -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009 # Predict for X=258 predict(model1, newdata = data.frame(X = 258)) #> 1 #> 19.14033

Typically, centering is done around the mean (mean-centering) - thus giving *0* the meaning of “mean”.

model3 <- lm(Y ~ I(X - mean(X)), data = data) model_parameters(model3) #> Parameter | Coefficient | SE | 95% CI | t(8) | p #> ---------------------------------------------------------------------- #> (Intercept) | 20.37 | 0.62 | [18.95, 21.79] | 32.99 | < .001 #> X - mean(X) | -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009 # Predict for X=mean(X) predict(model1, newdata = data.frame(X = mean(data$X))) #> 1 #> 20.37

We can also do this with the `scale()`

function, but since this function also standardizes (re-scales to and *sd* of 1), we need to set `scale(scale = FALSE)`

:

model4 <- lm(Y ~ scale(X, scale = FALSE), data = data) model_parameters(model4) #> Parameter | Coefficient | SE | 95% CI | t(8) | p #> ---------------------------------------------------------------------- #> (Intercept) | 20.37 | 0.62 | [18.95, 21.79] | 32.99 | < .001 #> X | -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009 # Predict for X=mean(X) predict(model1, newdata = data.frame(X = mean(data$X))) #> 1 #> 20.37

Alternatively, we can just use `datawizard::center()`

:^{1}

model5 <- lm(Y ~ center(X), data = data) model_parameters(model5) #> Parameter | Coefficient | SE | 95% CI | t(8) | p #> ---------------------------------------------------------------------- #> (Intercept) | 20.37 | 0.62 | [18.95, 21.79] | 32.99 | < .001 #> center(X) | -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009 # Predict for X=mean(X) predict(model1, newdata = data.frame(X = mean(data$X))) #> 1 #> 20.37

Note that no matter how we centered *X*, its coefficient did not change - only the intercept changed!

## Categorical Predictor

When adding categorical predictors (in R we call these “factors”), R converts them to a set of variables ( being the number of levels in the factor) that represent a set of contrasts between the levels of the variable. For our `G`

factor that has 3 levels, we get two such variables: `G[g2]`

and `G[g3]`

.

model6 <- lm(Y ~ G, data = data) model_parameters(model6) #> Parameter | Coefficient | SE | 95% CI | t(7) | p #> ------------------------------------------------------------------- #> (Intercept) | 23.33 | 0.96 | [ 21.05, 25.61] | 24.21 | < .001 #> G [g2] | -3.19 | 1.22 | [ -6.08, -0.31] | -2.62 | 0.034 #> G [g3] | -6.83 | 1.52 | [-10.44, -3.23] | -4.49 | 0.003

By default, R uses treatment contrasts (`stats::contr.treatment()`

). These are built up by constructing each variable to be 0 for all but 1 level of the factor, with the 1st level being left out. We can see this by playing with `stats::contr.treatment()`

:

contr.treatment(levels(data$G)) #> g2 g3 #> g1 0 0 #> g2 1 0 #> g3 0 1

The “variable” (called a *dummy variable*) of `g2`

is only 1 for the level of `G=g2`

, the *dummy variable* of `g3`

is only 1 for the level of `G=g3`

, while `g1`

is left out - called the “reference level”.

Recall that the `(Intercept)`

always represents the predicted values when all the predictors are set to 0. In this case, setting both `g2`

and `g3`

to 0 is the same as representing `G=g1`

. And indeed, that is what the intercept represents:

model_parameters(model6) #> Parameter | Coefficient | SE | 95% CI | t(7) | p #> ------------------------------------------------------------------- #> (Intercept) | 23.33 | 0.96 | [ 21.05, 25.61] | 24.21 | < .001 #> G [g2] | -3.19 | 1.22 | [ -6.08, -0.31] | -2.62 | 0.034 #> G [g3] | -6.83 | 1.52 | [-10.44, -3.23] | -4.49 | 0.003 predict(model6, newdata = data.frame(G = "g1")) #> 1 #> 23.33333

### Interpreting Dummy Variables

The other variables now represent the differences between each level and the reference level. It is hard to see, but we can use some linear algebra to help us here:

(mm1 <- cbind( `(Intercept)` = 1, contr.treatment(levels(data$G)) )) #> (Intercept) g2 g3 #> g1 1 0 0 #> g2 1 1 0 #> g3 1 0 1 t(solve(mm1)) #> (Intercept) g2 g3 #> g1 1 -1 -1 #> g2 0 1 0 #> g3 0 0 1

Each column now holds the linear contrast weights represented by that coefficient. We can confirm this:

model_parameters(model6) #> Parameter | Coefficient | SE | 95% CI | t(7) | p #> ------------------------------------------------------------------- #> (Intercept) | 23.33 | 0.96 | [ 21.05, 25.61] | 24.21 | < .001 #> G [g2] | -3.19 | 1.22 | [ -6.08, -0.31] | -2.62 | 0.034 #> G [g3] | -6.83 | 1.52 | [-10.44, -3.23] | -4.49 | 0.003 (predicted_means <- predict(model6, newdata = data.frame(G = c("g1", "g2", "g3")))) #> 1 2 3 #> 23.33333 20.14000 16.50000 predicted_means[2] - predicted_means[1] #> 2 #> -3.193333 predicted_means[3] - predicted_means[1] #> 3 #> -6.833333

### Sum-to-Zero Contrasts

But say we wanted the intercept to represent the predicted value that is the grand mean - averaged across all levels of `G`

() - how would we do that? **By using sum-to-zero contrasts!**

Sum-to-zero contrasts, like treatment coding (a.k.a dummy coding) convert a factor to a set of contrasts between the levels of the variable. Unlike dummy coding, the weights of each contrast sum to 0.

There are many coding schemes that achieve this (Helmert contrasts, polynomial contrasts, …). The most popular one is effect coding (in R, confusingly built with `stats::contr.sum()`

).

contr.sum(levels(data$G)) #> [,1] [,2] #> g1 1 0 #> g2 0 1 #> g3 -1 -1

We can see that each contrast (column) sums to 0.

There are 3 ways of changing the coding scheme (contrast matrix) of a factor in R:

# 1. assign to the contrasts() function contrasts(data$G) <- contr.sum # 2. The C() function (BIG C!) data$G <- C(data$G, contr = contr.sum) # 3. In the model lm(Y ~ G, data = data, contrasts = list(G = contr.sum))

These should all produce the same results.

I will be using the 3rd option here for verbosity.

Let’s see how this affects our coefficients:

model7 <- lm(Y ~ G, data = data, contrasts = list(G = contr.sum)) model_parameters(model7) #> Parameter | Coefficient | SE | 95% CI | t(7) | p #> ------------------------------------------------------------------ #> (Intercept) | 19.99 | 0.57 | [18.65, 21.33] | 35.35 | < .001 #> G [1] | 3.34 | 0.79 | [ 1.47, 5.22] | 4.21 | 0.004 #> G [2] | 0.15 | 0.71 | [-1.53, 1.83] | 0.21 | 0.840 (predicted_means <- predict(model6, newdata = data.frame(G = c("g1", "g2", "g3")))) #> 1 2 3 #> 23.33333 20.14000 16.50000 mean(predicted_means) #> [1] 19.99111

Success - the intercept is equal to the grand mean!

What about the other variables? It is hard to see, but we can again use some linear algebra to help us here:

(mm2 <- cbind( `(Intercept)` = 1, contr.sum(levels(data$G)) )) #> (Intercept) #> g1 1 1 0 #> g2 1 0 1 #> g3 1 -1 -1 t(solve(mm2)) #> (Intercept) #> g1 0.3333333 0.6666667 -0.3333333 #> g2 0.3333333 -0.3333333 0.6666667 #> g3 0.3333333 -0.3333333 -0.3333333

Again, each column holds the linear contrast weights represented by that coefficient. It might not be obvious, but these are the deviations of each mean from the grand mean.

Again, we can confirm this:

model_parameters(model7) #> Parameter | Coefficient | SE | 95% CI | t(7) | p #> ------------------------------------------------------------------ #> (Intercept) | 19.99 | 0.57 | [18.65, 21.33] | 35.35 | < .001 #> G [1] | 3.34 | 0.79 | [ 1.47, 5.22] | 4.21 | 0.004 #> G [2] | 0.15 | 0.71 | [-1.53, 1.83] | 0.21 | 0.840 (predicted_means <- predict(model6, newdata = data.frame(G = c("g1", "g2", "g3")))) #> 1 2 3 #> 23.33333 20.14000 16.50000 predicted_means[1] - mean(predicted_means) #> 1 #> 3.342222 predicted_means[2] - mean(predicted_means) #> 2 #> 0.1488889

I personally like the interpretability of dummy-coding better - as differences from the reference level. Thankfully, we can preserve that interpretability while also having the intercept represent the grand mean, using *deviation contrasts* - another type of sum-to-zero contrasts, available with `datawizard::contr.deviation()`

.

contr.deviation(levels(data$G)) #> g2 g3 #> g1 -0.3333333 -0.3333333 #> g2 0.6666667 -0.3333333 #> g3 -0.3333333 0.6666667

model8 <- lm(Y ~ G, data = data, contrasts = list(G = contr.deviation)) model_parameters(model8) #> Parameter | Coefficient | SE | 95% CI | t(7) | p #> ------------------------------------------------------------------- #> (Intercept) | 19.99 | 0.57 | [ 18.65, 21.33] | 35.35 | < .001 #> G [g2] | -3.19 | 1.22 | [ -6.08, -0.31] | -2.62 | 0.034 #> G [g3] | -6.83 | 1.52 | [-10.44, -3.23] | -4.49 | 0.003 (predicted_means <- predict(model6, newdata = data.frame(G = c("g1", "g2", "g3")))) #> 1 2 3 #> 23.33333 20.14000 16.50000 mean(predicted_means) #> [1] 19.99111 predicted_means[2] - predicted_means[1] #> 2 #> -3.193333 predicted_means[3] - predicted_means[1] #> 3 #> -6.833333

# In Moderation Analysis

All this is great, but who cares about the intercept?

Well, this all comes into play much more substantially when moderators are involved!

When we have a moderator, the coefficient of the focal predictor represents a conditional effect. Specifically the effect of that predictor when the moderator is 0!

For two continuous predictors ( the focal predictor and the moderator), we can write out the regression equation as:

We can take out as a common factor:

In other words

Within this regression equation, is the “intercept” - the predicted slope of (the conditional slope) when !

If we condition on the *mean* of the moderator, this is often called the main effect, or the average effect.^{2}

Let’s see this in action.

## Continuous Predictor, Continuous Moderator

model9 <- lm(Y ~ X * Q, data = data) model_parameters(model9) #> Parameter | Coefficient | SE | 95% CI | t(6) | p #> --------------------------------------------------------------------- #> (Intercept) | 3.13 | 54.24 | [-129.58, 135.85] | 0.06 | 0.956 #> X | 0.10 | 0.27 | [ -0.57, 0.76] | 0.35 | 0.737 #> Q | 6.57 | 15.60 | [ -31.59, 44.73] | 0.42 | 0.688 #> X × Q | -0.04 | 0.08 | [ -0.24, 0.16] | -0.45 | 0.670

*X*’s coefficient is the conditional slope (a.k.a., the *simple slope*) of *X* when *Q* is 0:

(predicted_means <- predict(model9, newdata = data.frame(X = c(0, 1), Q = 0))) #> 1 2 #> 3.133346 3.228691 predicted_means[2] - predicted_means[1] #> 2 #> 0.09534522

When we center *Q*, it is the conditional slope (a.k.a., the *simple slope*) of *X* at the mean of *Q* - also called the “main effect”:

model10 <- lm(Y ~ X * center(Q), data = data) model_parameters(model10) #> Parameter | Coefficient | SE | 95% CI | t(6) | p #> ---------------------------------------------------------------------- #> (Intercept) | 26.39 | 2.90 | [ 19.30, 33.48] | 9.11 | < .001 #> X | -0.03 | 0.02 | [ -0.08, 0.02] | -1.64 | 0.151 #> center(Q) | 6.57 | 15.60 | [-31.59, 44.73] | 0.42 | 0.688 #> X × center(Q) | -0.04 | 0.08 | [ -0.24, 0.16] | -0.45 | 0.670 (predicted_means <- predict(model9, newdata = data.frame(X = c(0, 1), Q = mean(data$Q)))) #> 1 2 #> 26.38931 26.35590 predicted_means[2] - predicted_means[1] #> 2 #> -0.03341344

Note that the coefficient of `Q`

is not affected - it is the conditional slope of *Q* when *X* is 0. If we also wanted the main effect of *Q* we would need to also center *X*.

## Continuous Predictor, Categorical Moderator

We can observe this same idea when the moderator is a factor.

# default dummy coding model11 <- lm(Y ~ Q * A, data = data) model_parameters(model11) #> Parameter | Coefficient | SE | 95% CI | t(6) | p #> ------------------------------------------------------------------- #> (Intercept) | -2.40 | 11.37 | [-30.23, 25.43] | -0.21 | 0.840 #> Q | 5.97 | 3.20 | [ -1.85, 13.80] | 1.87 | 0.111 #> A [a2] | 14.76 | 13.58 | [-18.47, 47.99] | 1.09 | 0.319 #> Q × A [a2] | -3.40 | 3.81 | [-12.74, 5.93] | -0.89 | 0.406

*Q*’s coefficient is the conditional slope (a.k.a., the *simple slope*) of *Q* when the dummy variables (in this case just one) of *A* are set to 0 - when *A* is `a1`

:

(predicted_means <- predict(model11, newdata = data.frame(Q = c(0, 1), A = "a1"))) #> 1 2 #> -2.400173 3.574452 predicted_means[2] - predicted_means[1] #> 2 #> 5.974625

We can again use sum-to-zero contrasts to get the grand-average slope of *X* - the “main effect”:

model12 <- lm(Y ~ Q * A, data = data, contrasts = list(A = contr.sum)) model_parameters(model12) #> Parameter | Coefficient | SE | 95% CI | t(6) | p #> ------------------------------------------------------------------ #> (Intercept) | 4.98 | 6.79 | [-11.63, 21.59] | 0.73 | 0.491 #> Q | 4.27 | 1.91 | [ -0.39, 8.94] | 2.24 | 0.066 #> A [1] | -7.38 | 6.79 | [-23.99, 9.23] | -1.09 | 0.319 #> Q × A [1] | 1.70 | 1.91 | [ -2.96, 6.37] | 0.89 | 0.406 (predicted_means <- predict(model11, newdata = expand.grid(Q = c(0, 1), A = c("a1", "a2")))) #> 1 2 3 4 #> -2.400173 3.574452 12.358596 14.929210 mean(c(predicted_means[2] - predicted_means[1], predicted_means[4] - predicted_means[3])) #> [1] 4.272619

## Categorical Predictor, Continuous Moderator

This holds true when the focal predictor is a factor as well.

# default dummy coding model13 <- lm(Y ~ A * Q, data = data) model_parameters(model13) #> Parameter | Coefficient | SE | 95% CI | t(6) | p #> ------------------------------------------------------------------- #> (Intercept) | -2.40 | 11.37 | [-30.23, 25.43] | -0.21 | 0.840 #> A [a2] | 14.76 | 13.58 | [-18.47, 47.99] | 1.09 | 0.319 #> Q | 5.97 | 3.20 | [ -1.85, 13.80] | 1.87 | 0.111 #> A [a2] × Q | -3.40 | 3.81 | [-12.74, 5.93] | -0.89 | 0.406

*A*’s coefficient (`A [a2]`

) is the conditional difference (a.k.a., the *simple effect*) between the levels of *A* when *Q* is 0:

(predicted_means <- predict(model13, newdata = data.frame(A = c("a1", "a2"), Q = 0))) #> 1 2 #> -2.400173 12.358596 predicted_means[2] - predicted_means[1] #> 2 #> 14.75877

When we center *Q*, it is the conditional difference at the mean of *Q* - also called the “main effect”:

model14 <- lm(Y ~ A * center(Q), data = data) model_parameters(model14) #> Parameter | Coefficient | SE | 95% CI | t(6) | p #> -------------------------------------------------------------------------- #> (Intercept) | 18.74 | 1.15 | [ 15.92, 21.56] | 16.26 | < .001 #> A [a2] | 2.72 | 1.49 | [ -0.93, 6.36] | 1.82 | 0.118 #> center(Q) | 5.97 | 3.20 | [ -1.85, 13.80] | 1.87 | 0.111 #> A [a2] × center(Q) | -3.40 | 3.81 | [-12.74, 5.93] | -0.89 | 0.406 (predicted_means <- predict(model13, newdata = data.frame(A = c("a1", "a2"), Q = mean(data$Q)))) #> 1 2 #> 18.73805 21.45343 predicted_means[2] - predicted_means[1] #> 2 #> 2.715377

Note that in all cases, regardless of centering, the interaction coefficient was unchanged. This will always be true for the highest level interaction.

# In ANOVA

Finally, we have Categorical-by-Categorical moderation - more often seen in ANOVA tables. As you might expect, when looking at coefficient tables, centering affects the values and meaning of the coefficient. When looking at ANOVA tables things get more complicated, as it depends on the type of sum-of-squares computed (1, 2, or 3).

You can read more about those differences in my blog post *Everything You Always Wanted to Know About ANOVA*.

The TL;DR is - yes, you should center your predictors / use sum-to-zero contrasts.

# Summary

Centering makes coefficient tables (and ANOVA tables) more interpretable (and in Bayesian analysis can make prior elicitation easier) - **a very important goal to itself**, easing the interaction between the *numbers* and our squishy-human-flesh-brains - but that’s really all it does: it does not affect the model as a whole (doesn’t change or global multicollinearity).

It is quite common to see conditional slopes interpreted as main effects even when moderators have not been properly centered. Now at least *you* know better!

## Session Info

sessionInfo() #> R version 4.3.1 (2023-06-16 ucrt) #> Platform: x86_64-w64-mingw32/x64 (64-bit) #> Running under: Windows 11 x64 (build 22621) #> #> Matrix products: default #> #> #> locale: #> [1] LC_COLLATE=English_Israel.utf8 LC_CTYPE=English_Israel.utf8 #> [3] LC_MONETARY=English_Israel.utf8 LC_NUMERIC=C #> [5] LC_TIME=English_Israel.utf8 #> #> time zone: Asia/Jerusalem #> tzcode source: internal #> #> attached base packages: #> [1] stats graphics grDevices utils datasets methods base #> #> other attached packages: #> [1] parameters_0.21.3 datawizard_0.9.1 #> #> loaded via a namespace (and not attached): #> [1] cli_3.6.2 knitr_1.45 TH.data_1.1-2 #> [4] rlang_1.1.2 xfun_0.41 estimability_1.4.1 #> [7] xtable_1.8-4 jsonlite_1.8.8 zoo_1.8-12 #> [10] MSBMisc_0.0.1.14 htmltools_0.5.7 rmarkdown_2.25 #> [13] grid_4.3.1 evaluate_0.23 MASS_7.3-60 #> [16] fastmap_1.1.1 yaml_2.3.8 mvtnorm_1.2-4 #> [19] insight_0.19.7.4 compiler_4.3.1 multcomp_1.4-25 #> [22] codetools_0.2-19 sandwich_3.1-0 emmeans_1.9.0 #> [25] coda_0.19-4 htmlwidgets_1.6.4 rstudioapi_0.15.0 #> [28] lattice_0.21-8 digest_0.6.33 splines_4.3.1 #> [31] Matrix_1.6-4 tools_4.3.1 survival_3.5-5 #> [34] bayestestR_0.13.1.7

# References

**leave a comment**for the author, please follow the link and comment on their blog:

**Stat's What It's All About**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.