# Custom contrasts in emmeans

**Very statisticious on Very statisticious**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Following up on a previous post, where I demonstrated the basic usage of package **emmeans** for doing post hoc comparisons, here I’ll demonstrate how to make custom comparisons (aka *contrasts*). These are comparisons that aren’t encompassed by the built-in functions in the package.

Remember that you can explore the available built-in **emmeans** functions for doing comparisons via `?"contrast-methods"`

.

## Table of Contents

- Reasons for custom comparisons
- R packages
- The dataset and model
- Treatment vs control comparisons
- Building custom contrasts
- The contrast() function for custom comparisons
- Using named lists for better output
- Using “at” for simple comparisons
- Multiple custom contrasts at once
- More complicated custom contrasts
- Just the code, please

# Reasons for custom comparisons

There are a variety of reasons you might need custom comparisons instead of some of the standard, built-in ones. One common scenario that I see a lot is when we have a single control group for multiple factors, so the factors aren’t perfectly crossed. This comes up, e.g., when doing experiments that involve applying different substances (like fertilizers) at varying rates. One factor is the different substances applied and the other is different application rates. However, the control is applying nothing or water or something like that. There aren’t different rates of the control to apply, so there is a single control group for both factors.

Rather than trying to fit a model with multiple factors, focusing on main effects and the interaction, such data can be analyzed with a *simple effects* model. This is where the two (or more) factors of interest have been combined into a single factor for analysis. Such an analysis focuses on the effect of the two factors combined. We can use post hoc comparisons to estimate the overall effects of individual factors.

# R packages

I will load **magrittr** for the pipe in addition to **emmeans**.

library(emmeans) # v. 1.3.3 library(magrittr) # v. 1.5

# The dataset and model

I’ve made a small dataset to use as an example. The response variable is `resp`

and the two factors of interest have been combined into a single factor `sub.rate`

that has 5 levels: `A.1`

, `A.2`

, `B.1`

, `B.2`

, and `control`

.

One factor, which I’m thinking of as the *substance* factor, is represented by `A`

and `B`

(and the control). The second, the *rate* factor, is represented by `1`

and `2`

.

dat = structure(list(sub.rate = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A.1", "A.2", "B.1", "B.2", "control"), class = "factor"), resp = c(5.5, 4.9, 6.1, 3.6, 6.1, 3.5, 3, 4.1, 5, 4.6, 7.3, 5.6, 4.8, 7.2, 6.2, 4.3, 6.6, 6.5, 5.5, 7.1, 5.4, 6.7, 6.8, 8.5, 6.1)), row.names = c(NA, -25L), class = "data.frame") str(dat) # 'data.frame': 25 obs. of 2 variables: # $ sub.rate: Factor w/ 5 levels "A.1","A.2","B.1",..: 1 1 1 1 1 2 2 2 2 2 ... # $ resp : num 5.5 4.9 6.1 3.6 6.1 3.5 3 4.1 5 4.6 ...

I will use a simple, linear model for analysis.

fit1 = lm(resp ~ sub.rate, data = dat)

# Treatment vs control comparisons

The simple effects model makes it easy to get comparisons for each factor combination vs the control group with `emmeans()`

. I’ll use `trt.vs.ctrlk`

to do this since the control is the last level of the factor.

emmeans(fit1, specs = trt.vs.ctrlk ~ sub.rate) # $emmeans # sub.rate emmean SE df lower.CL upper.CL # A.1 5.24 0.466 20 4.27 6.21 # A.2 4.04 0.466 20 3.07 5.01 # B.1 6.22 0.466 20 5.25 7.19 # B.2 6.00 0.466 20 5.03 6.97 # control 6.70 0.466 20 5.73 7.67 # # Confidence level used: 0.95 # # $contrasts # contrast estimate SE df t.ratio p.value # A.1 - control -1.46 0.66 20 -2.214 0.1230 # A.2 - control -2.66 0.66 20 -4.033 0.0024 # B.1 - control -0.48 0.66 20 -0.728 0.8403 # B.2 - control -0.70 0.66 20 -1.061 0.6548 # # P value adjustment: dunnettx method for 4 tests

We may also be interested in some other comparisons, though. In particular, we might want to do some overall comparisons across the two factors. We will need custom contrasts for this.

# Building custom contrasts

Custom contrasts are based on the estimated marginal means output from `emmeans()`

. The first step to building custom contrasts is to calculate the estimated marginal means so we have them to work with. I will name this output `emm1`

.

emm1 = emmeans(fit1, specs = ~ sub.rate) emm1 # sub.rate emmean SE df lower.CL upper.CL # A.1 5.24 0.466 20 4.27 6.21 # A.2 4.04 0.466 20 3.07 5.01 # B.1 6.22 0.466 20 5.25 7.19 # B.2 6.00 0.466 20 5.03 6.97 # control 6.70 0.466 20 5.73 7.67 # # Confidence level used: 0.95

I’m going to start with a relatively simple example. I will compare mean `resp`

of the `A.2`

group to the `B.2`

group via custom contrasts.

Building a custom contrast involves pulling out specific group means of interest from the `emmeans()`

output. We *pull out* a group mean by making a vector to represent the specific mean of interest. In this vector we assign a `1`

to the mean of the group of interest and a `0`

to the other groups.

For example, to pull out the mean of `A.2`

from `emm1`

we will make a vector with 5 values in it, one for each row of the output. The second value will be a `1`

, since the mean of `A.2`

is on the second row of `emm1`

. All the other values in the vector will be `0`

.

Below is the vector that represents the `A.2`

mean.

A2 = c(0, 1, 0, 0, 0)

Similarly, to pull out the mean of `B.2`

we’ll have a vector of 5 values with a `1`

as the fourth value. The `B.2`

group is on the fourth row in `emm1`

.

B2 = c(0, 0, 0, 1, 0)

When building custom contrasts via vectors like this, the vectors will always be the same length as the number of rows in the `emmeans()`

output. I always calculate and print the estimated marginal means prior to building the vectors so I am certain of the number of rows and the order of the groups in the output.

# The contrast() function for custom comparisons

Once we have the vectors that represent the means we are interested in comparing, we actually do the comparisons via the `contrast()`

function. Since we are interested in a *difference* in mean response, we take the difference between the vectors that represent the means.

Taking the difference between vectors can be done inside or outside `contrast()`

. In this example I’m doing it inside.

The `contrast()`

function takes an `emmGrid`

object (i.e., output from `emmeans()`

) as the first argument. We give the comparison we want to do via a list passed to the `method`

argument.

Here I want to calculate the difference in mean `resp`

of `A.2`

and `B.2`

. I subtract the `B2`

vector from the `A2`

vector. The output is the difference in mean `resp`

between these groups.

contrast(emm1, method = list(A2 - B2) ) # contrast estimate SE df t.ratio p.value # c(0, 1, 0, -1, 0) -1.96 0.66 20 -2.972 0.0075

# Using named lists for better output

Unfortunately you can’t tell what comparisons was done in the output above ????. We can use a named list in `method`

to make the output more understandable.

contrast(emm1, method = list("A2 - B2" = A2 - B2) ) # contrast estimate SE df t.ratio p.value # A2 - B2 -1.96 0.66 20 -2.972 0.0075

# Using “at” for simple comparisons

Note that I didn’t need to do a custom contrast to do this particular comparison. I could have gotten the comparison I wanted by using the `at`

argument with `pairwise`

in `emmeans()`

and choosing just the two groups I was interested in.

emmeans(fit1, specs = pairwise ~ sub.rate, at = list(sub.rate = c("A.2", "B.2") ) ) # $emmeans # sub.rate emmean SE df lower.CL upper.CL # A.2 4.04 0.466 20 3.07 5.01 # B.2 6.00 0.466 20 5.03 6.97 # # Confidence level used: 0.95 # # $contrasts # contrast estimate SE df t.ratio p.value # A.2 - B.2 -1.96 0.66 20 -2.972 0.0075

# Multiple custom contrasts at once

Multiple custom contrasts can be done simultaneously in `contrast()`

by adding more comparisons to the `method`

list. I’ll demonstrate this by doing the simple example comparison twice, changing only which group mean is subtracted from the other.

I name both elements of the list for ease of interpretation. I find naming the list of comparisons to be a key part of doing these custom contrasts.

contrast(emm1, method = list("A2 - B2" = A2 - B2, "B2 - A2" = B2 - A2) ) # contrast estimate SE df t.ratio p.value # A2 - B2 -1.96 0.66 20 -2.972 0.0075 # B2 - A2 1.96 0.66 20 2.972 0.0075

Doing multiple comparisons at once means a multiple comparisons adjustment can be done as needed. In addition, we can use the `confint()`

function do get confidence intervals for the comparisons.

I’ll add a multivariate-\(t\) adjustment via `adjust = "mvt"`

and then get confidence intervals for the comparisons. Remember we can get both confidence intervals and tests for comparisons via `summary()`

with `infer = TRUE`

.

twocomp = contrast(emm1, method = list("A2 minus B2" = A2 - B2, "B2 minus A2" = B2 - A2), adjust = "mvt") %>% confint() twocomp # contrast estimate SE df lower.CL upper.CL # A2 minus B2 -1.96 0.66 20 -3.336 -0.584 # B2 minus A2 1.96 0.66 20 0.584 3.336 # # Confidence level used: 0.95 # Conf-level adjustment: mvt method for 2 estimates

# More complicated custom contrasts

Now that we’ve seen a simple case, let’s do something slightly more complicated (and realistic). What if we want to compare the `A`

group to the `B`

group overall, regardless of the application rate?

This is a *main effect* comparison, so I need to average over the effect of the rate factor in order to estimate the overall effect of the levels of the substance factor.

To do this comparison I need the means for all four non-control factor levels. I’ll print `emm1`

again here so I remember the order of the output before starting to write out the vectors that represent the group means.

emm1 # sub.rate emmean SE df lower.CL upper.CL # A.1 5.24 0.466 20 4.27 6.21 # A.2 4.04 0.466 20 3.07 5.01 # B.1 6.22 0.466 20 5.25 7.19 # B.2 6.00 0.466 20 5.03 6.97 # control 6.70 0.466 20 5.73 7.67 # # Confidence level used: 0.95

I’ll need all means that involve `A`

or `B`

, which is the first four group means in `emm1`

. I’ll make a vector to represent each of these group means.

While typing these vectors out isn’t too hard, since they only contain 5 values, when I have many groups and so really long vectors I sometimes use `rep()`

to repeat all the 0 values.

A1 = c(1, 0, 0, 0, 0) A2 = c(0, 1, 0, 0, 0) B1 = c(0, 0, 1, 0, 0) B2 = c(0, 0, 0, 1, 0)

The vectors I made represent means for combinations of substance and rate. I want to compare the *overall* substance group means, though. This can be done by *averaging over* the two rates. This involves literally taking the average of, e.g., `A1`

and `A2`

vectors to get a vector that represents the overall `A`

mean.

Aoverall = (A1 + A2)/2 Boverall = (B1 + B2)/2

Now that we have vectors to represent the overall means we can do comparison of mean `resp`

of the `A`

group vs `B`

group overall in `contrast()`

.

contrast(emm1, method = list("A - B" = Aoverall - Boverall) ) # contrast estimate SE df t.ratio p.value # A - B -1.47 0.466 20 -3.152 0.0050

Custom contrasts are all built in this same basic way. You can also build your own contrast function if there is some contrast you do all the time that is not part of **emmeans**. See the custom contrasts section of the **emmeans** vignette for more info.

# Just the code, please

Here’s the code without all the discussion.

library(emmeans) # v. 1.3.3 library(magrittr) # v. 1.5 dat = structure(list(sub.rate = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A.1", "A.2", "B.1", "B.2", "control"), class = "factor"), resp = c(5.5, 4.9, 6.1, 3.6, 6.1, 3.5, 3, 4.1, 5, 4.6, 7.3, 5.6, 4.8, 7.2, 6.2, 4.3, 6.6, 6.5, 5.5, 7.1, 5.4, 6.7, 6.8, 8.5, 6.1)), row.names = c(NA, -25L), class = "data.frame") str(dat) fit1 = lm(resp ~ sub.rate, data = dat) emmeans(fit1, specs = trt.vs.ctrlk ~ sub.rate) emm1 = emmeans(fit1, specs = ~ sub.rate) emm1 A2 = c(0, 1, 0, 0, 0) B2 = c(0, 0, 0, 1, 0) contrast(emm1, method = list(A2 - B2) ) contrast(emm1, method = list("A2 - B2" = A2 - B2) ) emmeans(fit1, specs = pairwise ~ sub.rate, at = list(sub.rate = c("A.2", "B.2") ) ) contrast(emm1, method = list("A2 - B2" = A2 - B2, "B2 - A2" = B2 - A2) ) twocomp = contrast(emm1, method = list("A2 minus B2" = A2 - B2, "B2 minus A2" = B2 - A2), adjust = "mvt") %>% confint() twocomp emm1 A1 = c(1, 0, 0, 0, 0) A2 = c(0, 1, 0, 0, 0) B1 = c(0, 0, 1, 0, 0) B2 = c(0, 0, 0, 1, 0) Aoverall = (A1 + A2)/2 Boverall = (B1 + B2)/2 contrast(emm1, method = list("A - B" = Aoverall - Boverall) )

**leave a comment**for the author, please follow the link and comment on their blog:

**Very statisticious on Very statisticious**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.