Testing The Equality of Regression Coefficients

[This article was first published on R on I Should Be Writing: COVID-19 Edition, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The Problem

You have two predictors in your model. One seems to have a stronger coefficient than the other. But is it significant?

Example: when predicting a worker’s salary, is the standardized coefficient of number of extra hours (xtra_hours) really larger than that of number of compliments given the to boss n_comps?

library(parameters)
library(effectsize)

data("hardlyworking", package = "effectsize")

hardlyworkingZ <- standardize(hardlyworking)

m <- lm(salary ~ xtra_hours + n_comps, data = hardlyworkingZ)

model_parameters(m)
#> Parameter   | Coefficient |   SE |        95% CI |    t(497) |      p
#> ---------------------------------------------------------------------
#> (Intercept) |   -7.19e-17 | 0.02 | [-0.03, 0.03] | -4.14e-15 | > .999
#> xtra_hours  |        0.81 | 0.02 | [ 0.78, 0.84] |     46.60 | < .001
#> n_comps     |        0.41 | 0.02 | [ 0.37, 0.44] |     23.51 | < .001

Here are 4 methods to test coefficient equality in R.

Notes
- If we were interested in the unstandardized coefficient, we would not need to first standardize the data.
- Note that if one parameter was positive, and the other was negative, one of the terms would need to be first reversed (-X) to make this work.

Method 1: As Model Comparisons

Based on this awesome tweet.

Since:

\(\hat{Y} = a \times \text{xtra_hours} + a \times \text{n_comps} \\ = a \times (\text{xtra_hours} + \text{n_comps})\)

We can essentially force a constraint on the coefficient to be equal by using a composite variable.

m0 <- lm(salary ~ I(xtra_hours + n_comps), data = hardlyworkingZ)

model_parameters(m0)
#> Parameter            | Coefficient |   SE |        95% CI |    t(498) |      p
#> ------------------------------------------------------------------------------
#> (Intercept)          |   -2.05e-17 | 0.02 | [-0.04, 0.04] | -9.57e-16 | > .999
#> xtra_hours + n_comps |        0.61 | 0.01 | [ 0.58, 0.64] |     41.09 | < .001

We can then compare how this model compares to our first model without this constraint:

anova(m0, m)
#> Analysis of Variance Table
#> 
#> Model 1: salary ~ I(xtra_hours + n_comps)
#> Model 2: salary ~ xtra_hours + n_comps
#>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
#> 1    498 113.67                                  
#> 2    497  74.95  1    38.716 256.73 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can conclude that the unconstrained model is significantly better than the constrained model - meaning that \(\beta_{\text{xtra_hours}} > \beta_{\text{n_comps}}\).

Method 2: Paternoster et al (1998)

According to Paternoster et al. (1998), we can compute a t-test to compare the coefficients:

bs <- coef(m)[-1]
V <- vcov(m)[-1, -1]

tibble::tibble(
  diff_estim = diff(bs),
  diff_SE = sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2]),
  t_stat = diff_estim / diff_SE,
  df = df.residual(m),
  p_value = 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
)
#> # A tibble: 1 x 5
#>   diff_estim diff_SE t_stat    df  p_value
#>                  
#> 1     -0.402  0.0251  -16.0   497 6.96e-47

This gives the exact same results as the first method! (\(t^2 = (-16)^2 = 256\) is the same as the F-value from the anova() test.)

Method 3: emmeans <3

We can estimate the slopes from the model using emmeans, and then, with some trickery, compare them!

library(emmeans)

trends <- rbind(
  emtrends(m, ~1, "xtra_hours"),
  emtrends(m, ~1, "n_comps")
)

# clean up so it does not error later
[email protected]$`1` <- c("xtra_hours", "n_comps")
[email protected]$estName <- "trend"

trends
#>  1          trend     SE  df lower.CL upper.CL
#>  xtra_hours 0.811 0.0174 497    0.772    0.850
#>  n_comps    0.409 0.0174 497    0.370    0.448
#> 
#> Confidence level used: 0.95 
#> Conf-level adjustment: bonferroni method for 2 estimates

Compare them:

pairs(trends)
#>  contrast             estimate     SE  df t.ratio p.value
#>  xtra_hours - n_comps    0.402 0.0251 497 16.023  <.0001

Once again - exact same results!

Method 4: lavaan

The lavaan package for latent variable analysis and structural equation modeling allows for parameter constraining. So let’s do just that:

library(lavaan)

m0 <- sem("salary ~ a * xtra_hours + a * n_comps", data = hardlyworkingZ)
m <- sem("salary ~ xtra_hours + n_comps", data = hardlyworkingZ)

anova(m0, m)
#> Chi-Squared Difference Test
#> 
#>    Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
#> m   0 476.04 488.69   0.00                                  
#> m0  1 682.26 690.69 208.22     208.22       1  < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This too yields similar results! (Only slightly different due to different estimation methods.)

Summary

That’s it really - these 4 simple methods have wide applications to GL(M)M’s, SEM, and more.

Enjoy!

To leave a comment for the author, please follow the link and comment on their blog: R on I Should Be Writing: COVID-19 Edition.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)