Categorical variables are usually included in regression models as dummy variables. In R, this is done with factor variables, which use treatment coding by default. Each category is then compared with, and tested against, a preselected reference category. We present a useful alternative.
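To make the default behaviour concrete, here is a minimal sketch in base R. The data are simulated purely for illustration, and the names `d`, `group`, `y`, and `fit_trt` are made up for this example.

```r
# Toy data (simulated for illustration only): three unbalanced categories
set.seed(1)
d <- data.frame(
  group = factor(rep(c("a", "b", "c"), times = c(10, 30, 60)))
)
d$y <- rnorm(nrow(d), mean = c(a = 1, b = 2, c = 4)[as.character(d$group)])

# Default contrasts for an unordered factor: treatment (dummy) coding,
# with the first level ("a") as the reference category
contrasts(d$group)

fit_trt <- lm(y ~ group, data = d)

# The intercept is the mean of the reference category "a";
# the other coefficients are differences from that category
coef(fit_trt)
```

With this coding, every test in the model summary is a comparison with category "a", so the conclusions depend on which category was picked as the reference.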
If all categories have (roughly) the same number of observations, you can also test each category against the grand mean using effect (ANOVA) coding. In observational studies, however, the number of observations per category typically varies. Our new paper shows how the categories of a factor variable can instead be tested against the sample mean. Although the paper has been online for some time (and this post is an update to an earlier post from some time ago), we are happy to announce that it has now officially been published in the International Journal of Public Health.
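The idea can be sketched by hand in base R. The contrast matrix below follows the usual description of weighted effect coding: each retained category gets its own column, and the row of the omitted category holds minus the ratio of the two category sizes. The data are simulated, and the names `wec_mat`, `omitted`, and `fit_wec` are made up for this illustration; this is a sketch of the technique, not the code from the paper.

```r
# Toy data (simulated for illustration only): three unbalanced categories
set.seed(1)
group <- factor(rep(c("a", "b", "c"), times = c(10, 30, 60)))
y <- rnorm(length(group), mean = c(a = 1, b = 2, c = 4)[as.character(group)])

# Category sizes as a plain named vector
n <- setNames(as.vector(table(group)), levels(group))

omitted <- "c"                               # category without its own coefficient
kept <- setdiff(levels(group), omitted)

# One column per retained category
wec_mat <- matrix(0, nrow = nlevels(group), ncol = length(kept),
                  dimnames = list(levels(group), kept))
for (k in kept) wec_mat[k, k] <- 1           # 1 for a category's own column
wec_mat[omitted, ] <- -n[kept] / n[[omitted]] # weights in the omitted row

contrasts(group) <- wec_mat
fit_wec <- lm(y ~ group)

# Intercept: sample mean of y; other coefficients: category mean minus sample mean
coef(fit_wec)
mean(y)
```

Because the weights depend on the category sizes, the intercept equals the mean of all observations rather than the unweighted mean of the category means, which is what distinguishes this from ordinary effect coding when the categories are unbalanced.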
To apply the procedures introduced in these papers, called weighted effect coding, we have made implementations available for R, SPSS, and Stata. For R, we created the ‘wec’ package, which can be installed from within R.
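A minimal sketch of installing and using the package follows. It assumes that ‘wec’ is installed from CRAN with `install.packages()`, and that `contr.wec()` takes the factor and the label of the omitted category, which is our reading of the package documentation. The toy data and the names `group`, `y`, and `fit` are made up for this example.

```r
# One-time installation (assumes the package is available on CRAN)
install.packages("wec")
library(wec)

# Toy data (simulated for illustration only): three unbalanced categories
set.seed(1)
group <- factor(rep(c("a", "b", "c"), times = c(10, 30, 60)))
y <- rnorm(length(group), mean = c(a = 1, b = 2, c = 4)[as.character(group)])

# Assign weighted effect coding to the factor, omitting category "c"
# (argument name `omitted` assumed from the package documentation)
contrasts(group) <- contr.wec(group, omitted = "c")

# Each coefficient now tests a category against the sample mean
fit <- lm(y ~ group)
summary(fit)
```

To obtain an estimate for the omitted category as well, the model can be refitted with a different category omitted.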
Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health, 62, 163–167. http://doi.org/10.1007/s00038-016-0901-1
Sweeney, R., & Ulveling, E. F. (1972). A transformation for simplifying the interpretation of coefficients of binary variables in regression analysis. The American Statistician, 26, 30–32.