Weighted Effect Coding: New publication in the R Journal

July 3, 2017

(This article was first published on Rense Nieuwenhuis » R-Project, and kindly contributed to R-bloggers)

Weighted effect coding is a technique for dummy coding that can have attractive properties, particularly when analysing observational data. In a new publication in the R Journal we explain the rationale of weighted effect coding, introduce the ‘wec’ package, and provide examples that include interactions.

The attractive property of applying weighted effect coding to categorical (‘factor’) variables is that the estimate for each category represents the deviation of that category from the sample mean. This is unlike the more commonly used treatment coding, where a specific category has to be selected as the reference and all estimates are deviations from that category. Weighted effect coding is a generalized form of effect coding that applies to both balanced and unbalanced data.
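To make the deviation-from-the-sample-mean property concrete, here is a minimal sketch in plain Python. It does not use the ‘wec’ package or reproduce its API: the function name `weighted_effect_codes`, the toy data, and the choice of omitted category are all assumptions made for illustration. The sketch uses the usual weighted effect coding rule: an observation scores 1 on the column of its own category, observations in the omitted category score minus the ratio of that column's category count to the omitted category's count, and all other observations score 0.

```python
from collections import Counter


def weighted_effect_codes(labels, omitted):
    """Build one weighted-effect-coded column per non-omitted category.

    Hypothetical helper for illustration only (not the 'wec' package API).
    """
    counts = Counter(labels)
    columns = {}
    for level in sorted(counts):
        if level == omitted:
            continue
        # Observations in the omitted category get -n_level / n_omitted.
        ratio = -counts[level] / counts[omitted]
        columns[level] = [
            1.0 if lab == level else (ratio if lab == omitted else 0.0)
            for lab in labels
        ]
    return columns


def mean(values):
    return sum(values) / len(values)


# Unbalanced toy data (made up): 2 observations in "a", 6 in "b", 4 in "c".
labels = ["a"] * 2 + ["b"] * 6 + ["c"] * 4
y = [3.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 8.0, 1.0, 2.0, 3.0, 2.0]

codes = weighted_effect_codes(labels, omitted="c")

grand_mean = mean(y)
group_means = {
    level: mean([v for v, lab in zip(y, labels) if lab == level])
    for level in set(labels)
}

# Candidate coefficients: intercept = sample mean,
# category effect = category mean minus sample mean.
coefs = {level: group_means[level] - grand_mean for level in codes}

# Fitted values implied by these coefficients under the coding above.
fitted = [
    grand_mean + sum(coefs[level] * codes[level][i] for level in codes)
    for i in range(len(y))
]

# The model with one categorical predictor is saturated, so the least-squares
# fit reproduces each category mean. If the candidate coefficients do the
# same, they are the least-squares solution.
for level in sorted(group_means):
    i = labels.index(level)
    print(level, "category mean:", group_means[level], "fitted:", fitted[i])
print("deviations from the sample mean:", coefs)
```

Because every coded column sums to zero over the sample, the intercept absorbs the sample mean even though the categories differ in size. The fitted value for the omitted category "c" also equals its own mean, so its deviation from the sample mean can be recovered without refitting the model with a different omitted category.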

A form of weighted effect coding was formulated as early as 1972 by Sweeney and Ulveling, but it seems never to have found its place in statistical repertoires, and it was not implemented in mainstream statistical software. In an ongoing project, we have now further developed weighted effect coding to also apply to interactions (with both categorical and continuous variables), and we provide procedures for mainstream statistical software. For R, we developed the ‘wec’ package, and procedures for Stata and SPSS are available as well.

A key innovation in our article in the R Journal is the formulation of interactions between a categorical variable and a continuous variable. This is visualised in the Figure above. The benefit of estimating such an interaction with weighted effect coding is that, upon entering the interaction terms, the estimate for the continuous variable (as well as the ‘main effects’ for the categorical variable) does not change. The ‘main’ continuous term reflects the average effect in the sample, and the interaction terms represent the deviation of each category's effect from that average.

References

Te Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017b). A novel method for modelling interaction between categorical variables. International Journal of Public Health, 62(3), 427–431. (open access!)

Te Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017a). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health, 62, 163–167. (open access!)

Nieuwenhuis, R., Te Grotenhuis, M., & Pelzer, B. (2017). Weighted Effect Coding for Observational Data with wec. R Journal, 9(1), 477–485. (open access!)

Sweeney, R. E., & Ulveling, E. F. (1972). A transformation for simplifying the interpretation of coefficients of binary variables in regression analysis. The American Statistician.
