We have uploaded a new version of wec, an R package for applying ‘weighted effect coding’ to your dummy variables. With weighted effect coding, the coefficient of each dummy variable represents the deviation of its category from the sample mean, rather than the deviation from a reference category. This gives estimates that are easy to interpret, particularly with observational data, which are often unbalanced. We recently published two articles in which we discuss some of the advantages:

Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2016b). When size matters: advantages of weighted effect coding in observational studies. *International Journal of Public Health*, 1–5. http://doi.org/10.1007/s00038-016-0901-1

Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2016a). A novel method for modelling interaction between categorical variables. *International Journal of Public Health*, 1–5. http://doi.org/10.1007/s00038-016-0902-0
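To make the idea concrete, here is a minimal sketch in base R that builds a weighted effect coding matrix by hand, without the wec package. The toy data and all variable names (`g`, `y`, `wec_mat`) are made up for illustration. Because every coded column sums to zero over the sample, the intercept should equal the sample mean of `y`, and each coefficient should equal its category mean minus that sample mean.

```r
# Hypothetical, unbalanced toy data (not from the wec package)
set.seed(1)
g <- factor(rep(c("a", "b", "c"), times = c(50, 30, 20)))
group_means <- c(a = 10, b = 12, c = 15)
y <- rnorm(length(g), mean = group_means[as.character(g)])

# Category sizes, used as weights
n <- setNames(tabulate(g), levels(g))

# Hand-built weighted effect coding matrix with "a" as the omitted category:
# each column is 1 for its own category, 0 for the other kept categories,
# and -n_k / n_a in the row of the omitted category
omitted <- "a"
kept <- setdiff(levels(g), omitted)
wec_mat <- matrix(0, nrow = nlevels(g), ncol = length(kept),
                  dimnames = list(levels(g), kept))
for (k in kept) {
  wec_mat[k, k] <- 1
  wec_mat[omitted, k] <- -n[[k]] / n[[omitted]]
}
contrasts(g) <- wec_mat

fit <- lm(y ~ g)

# Compare the estimates with the sample mean and the category deviations
print(coef(fit))
print(mean(y))
print(tapply(y, g, mean) - mean(y))
```

In practice, `contr.wec()` from the package does this bookkeeping for you; the sketch only shows what the coding amounts to.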

Some of the real advantages of weighted effect coding come into play when using interactions, so that is what we focused on in the current update to our ‘wec’ package (version 0.4). The package now supports interactions between a weighted effect coded factor variable and an interval variable, and the calculation of interactions between two weighted effect coded factor variables has been much improved. An example is given below (with more to follow, hopefully soon).

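    library(wec)
    data(PUMS)

    # Apply weighted effect coding to race, omitting the category "White"
    PUMS$race.wec <- factor(PUMS$race)
    contrasts(PUMS$race.wec) <- contr.wec(PUMS$race.wec, "White")

    # Interaction between the coded factor and the interval variable
    PUMS$race.educint <- wec.interact(PUMS$race.wec, PUMS$education.int)

    # Wage model with main effects and the interaction terms
    m.wec.educ <- lm(wage ~ race.wec + education.int + race.educint, data = PUMS)
    summary(m.wec.educ)$coefficients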

The code above results in a regression model (shown below) in which the main effect of education (9048) is the same whether or not the interaction terms are included (you can try this yourself). The interaction terms therefore show how much the education effect in each racial category deviates from the average education effect.
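If you do not have the package at hand, the following base R sketch shows one way such a property can arise. It is a hand-rolled construction on simulated data, and we do not claim it reproduces the internals of `wec.interact()`; the helper `weighted_dummy()` and all variable names are hypothetical. The interaction columns use the within-category centered covariate and weight the omitted category by within-category sums of squares, which makes them orthogonal to every column of the main-effects model.

```r
# Hypothetical simulated data; names are illustrative only
set.seed(42)
grp <- factor(rep(c("ref", "g1", "g2"), times = c(120, 50, 30)),
              levels = c("ref", "g1", "g2"))
x <- rnorm(length(grp), mean = 2)
true_slope <- c(ref = 1.0, g1 = 0.5, g2 = 2.0)[as.character(grp)]
y <- 5 + true_slope * x + rnorm(length(grp))

# Weighted dummy: 1 in category k, -w[k] / w[omitted] in the omitted
# category, 0 elsewhere
weighted_dummy <- function(f, k, omitted, w) {
  (f == k) - (f == omitted) * w[[k]] / w[[omitted]]
}

kept <- c("g1", "g2")

# Main-effect dummies, weighted by category size
n <- sapply(split(x, grp), length)
W <- sapply(kept, weighted_dummy, f = grp, omitted = "ref", w = n)

# Interaction columns: covariate centered within category, weighted by
# the within-category sums of squares of the covariate
xc <- x - ave(x, grp)
ss <- sapply(split(xc, grp), function(v) sum(v^2))
Z <- sapply(kept, function(k) xc * weighted_dummy(grp, k, "ref", ss))

fit_main <- lm(y ~ W + x)
fit_int  <- lm(y ~ W + x + Z)

# The coefficient of x should agree between the two models
print(coef(fit_main)["x"])
print(coef(fit_int)["x"])
```

With the package itself, the analogous check would be to fit `lm(wage ~ race.wec + education.int, data = PUMS)` and compare its `education.int` coefficient with the one reported for `m.wec.educ`.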

                                 Estimate Std. Error t value  Pr(>|t|)
    (Intercept)                     52320        559    93.5   0.0e+00
    race.wecHispanic                -4955       1736    -2.9   4.3e-03
    race.wecBlack                  -11276       1817    -6.2   5.7e-10
    race.wecAsian                    5151       2381     2.2   3.1e-02
    education.int                    9048        287    31.6  2.3e-208
    race.educintinteractHispanic    -3266        977    -3.3   8.3e-04
    race.educintinteractBlack       -3293        990    -3.3   8.8e-04
    race.educintinteractAsian        3575       1217     2.9   3.3e-03
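As a small worked example of reading these numbers, the snippet below adds each interaction term to the average education effect. This assumes, following the interpretation given above, that a category's education effect is the main effect plus its interaction term. The variable names are ours, and the values are simply copied from the output above.

```r
# Coefficients copied from the regression output above
educ_avg <- 9048
educ_dev <- c(Hispanic = -3266, Black = -3293, Asian = 3575)

# Implied education effect per category (main effect + deviation)
educ_by_race <- educ_avg + educ_dev
print(educ_by_race)
# Hispanic    Black    Asian
#     5782     5755    12623
```

The omitted category ("White") has no row in the output, so its deviation cannot be read off directly from these numbers alone.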
