New version of WEC: focus on interactions

January 17, 2017

(This article was first published on Rense Nieuwenhuis » R-Project, and kindly contributed to R-bloggers)

We have uploaded a new version of WEC, an R package to apply ‘weighted effect coding’ to your dummy variables. With weighted effect coding, your dummy variables represent the deviation of their respective category from the sample mean, rather than the deviation from a reference category. Particularly with observational data, which are often unbalanced, this can have attractive interpretations. We recently published two articles in which we discuss some of the advantages:

Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2016b). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health, 1–5. http://doi.org/10.1007/s00038-016-0901-1

Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2016a). A novel method for modelling interaction between categorical variables. International Journal of Public Health, 1–5. http://doi.org/10.1007/s00038-016-0902-0
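
To make the idea of "deviation from the sample mean" concrete, here is a minimal base-R sketch that builds weighted effect contrasts by hand. The simulated data and all variable names (group, y, wec_contr) are hypothetical and only serve as an illustration; this is not the package's implementation, and in practice contr.wec() from the wec package builds the contrast matrix for you. The sketch assumes the usual definition of weighted effect coding: non-omitted categories keep their dummy coding, and the omitted category is coded as minus the ratio of the category sizes.

```r
# Simulated, deliberately unbalanced data (hypothetical, not from the PUMS example)
set.seed(1)
group <- factor(rep(c("a", "b", "c"), times = c(60, 30, 10)))
y <- unname(c(a = 10, b = 14, c = 20)[as.character(group)]) + rnorm(length(group))

# Category sizes as a plain named vector
n <- setNames(as.vector(table(group)), levels(group))

# Start from dummy (treatment) contrasts with "a" as the omitted category,
# then recode the omitted row as -n_j / n_omitted for each remaining category j
omitted <- "a"
kept <- setdiff(levels(group), omitted)
wec_contr <- contr.treatment(levels(group), base = which(levels(group) == omitted))
wec_contr[omitted, ] <- -n[kept] / n[omitted]
contrasts(group) <- wec_contr

fit <- lm(y ~ group)
coef(fit)

# Compare: the intercept against the overall sample mean, and each
# coefficient against its group mean minus the sample mean
mean(y)
tapply(y, group, mean)[kept] - mean(y)
```

With dummy (treatment) coding the same coefficients would instead be differences from category "a", which is why the two codings read differently when categories are of very different sizes.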

As some of the real advantages of weighted effect coding come into play when using interactions, that is what we focused on in the current update to our ‘wec’ package (version 0.4). The package now supports interactions between a weighted effect coded factor variable and an interval variable, and the calculation of interactions between two weighted effect coded factor variables has been much improved. An example is given below (with more to follow, hopefully soon).


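library(wec)
data(PUMS)

# Copy race into a new factor and apply weighted effect coding, omitting "White"
PUMS$race.wec <- factor(PUMS$race)
contrasts(PUMS$race.wec) <- contr.wec(PUMS$race.wec, "White")

# Build the interaction between the coded factor and the interval variable
PUMS$race.educint <- wec.interact(PUMS$race.wec, PUMS$education.int)

# Fit the model and show the coefficient table
m.wec.educ <- lm(wage ~ race.wec + education.int + race.educint, data=PUMS)
summary(m.wec.educ)$coefficients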

The code above results in a regression model (shown below) in which the main effect for education (9048) is the same whether or not the interaction terms are included (you can try this yourself). The interaction terms therefore show how much the education effect in each race category deviates from this average education effect.

                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                     52320        559    93.5  0.0e+00
race.wecHispanic                -4955       1736    -2.9  4.3e-03
race.wecBlack                  -11276       1817    -6.2  5.7e-10
race.wecAsian                    5151       2381     2.2  3.1e-02
education.int                    9048        287    31.6 2.3e-208
race.educintinteractHispanic    -3266        977    -3.3  8.3e-04
race.educintinteractBlack       -3293        990    -3.3  8.8e-04
race.educintinteractAsian        3575       1217     2.9  3.3e-03
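
To check the claim that the education main effect does not change, one option is to fit the model with and without the interaction term and compare the two coefficients. The sketch below reuses only the calls from the example in this post; the object names m.main and m.int are hypothetical, and it assumes the wec package (with its PUMS data) is installed.

```r
library(wec)
data(PUMS)

# Same setup as in the example: weighted effect coding with "White" omitted
PUMS$race.wec <- factor(PUMS$race)
contrasts(PUMS$race.wec) <- contr.wec(PUMS$race.wec, "White")
PUMS$race.educint <- wec.interact(PUMS$race.wec, PUMS$education.int)

# Model without and with the interaction terms
m.main <- lm(wage ~ race.wec + education.int, data = PUMS)
m.int  <- lm(wage ~ race.wec + education.int + race.educint, data = PUMS)

# Put the two education coefficients side by side
c(without = unname(coef(m.main)["education.int"]),
  with    = unname(coef(m.int)["education.int"]))
```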
