R Tutorial Series: Regression With Interaction Variables

January 23, 2010

(This article was first published on R Tutorial Series, and kindly contributed to R-bloggers)

Interaction variables introduce an additional level of regression analysis by allowing researchers to explore the synergistic effects of combined predictors. This tutorial will explore how interaction models can be created in R.

Tutorial Files

Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains variables for the following information related to ice cream consumption.

  • DATE: Time period (1-30)
  • CONSUME: Ice cream consumption in pints per capita
  • PRICE: Per pint price of ice cream in dollars
  • INC: Weekly family income in dollars
  • TEMP: Mean temperature in degrees F

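Note that all code samples in this tutorial assume that this data has already been read into an R variable (named datavar in the code that follows) and has been attached with attach(), so that columns such as PRICE and INC can be referenced directly by name.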
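As a minimal sketch of that setup, the snippet below builds a small synthetic stand-in for the dataset and attaches it. The values are made up purely for illustration and are not the real ice cream data. With the downloaded file you would call read.csv() instead; the file name in the comment is an assumption, not the actual download name.

```r
# Synthetic stand-in for the ice cream data (made-up values, NOT the real dataset).
# With the real file, something like this would replace the data.frame() call:
#   datavar <- read.csv("icecream.csv")   # hypothetical file name
set.seed(1)
n <- 30
datavar <- data.frame(
  DATE  = 1:n,
  PRICE = round(rnorm(n, mean = 0.275, sd = 0.01), 3),
  INC   = round(rnorm(n, mean = 85, sd = 6)),
  TEMP  = round(rnorm(n, mean = 50, sd = 15))
)
datavar$CONSUME <- with(datavar, 0.2 + 0.003 * TEMP + rnorm(n, sd = 0.03))

# attach() places the columns on the search path, so PRICE, INC, etc.
# can be used without the datavar$ prefix
attach(datavar)
mean(PRICE)
```

Variables created later in the tutorial (such as the centered predictors) live in the workspace rather than inside datavar, which is why attaching the data is convenient here.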

Planning The Model

Suppose that our research question is "how much of the variance in ice cream consumption can be predicted by per pint price, weekly family income, mean temperature, and the interaction between per pint price and weekly family income?" The interaction term at the end of that question is the new addition to our typical multiple regression modeling procedure. This variable is relatively simple to incorporate, but it does require a few preparations.

Creating The Interaction Variable

A two-step process can be followed to create an interaction variable in R. First, the input variables must be centered, which reduces the correlation between the interaction term and its component predictors (multicollinearity). Second, the centered variables must be multiplied to create the interaction variable.

Step 1: Centering

To center a variable, simply subtract its mean from each data point and save the result into a new R variable, as demonstrated below.

  > #center the input variables
  > PRICEc <- PRICE - mean(PRICE)
  > INCc <- INC - mean(INC)
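A quick way to confirm that centering worked is to check that the new variable has a mean of (effectively) zero. The sketch below uses a handful of made-up prices, not the tutorial's data, and also shows base R's scale() function as an alternative way to get the same result.

```r
# Made-up prices for illustration only (not the tutorial's data)
PRICE <- c(0.270, 0.282, 0.277, 0.280, 0.272, 0.262)

# Manual centering, as in the tutorial
PRICEc <- PRICE - mean(PRICE)

# Base R alternative: scale() with scale = FALSE only subtracts the mean.
# It returns a one-column matrix, so as.numeric() converts it back to a vector.
PRICEc2 <- as.numeric(scale(PRICE, center = TRUE, scale = FALSE))

mean(PRICEc)                          # effectively zero, up to floating-point error
all.equal(PRICEc, PRICEc2)            # the two approaches agree
all.equal(sd(PRICEc), sd(PRICE))      # centering shifts the values but keeps their spread
```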

Step 2: Multiplication

Once the input variables have been centered, the interaction term can be created. Since an interaction is formed by the product of two or more predictors, we can simply multiply our centered terms from step one and save the result into a new R variable, as demonstrated below.

  > #create the interaction variable
  > PRICEINCi <- PRICEc * INCc
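To illustrate why the centering step comes first, the sketch below compares how strongly a predictor correlates with the product term when the product is built from raw versus centered variables. The data are synthetic (made-up values with means far from zero), so the exact numbers will differ from those for the ice cream dataset.

```r
# Synthetic predictors (made-up values) whose means are far from zero
set.seed(42)
n <- 200
PRICE <- rnorm(n, mean = 0.275, sd = 0.01)
INC   <- rnorm(n, mean = 85, sd = 6)

# Product of the raw variables versus product of the centered variables
rawProduct <- PRICE * INC
PRICEc     <- PRICE - mean(PRICE)
INCc       <- INC - mean(INC)
PRICEINCi  <- PRICEc * INCc

# Correlation of income with each version of the product term
corRaw      <- cor(INC, rawProduct)
corCentered <- cor(INCc, PRICEINCi)
c(raw = corRaw, centered = corCentered)
```

With data like these, the raw product tends to track its component predictors closely, while the centered product is much less correlated with them.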

Creating The Model

Now we have all of the pieces necessary to assemble our complete interaction model.

  > #create the interaction model using lm(FORMULA, DATAVAR)
  > #predict ice cream consumption by its per pint price, weekly family income, mean temperature, and the interaction between per pint price and weekly family income
  > interactionModel <- lm(CONSUME ~ PRICE + INC + TEMP + PRICEINCi, datavar)
  > #display summary information about the model
  > summary(interactionModel)

The summary() call prints the coefficient estimates, standard errors, and overall fit statistics for our interaction model.
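As a cross-check on the manual approach, R's formula interface can also build the product term itself through the * operator. The sketch below, which uses synthetic stand-in data (made-up values, not the ice cream dataset) stored in a hypothetical data frame d, fits the model both ways and compares the interaction coefficients.

```r
# Synthetic stand-in data (made-up values, NOT the real dataset)
set.seed(7)
n <- 30
d <- data.frame(
  PRICE = rnorm(n, mean = 0.275, sd = 0.01),
  INC   = rnorm(n, mean = 85, sd = 6),
  TEMP  = rnorm(n, mean = 50, sd = 15)
)
d$CONSUME <- 0.2 + 0.003 * d$TEMP + rnorm(n, sd = 0.03)

# Center and multiply, following the tutorial's two steps
d$PRICEc    <- d$PRICE - mean(d$PRICE)
d$INCc      <- d$INC - mean(d$INC)
d$PRICEINCi <- d$PRICEc * d$INCc

# Manual product term, as in the tutorial
manualModel <- lm(CONSUME ~ PRICE + INC + TEMP + PRICEINCi, data = d)

# Formula shorthand: PRICEc * INCc expands to PRICEc + INCc + PRICEc:INCc
formulaModel <- lm(CONSUME ~ PRICEc * INCc + TEMP, data = d)

# The interaction coefficient should match across the two fits
coef(manualModel)["PRICEINCi"]
coef(formulaModel)["PRICEc:INCc"]
```

Because centering only shifts each predictor by a constant, the two fits describe the same model: the slopes agree and only the intercept differs.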

At this point we have a complete interaction model. Naturally, if this were a full research analysis, we would likely compare this model to others and assess the value of each predictor. For information on comparing models, see the tutorial on hierarchical linear modeling.
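One common way to make such a comparison is to fit the model with and without the interaction term and pass both to anova(). The sketch below is only an illustration on synthetic stand-in data (made-up values, not the ice cream dataset, held in a hypothetical data frame d), not necessarily the procedure used in the tutorial referenced above.

```r
# Synthetic stand-in data (made-up values, NOT the real dataset)
set.seed(7)
n <- 30
d <- data.frame(
  PRICE = rnorm(n, mean = 0.275, sd = 0.01),
  INC   = rnorm(n, mean = 85, sd = 6),
  TEMP  = rnorm(n, mean = 50, sd = 15)
)
d$CONSUME   <- 0.2 + 0.003 * d$TEMP + rnorm(n, sd = 0.03)
d$PRICEINCi <- (d$PRICE - mean(d$PRICE)) * (d$INC - mean(d$INC))

# Nested models: main effects only, then main effects plus the interaction
mainModel        <- lm(CONSUME ~ PRICE + INC + TEMP, data = d)
interactionModel <- lm(CONSUME ~ PRICE + INC + TEMP + PRICEINCi, data = d)

# F test for whether adding the interaction term improves the fit
comparison <- anova(mainModel, interactionModel)
comparison
```

The second row of the resulting table reports the change in residual sum of squares and the F test for the single added term.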

Complete Interaction Model Example

To see a complete example of how an interaction model can be created in R, please download the interaction model example (.txt) file.

References

Kadiyala, K. (1970). Ice Cream [Data File]. Retrieved December 14, 2009 from http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html
