By far the easiest way to detect and interpret the interaction between two factor variables is by drawing an interaction plot in R. It displays a summary of the response variable (the mean, by default) on the Y-axis and the levels of the first factor on the X-axis. The second factor is represented through lines on the chart – each level of the second factor gets its own line.
Today you’ll learn when you should consider plotting interaction charts and how you can do it in R.
Table of contents:
- Dataset Preparation for Interaction Plots
- Quantify Relationships Using ANOVA
- Visualizing Interaction Plot in R
Dataset Preparation for Interaction Plots
The first order of business is to acquire a dataset. The best ones are typically those produced by statistical research. For example, this diet dataset contains information on people who undertook one of three diets – their age, gender, height, diet regime, and weight before and after the six-week period.
Download the dataset in a CSV format and use the following code snippet to load it into R:
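A minimal sketch of the loading step is shown below. The column names (`Person`, `gender`, `Age`, `Height`, `pre.weight`, `Diet`, `weight6weeks`) are assumptions about the file layout, and the rows are made up so the sketch runs on its own; with the real download, point `read.csv()` at your CSV file instead of the temporary one.

```r
# Stand-in for the downloaded file: a few made-up rows with the ASSUMED
# column layout, written to a temporary CSV so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Person,gender,Age,Height,pre.weight,Diet,weight6weeks",
  "1,0,22,159,58,1,54.2",
  "2,0,46,192,60,1,54.0",
  "3,1,55,170,64,2,63.3",
  "4,,39,168,71,3,68.5"
), csv_path)

# With the real dataset, replace csv_path with the path to your CSV file.
diet <- read.csv(csv_path)

head(diet)  # first rows (up to six)
str(diet)   # column names and types
```

Blank numeric fields, like the missing gender in the last made-up row, are read in as `NA`, which is why the cleaning step later drops missing values.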
Here’s what the first six rows look like:
Now, we don’t need everything from the dataset. All we care about is how weight loss relates to two factors – gender and diet regime.
The code snippet below transforms the dataset so that missing values are removed, weight loss is calculated, variables are converted to factors, and only columns of interest are kept:
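The four steps could look roughly like the sketch below. The input column names, the 0/1 gender coding, and the `A`/`B`/`C` diet labels are assumptions, and the rows are made up for illustration; the output names `diet.type`, `gender`, and `weight.loss` match the ones used in the rest of the article.

```r
# Made-up rows with the ASSUMED column layout (not the real study data).
diet <- data.frame(
  gender       = c(0, 0, 1, 1, NA, 1),
  Age          = c(22, 46, 55, 33, 41, 29),
  Height       = c(159, 192, 170, 171, 168, 180),
  pre.weight   = c(58, 60, 64, 64, 71, 82),
  Diet         = c(1, 2, 3, 1, 2, 3),
  weight6weeks = c(54.2, 54.0, 63.3, 61.1, 68.5, 76.9)
)

# 1. Drop rows with missing values.
diet <- na.omit(diet)

# 2. Weight loss = weight before the diet minus weight after six weeks.
diet$weight.loss <- diet$pre.weight - diet$weight6weeks

# 3. Convert the grouping variables to factors with readable labels.
#    The 0 = Female / 1 = Male coding and the A/B/C labels are assumptions;
#    check the dataset's documentation before relying on them.
diet$diet.type <- factor(diet$Diet, levels = 1:3, labels = c("A", "B", "C"))
diet$gender <- factor(diet$gender, levels = c(0, 1), labels = c("Female", "Male"))

# 4. Keep only the columns of interest.
diet <- diet[, c("diet.type", "gender", "weight.loss")]

head(diet)
```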
We have a much leaner dataset now:
That’s all we need to visualize the interaction plot in R. But before we do so, we need to know whether there’s a reason to even consider an interaction plot. That’s a question a two-way ANOVA test can answer.
Quantify Relationships Using ANOVA
We can do a two-way ANOVA test to find out if two factors affect the response variable. We’ve covered ANOVA on the Appsilon blog, and shown how to implement it from scratch. Consider these resources if you want to learn more:
- ANOVA in R – How to Implement One-Way ANOVA From Scratch
- MANOVA in R – Multivariate ANOVA explained in R
Back to the topic. Our two-way ANOVA model should explain the effect of diet and gender factors on the response variable – weight loss. Use the following code snippet to fit the model and print its summary:
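As a sketch of that step, the snippet below fits a two-way ANOVA with `aov()` and prints its summary. The data frame is simulated purely so the code runs on its own, so its numbers say nothing about the real study; with the prepared dataset you would pass that data frame to `data =` instead.

```r
# Synthetic stand-in data: 2 genders x 3 diets, 10 people per cell.
# Values are simulated for illustration only, not the study's results.
set.seed(42)
diet <- expand.grid(
  id        = 1:10,
  diet.type = factor(c("A", "B", "C")),
  gender    = factor(c("Female", "Male"))
)
diet$weight.loss <- rnorm(nrow(diet), mean = 3, sd = 2)

# diet.type * gender expands to both main effects plus their interaction:
# diet.type + gender + diet.type:gender
model <- aov(weight.loss ~ diet.type * gender, data = diet)
model_summary <- summary(model)
print(model_summary)

# Pull the P-value of the interaction term out of the summary table.
# Row names in the table are padded with spaces, hence trimws().
anova_table <- model_summary[[1]]
term_names <- trimws(rownames(anova_table))
interaction_p <- anova_table[term_names == "diet.type:gender", "Pr(>F)"]
interaction_p
```

The `*` in the formula is what adds the `diet.type:gender` row to the summary; using `+` instead would fit main effects only and leave the interaction untested.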
Here’s the model summary:
The only thing we’re looking at here is the P-value of diet.type:gender. It’s below the significance level (0.05), which indicates there’s a significant interaction effect between the factors.
Knowing this, the next logical step is to visualize the interaction plot.
Visualizing Interaction Plot in R
R has the interaction.plot() function built-in, but it comes with a ton of parameters beginners can find confusing. The following list explains all the parameters you need to create an interaction plot:
- x.factor – A factor variable whose levels will be on the X-axis.
- trace.factor – The second factor variable, whose levels will be represented as traces (lines).
- response – A numeric response variable.
- fun – The function used to compute the summary for each combination of factor levels, e.g. median (the default is mean).
- ylab – Y-axis label of the plot.
- xlab – X-axis label of the plot.
- trace.label – Label for the legend.
- col – A vector of colors used for the traces.
- lty – Line type of the traces.
- lwd – Width of the lines drawn.
It’s a lot, but none of it should feel difficult to understand. Let’s now use the function to draw the interaction plot:
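The call below is one way to put those parameters together. It is a sketch on simulated data, so the shape of the lines is arbitrary; the axis labels and colors are illustrative choices. Note that the line-type argument is spelled `lty`, and that `lwd` is passed through to the underlying plotting routine.

```r
# Synthetic stand-in data (simulated for illustration only).
set.seed(42)
diet <- expand.grid(
  id        = 1:10,
  diet.type = factor(c("A", "B", "C")),
  gender    = factor(c("Female", "Male"))
)
diet$weight.loss <- rnorm(nrow(diet), mean = 3, sd = 2)

interaction.plot(
  x.factor     = diet$diet.type,    # levels shown on the X-axis
  trace.factor = diet$gender,       # one line per level
  response     = diet$weight.loss,  # numeric response
  fun          = median,            # summary computed per diet/gender cell
  ylab         = "Weight loss after six weeks",
  xlab         = "Diet type",
  trace.label  = "Gender",
  col          = c("#0198f9", "#f95801"),  # one color per trace
  lty          = 1,                 # solid lines
  lwd          = 3                  # thicker lines
)
```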
Here’s what it looks like:
If the lines on the interaction plot are roughly parallel, then there’s little or no interaction between the factors. If the lines cross or clearly diverge, then there’s likely an interaction between them.
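The same rule can be checked numerically: the gap between the two lines at each level of the X-axis factor stays constant when there’s no interaction and changes when there is one. The sketch below uses made-up cell means, chosen only to make the two patterns obvious.

```r
# Made-up cell means (rows = gender, columns = diet), for illustration only.
diet_levels <- c("A", "B", "C")

# Case 1: the Male line sits exactly 1 unit above the Female line everywhere.
parallel <- rbind(Female = c(2, 3, 4), Male = c(3, 4, 5))
# Case 2: the lines cross between diets A and C.
crossing <- rbind(Female = c(2, 3, 5), Male = c(4, 3, 2))
colnames(parallel) <- colnames(crossing) <- diet_levels

# Gap between the two lines at each diet level.
gap_parallel <- parallel["Male", ] - parallel["Female", ]
gap_crossing <- crossing["Male", ] - crossing["Female", ]

gap_parallel  # constant gap: parallel lines, no interaction
gap_crossing  # gap changes sign: crossing lines, interaction
```

With real data, a table of cell means in the same shape can be computed with `tapply(diet$weight.loss, list(diet$gender, diet$diet.type), mean)`. Sample means are never perfectly parallel, which is why the ANOVA test is still needed to judge whether the difference is significant.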
We can see that our lines are intersecting, which means diet type and gender interact in their effect on weight loss. The result is expected, as the P-value from the ANOVA test told us there’s a significant interaction effect between them.
Summing Up Interaction Plots in R
Today you’ve learned what an interaction plot in R is and how it can help you. It’s an excellent supplement to ANOVA tests and allows you to replace tables of numbers with an easily interpretable data visualization.
For a homework assignment, we recommend you apply the same code to your own dataset or to a dataset from another statistical study. Share your results with us on Twitter – @appsilon. We’d love to see what you come up with.
Want to dive further into visualizing line charts? Follow our complete guide to stunning visuals with ggplot2.