In this post, I go over the basics of running an ANOVA using R. The dataset I’ll be examining comes from this website, and I’ve discussed it previously (starting here and then here). I’ve not seen many examples where someone runs through the whole process, including ANOVA, post-hocs and graphs, so here we go.
This is also my first post over at wordpress.com. My old site will still exist with my old posts, it was just starting to churn a bit with so many visitors. It started to mess up images to, so I have decided to move!
So here I’m going to ask the question: which class scored highest on DPS? I won’t be breaking this down by spec yet, that will come in future posts. The syntax for running an ANOVA is this:
aov(DV ~ IV, data=dataset)
It’s so simple! If you want to look at an interaction, or have more than one factor, go for:
aov(DV ~ IV1 * IV2, data=dataset)
Running the ANOVA
Let’s go for it:
btwn <- aov(DPS ~ class, data=full_list_dps) summary(btwn)
This shows us that there is a highly significant effect of class on the dps score. But how should we visualise it?
Visualising the Results
I’ve seen various guides offer different approaches to visualising ANOVA results. I researched quite a few before settling on ggplot2 to do this. I’ll begin by summarising the data:
graph_summary<-ddply(full_list_dps, c("class"), summarize, AVERAGE=mean(DPS), SE=sqrt(var(DPS)/length(DPS)))
This uses plyr which you can see details of in some previous posts I’ve made (e.g, here). For the graph that I will make, I’d like to have class ordered by dps. To do that, I run this code:
graph_summary$class <- reorder(graph_summary$class, graph_summary$AVERAGE)
It reorders the class column by the AVERAGE of each class.
Next up, we have the code for the graph:
ggplot(graph_summary)+ aes(x=class, y=AVERAGE, colour=class)+ geom_point()+ geom_errorbar(aes(ymax=AVERAGE+SE, ymin=AVERAGE-SE))+ opts(axis.text.x = theme_text(angle = 90, hjust = 0, size=11), axis.title.x = theme_text(size=14), axis.title.y = theme_text(angle = 90, size=14))+ scale_x_discrete("Class")+ scale_y_continuous("DPS (Mean)")
This then gives us the following:
Next up are the post-hoc tests to run. You can run Tukey’s HSD tests with the simple command:
However, if you’re more inclined towards t-tests, there’s a great function that can do all of them for you. It’s this:
with(full_list_dps, pairwise.t.test(DPS, class, p.adj="bonferroni", paired=F))
The with command just tells R to use full_list_dps so we don’t have to write full_list_dps$DPS and so on when running the t-test. The syntax for the pairwise t-test follows the following formula:
pairwire.t.test(DV, IV, ADJUSTMENT, PAIRED??)
DV and IV I assume you understand. There are lots of p-value adjustment options, and here I’ve gone for bonferroni. As these are not paired data, I’ve said paired to FALSE by writing in paired=F. What pairwise.t.test does is run every combination of t-test for you based on your factor levels. Great! You’ll get results like the following, showing you p-values from t-tests comparing all the factor levels with each other:
I’ve converted the output into a table and rounded the values to a few decimal places, but I find tables in this format a very easy way to check for significant effects.
The method I have presented here for averaging data and plotting it with error bars in ggplot2 was taken/inspired from this post over at the fantastically-named i’m a chordata! urochordata! blog. I have no idea where the blog title comes from–but it sounds cool. I’m indebted to that author/site because, well, it was the first decent example I found that did what I was after!
I’d also like to reference this for helping me work out how to convert pairwise t-test results into a table like the one pictured above.
See also the personality project pages on ANOVAs for further reading.