# Comparing Student Outcomes with Research Output (using R and ggplot2's text labels)

May 22, 2011

(This article was first published on Psychwire » R, and kindly contributed to R-bloggers)

In this post, I take a look at some league table data recently published by the Guardian. I also provide the R code for annotating the graph with ggplot2.

## Purpose

It’s one of those fun aspects of teaching at a university that teaching itself isn’t the most important thing on our minds. Students often complain that ‘staff are too busy with their research to care about teaching’. Is that true? Do those who’d rather run experiments and write papers care so little about students that they ditch the students and focus on their own needs instead?

"Please, Fry! I don't know how to teach. I'm a professor!"

There is a simple way to gain insight into whether this may be true. If academic staff care so much about research, and so little about what the students get up to, then logically, the universities that have the highest research output should also rank the lowest in how the students fare during their degrees. This is a simple correlation that I’ll now illustrate: higher scores for students should be correlated with lower scores for research output.

## The Data

These data were obtained from the Guardian’s University Guide: Psychology (psychology is what I’m interested in, as that’s what I teach and research) and their most recent research ratings. First, the graph:

The correlation between the two scores is 0.69, which is significant (p < .0001). I haven’t tried to fit a line to the graph because I think there is enough on there already! Not only does this correlation go in the opposite direction to what would have been expected, it’s a pretty strong and significant correlation, too. Higher scores for research output were correlated with higher scores for the students and their outcomes.

## Conclusion

It therefore looks like the claim that ‘staff are too busy with their research to care about teaching’ isn’t necessarily true. Granted, this is a correlation rather than any attempt to get insights into the direct cause and effect going on here, but I think it’s still interesting to explore this. I intend to point students to this post next time they complain about something like this! I’ll leave it to the reader to think about why the correlation might be going in this direction.

Quick note: I don’t want to claim credit for thinking of this kind of analysis; I’ve heard the correlation reported here discussed before, but had never actually seen it for real. That was part of the reason I decided to look into it!

Quick note #2: there are other possible student metrics in the Guardian data that could be compared with research output. These may be worth exploring too, but I’ve focused here on the overall measure for students as that’s what is used to rank the league tables.
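
One way to explore those other metrics would be to run the same correlation against each candidate column in turn. The sketch below is only illustrative: the data frame is synthetic (it is not the Guardian data), and the column names satisfaction and career_score are hypothetical stand-ins rather than the Guardian's actual field names.

```r
# Synthetic stand-in for the real data; every value below is made up
# purely so that the example runs on its own.
set.seed(1)
n <- 50
research <- runif(n, 0, 100)
uni_data <- data.frame(
  research      = research,
  student_score = 0.5 * research + rnorm(n, sd = 15),
  satisfaction  = 0.2 * research + rnorm(n, sd = 15),  # hypothetical metric
  career_score  = 0.4 * research + rnorm(n, sd = 15)   # hypothetical metric
)

# Student metrics to compare against research output
metrics <- c("student_score", "satisfaction", "career_score")

# Run cor.test() for each metric, keeping the estimate and the p-value
results <- t(sapply(metrics, function(m) {
  test <- cor.test(uni_data$research, uni_data[[m]])
  c(r = unname(test$estimate), p = test$p.value)
}))

# One row per metric, with columns r and p
print(round(results, 3))
```

With real data, only the data frame and the metrics vector would need to change. Bear in mind that running several tests at once raises the chance of a spurious significant result.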

## R Code

Here’s the R code for the graph and correlation:

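    library(ggplot2)

    # uni_data: one row per university, with columns name, research and student_score
    unis <- ggplot(uni_data) +
      aes(x = research, y = student_score, label = name) +
      geom_text(size = 3) +  # draw each university's name at its (x, y) position
      scale_x_continuous("Research Score") +
      scale_y_continuous("Student Score")

    # Correlation between research output and student score
    cor.test(uni_data$research, uni_data$student_score)

    # Save the plot to disk
    ggsave("unis.png", plot = unis)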


To draw the text onto the graph, it’s a simple case of calling geom_text, which draws the university names held in the name column. The column is chosen using the label aesthetic (aes). It’s surprising how easy it is to get graphs of this type together, though I do think this graph is a bit messy, simply because of the large number of names involved, many of which are quite long.
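
If the clutter becomes a problem, two options may help: shortening the labels with base R's abbreviate(), and asking geom_text to drop labels that would overlap through its check_overlap argument (available in more recent versions of ggplot2). This is a minimal sketch on made-up data, not the plot shown above: the university names and scores are invented, and short_name is a hypothetical column added for illustration.

```r
library(ggplot2)

# Made-up data for illustration only; these are not the Guardian figures.
uni_data <- data.frame(
  name = c("University of Somewhere", "Another Example University",
           "Institute of Placeholder Studies", "Sample Metropolitan University"),
  research = c(2.1, 2.6, 3.0, 2.4),
  student_score = c(55, 70, 88, 62)
)

# Shorten long names so that each label takes up less room
uni_data$short_name <- unname(abbreviate(uni_data$name, minlength = 10))

unis_tidy <- ggplot(uni_data) +
  aes(x = research, y = student_score, label = short_name) +
  # check_overlap = TRUE skips labels that would overlap ones already drawn
  geom_text(size = 3, check_overlap = TRUE) +
  scale_x_continuous("Research Score") +
  scale_y_continuous("Student Score")

print(unis_tidy)
```

The trade-off is that check_overlap silently omits some universities, so it suits a quick overview better than a complete league table.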