Sexy, Geeky Graphs using ggplot2 in R

April 22, 2011

(This article was first published on Psychwire » R, and kindly contributed to R-bloggers)

So I’ve been looking for some data to play with while learning R, other than the data I’m analysing for various experiments and papers I’m working on. I thought to myself, “Hey, this R stuff is pretty geeky. Can I engage in a higher level of geekiness?” And I think I’ve found a way: using R to analyse player performance in a computer game.

Background: The Data

The game in question is the epic cash cow known as World of Warcraft (otherwise known as WoW to some, or pronounced “Woo” by hilarious people), made by dear old Blizzard Entertainment. I’ve been a long-time player of Blizzard games, starting with a demo of Warcraft 2 that came on a CD with a magazine (hey, CDs, remember when games came on CDs?). Since then it’s been the works… Warcraft 3, Starcraft (1 and a tiny bit of 2), Diablo 2 (for far too long). I also have in my house a copy of Lost Vikings on the SNES (it’s my other half’s; she’s as bad with this stuff as I am, which does mean we have two SNES machines). Sadly, I don’t get time to play games these days: I used to raid a lot when I was an undergrad, but I don’t really have time now.

Fighting the good fight: taking on a Pome Wraith - a zombie with an apple on its head

Anyway, in plain English (for those of you who haven’t heard of this game before now), a large part of the game is about taking the character you control and bashing large, unpleasant creatures on the noggin. After a while, those creatures die and leave you with shiny prizes and loot. It might sound a bit simplistic, but it actually gets quite complex: there are a large number of decisions you need to make to maximise your performance, you need to react quickly to changing circumstances in the environment, and you need to work with a group of other people at the same time to get the job done. For an example of the stuff people need to learn in order to do a decent job, take a look here.

All of this (and the fact that there is an enormous player base numbering many millions across the globe) has meant that there has been a drive to get the most out of what players can do. There’s a sizeable community of players who run various models and simulations to work out the best ways to do things. This has often made me wonder whether player performance could also benefit from being analysed in a post-hoc manner. Rather than using models and simulations, why not take actual player performance and see how people fare?

Well, there are problems with that: not everyone is very good at the game. Plus, that would involve a lot of data collection (which I assume Blizzard do in some shape or form, by the way, judging from comments they have made at various times). So, let’s go for a different approach: let’s pick the best players and see how they manage. These best players will serve as an approximation of the ideal maximum of what can be achieved. Now, here’s where you may be thinking “hrmmmm”, but please, stick with it. This is meant more as an entertaining illustration of what various functions in R can do than as an analysis I would stand by as trustworthy. It’s all a bit of fun.

Fortunately, there’s an easy way to get the best scores that players have achieved: World of Logs has a ranking system for the best scores on various fights in the game. So, I went there, found an encounter, and started copying and pasting the ranks into a spreadsheet. I picked the top 40 scores for Nefarian. He’s a big dragon who was killed in a previous version of the game, but is back now with a headache or something. Actually, I remember him toasting me a few times (I was a rogue back then, and our tank didn’t understand the whole ‘rotate the giant puppy’ part of the rogue class call).

Getting into ggplot2

Now that I have my data set up, I’m going to make some basic graphs using ggplot2. If you’re like me and have seen some examples of what ggplot2 can do, you might have thought “oh my, that looks sexy!”. And then you tried to work out how to make nice-looking graphs yourself and became somewhat unstuck. Trust me, though, it’s worth persevering with, because ggplot’s power comes from its flexibility. I used to make my graphs using Sigmaplot, but now that I have a graph format that I like, it’s a case of copying and pasting things around to get very nice graphs instantly.

I initially started trying to use the qplot() function, but, as I understand it, it is limited in various ways compared to what the mighty ggplot() function can do. So let’s stick with ggplot(), or else you’ll have to learn how to do things twice, and that’s no fun at all.

The basic way that ggplot() works is very similar to a number of other programming languages when it comes to putting together images (e.g., pygame images and image creation in PHP – I’m sure it’s similar to others too, but those are all I’ve used). Essentially, you stack a set of options and commands on top of a blank canvas. So you start with nothing, then you say, “right, let’s make a plot”, then you start building things into it. You want points drawn? Stack them on the canvas. You want error bars? Stack them on, too. If you don’t tell it what to do, it will, in some cases, make assumptions about what you want, and go with the defaults. For some functions and programs, the defaults are horrible. This is not the case with ggplot: the defaults are awesome.
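To make the stacking idea concrete, here is a minimal sketch (not from the original post) that builds a plot one piece at a time. The data frame toy and its columns x and y are made-up names filled with random numbers; each + returns a new plot object with one more layer stacked on the canvas.

```r
library(ggplot2)

# Made-up data (random numbers) standing in for a real data set
set.seed(1)
toy <- data.frame(x = 1:20, y = rnorm(20))

# Start with a blank canvas: data and mappings, but nothing drawn yet
p <- ggplot(toy, aes(x = x, y = y))

# Stack a layer of points on the canvas
p <- p + geom_point()

# Stack a second layer (a line through the same points) on top
p <- p + geom_line()

# Nothing is actually drawn until the plot object is printed
print(p)
```

Because p is an ordinary R object, a basic canvas can be saved once and different layers added to it later.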

So here I’m just going to do something very simple to illustrate how you can build up options and commands to make a set of graphs. I’m basing this on an example from the ggplot documentation. I’m going to make a series of density plots of player Damage Per Second (DPS, the standard indicator of performance, and the more the better!) and compare the various specialisations (specs), which are sub-components of the various classes in the game. Depending on what you want to do, you might choose one spec over another. Similarly, depending on what you want to do, you might pick one specific class. Say you want to turn into a bear: you’d be a druid. If you want to be a skirt-wearing magician: you’d be a mage. And so on! Anyway, on with the code:

library(ggplot2)

ggplot(full_list, aes(DPS, fill = spec)) +
  facet_wrap(~ class) +
  geom_density(alpha = 0.2) +
  scale_x_continuous("Damage Per Second (DPS)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0, size = 7))

Note the “+” symbol at the end of each line. The + adds further layers and options to the ggplot command. If you run the command from a script and split it over several lines, the + has to go at the end of a line, not at the start of the next one; otherwise R treats the first line as a complete command on its own and the rest won’t run as intended. That took me a while to work out!
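One way to see why the placement matters is to ask R’s parser how many separate commands it finds. This sketch only parses the code as text, so nothing is run and full_list doesn’t need to exist; the two snippets are shortened, hypothetical versions of the command above.

```r
# Trailing "+": the first line is unfinished, so the parser keeps reading
trailing <- parse(text = "ggplot(full_list, aes(DPS)) +\n  geom_density()")

# Leading "+": the first line is already a complete expression, so the
# second line is parsed as a separate statement starting with a unary plus
leading <- parse(text = "ggplot(full_list, aes(DPS))\n  + geom_density()")

length(trailing)  # 1: a single ggplot command
length(leading)   # 2: two unrelated commands
```

Run for real, the second version evaluates the first line as a plot on its own and then stops with an error on the second line.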

Anyway, the first line of the command tells ggplot to use the dataset I have called full_list. Inside it, aes() sets up the aesthetic mappings for the plot: here DPS is mapped to the x-axis, and fill = spec gives each spec its own fill colour.

Next comes facet_wrap, which splits the data by the class factor into separate panels, much like conditioning plots in the lattice package. This produces one graph for each class.
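A quick way to check that facet_wrap() really gives one panel per class is to compute the plot without drawing it and count the panels ggplot2 produced. This is a minimal sketch on a made-up stand-in for full_list: toy, class1/class2 and specA to specD are hypothetical names, and the DPS values are random numbers, not the World of Logs rankings.

```r
library(ggplot2)

# Made-up stand-in for full_list: random DPS values for two
# hypothetical classes with two hypothetical specs each
set.seed(42)
toy <- data.frame(
  class = rep(c("class1", "class2"), each = 40),
  spec  = rep(c("specA", "specB", "specC", "specD"), each = 20),
  DPS   = rnorm(80, mean = 100, sd = 10)
)

p <- ggplot(toy, aes(DPS, fill = spec)) +
  facet_wrap(~ class) +
  geom_density(alpha = 0.2)

# layer_data() returns the computed density curves; the PANEL column
# records which facet each row belongs to
panels <- unique(layer_data(p)$PANEL)
length(panels)  # one panel per class
```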

The third line adds a geom_density or density plot element. The transparency (alpha) is set to 0.2 to enable you to see how the density plots overlap.

The fourth line sets the x-axis title using the scale_x_continuous command. Note that if your x-axis is a factor you need to use scale_x_discrete instead.
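For comparison, here is a sketch of the discrete case: when the variable on the x-axis is a factor (here a hypothetical spec column in made-up random data), the axis title goes through scale_x_discrete() instead, while the numeric y-axis still uses a continuous scale.

```r
library(ggplot2)

# Made-up data: random DPS values for three hypothetical specs
set.seed(7)
toy <- data.frame(
  spec = factor(rep(c("specA", "specB", "specC"), each = 15)),
  DPS  = rnorm(45, mean = 100, sd = 10)
)

# spec is a factor, so the x-axis is discrete: scale_x_discrete()
# sets its title, while scale_y_continuous() handles the numeric y-axis
p <- ggplot(toy, aes(spec, DPS)) +
  geom_boxplot() +
  scale_x_discrete("Specialisation") +
  scale_y_continuous("Damage Per Second (DPS)")

print(p)
```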

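Finally we have the theme options. There are a huge number of options, the best list of which I’ve found is here. Here I’ve set the x-axis text to be rotated and slightly smaller so the labels are easier to read without overlapping. Older versions of ggplot2 set these with opts() and theme_text(), which no longer work in current versions; theme() and element_text() do the same job.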
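Since these options rarely change from graph to graph, one way to avoid copying and pasting them around is to store them once in an object and add it to each plot. A minimal sketch, written with the current theme()/element_text() functions; my_theme and the toy data are made-up names.

```r
library(ggplot2)

# Store the preferred options once as a theme object
my_theme <- theme(axis.text.x = element_text(angle = 90, hjust = 0, size = 7))

# Made-up data (random numbers) to try the theme on
set.seed(3)
toy <- data.frame(DPS = rnorm(50, mean = 100, sd = 10))

# Add the stored theme with + like any other component
p <- ggplot(toy, aes(DPS)) +
  geom_density() +
  my_theme

print(p)
```

The same my_theme object can then be added to every new graph, so the format stays consistent without repeating the option list.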

Now, let’s take a look at the output:

The full set of plots. There may be too many specs and colours here!

You can see that some specs of different classes do better than others. Some aren’t supposed to do much damage, as they have other roles (e.g., the ones with “prot” in the name). Again, please don’t take this as a serious attempt at comparing the specs and classes, it’s just some data to play around with and explore for illustrative purposes.

The next steps will be to try out various ways of summarising the data (e.g. data.table, aggregate, plyr), after which I’ll start running some statistical tests.
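As a taste of the aggregate route, here is a minimal sketch that computes the mean DPS for each class and spec combination. It runs on made-up random numbers standing in for full_list (the names toy, class1/class2 and specA to specD are hypothetical), so the values themselves mean nothing.

```r
# Made-up stand-in for full_list (random numbers, not the real rankings)
set.seed(42)
toy <- data.frame(
  class = rep(c("class1", "class2"), each = 40),
  spec  = rep(c("specA", "specB", "specC", "specD"), each = 20),
  DPS   = rnorm(80, mean = 100, sd = 10)
)

# Mean DPS for every class/spec combination that occurs in the data
means <- aggregate(DPS ~ class + spec, data = toy, FUN = mean)
means
```

data.table and plyr offer their own syntax for the same kind of grouped summary.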