In this post, I describe how to use the reshape package to convert a dataframe from a long format to a wide format, and then back to a long format again. It’ll be an epic journey; some of us may not survive (especially me!).
Wide versus Long Data Formats
I’ll begin by describing what is meant by ‘wide’ versus ‘long’ data formats. Long data look like this:
As you can see, there is one row for each value that you have. Many statistical tests in R need data in this shape (e.g., ANOVAs and the like). This is the case even when running tests with repeated factors.
In the example above, let’s say that iv1 is a between-subjects factor and iv2 is a within-subjects factor. The same table, in a wide format, would look like this:
Here, each column represents a unique pairing of the various factors. SPSS favours this method for repeated-measures tests (such as repeated-measures ANOVAs or paired t-tests), and being able to move between the two formats is helpful when multiple people are working on a single dataset but using different packages (e.g., R vs SPSS).
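To make the contrast concrete, here is a minimal sketch of the same made-up measurements stored both ways. The subject and dv columns, the level names and all the numbers are hypothetical; only the iv1/iv2 naming follows the example above.

```r
# Hypothetical toy data: 4 subjects, iv1 is between-subjects, iv2 is within-subjects.
# Long format: one row per measured value.
long_example <- data.frame(
  subject = rep(1:4, each = 2),
  iv1     = rep(c("a", "b"), each = 4),
  iv2     = rep(c("pre", "post"), times = 4),
  dv      = c(10, 12, 11, 15, 20, 19, 22, 25)
)

# Wide format: one row per subject, one column per level of the repeated factor.
wide_example <- data.frame(
  subject = 1:4,
  iv1     = c("a", "a", "b", "b"),
  pre     = c(10, 11, 20, 22),
  post    = c(12, 15, 19, 25)
)

print(long_example)
print(wide_example)
```

Both objects hold the same eight values; only the layout differs.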
Get in Shape! The Reshape Package
I’ll begin by going back to a dataset that I’ve been messing around with for some time. I’m going to select the columns I need, and rename one of them: it ended up getting called “X.” because of the way the data were tabbed. Here, I rename the “X.” column to “rank”, which is what it really should have been in the first place.
full_list_cutdown = data.frame("rank"=full_list_dps$X., "class"=full_list_dps$class, "spec"=full_list_dps$spec, "dps"=full_list_dps$DPS)
The data look like this:
Let’s begin by converting these data into a wide format. To do that, all we need to do is use the cast function. This has the general format of:
cast(dataset, factor1 ~ factor2 ~ etc., value=value column, fun=aggregation method)
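Here, dataset refers to your target dataset. factor1 ~ factor2 ~ etc lists the columns/factors that you want to split up the data by; in the two-factor case, the variable on the left of the ~ defines the rows and the one on the right defines the columns. value names the column whose values fill the cells. You can run all sorts of aggregation functions using the fun= argument (R matches it to the full argument name, fun.aggregate). If several rows share a combination of factor levels and you don’t supply a function, cast falls back to length, the count of the number of values for that combination. To make my dataset into a wide format, all I need to run is: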
wide_frame = data.frame(cast(full_list_cutdown, rank~class, value=c('dps'), fun=mean))
Here, I create a wide dataframe based on the rank and class columns. The computed value is the mean of the dps column. It looks like this:
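To see the same call on something small enough to check by hand, here is a minimal sketch using a made-up data frame. toy_dps and its values are hypothetical stand-ins for full_list_cutdown, not the real dataset, and the sketch assumes the reshape package is installed.

```r
library(reshape)

# Hypothetical stand-in for full_list_cutdown: two ranks, two classes,
# with more than one row for some rank/class combinations.
toy_dps <- data.frame(
  rank  = c(1, 1, 2, 2, 1, 2),
  class = factor(c("mage", "mage", "mage", "mage", "rogue", "rogue")),
  dps   = c(100, 110, 90, 80, 120, 95)
)

# One row per rank, one column per class; each cell is the mean dps
# of the rows that share that rank/class combination.
toy_wide <- data.frame(
  cast(toy_dps, rank ~ class, value = "dps", fun.aggregate = mean)
)

print(toy_wide)
```

Wrapping the result in data.frame() turns the cast output into an ordinary dataframe. One thing to watch for: data.frame() also tidies column names, so a level containing a space would end up with a dot in its place.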
There and Back Again: Getting from Wide to Long Format
Say that we want to go back to the long format again (or, indeed, convert from wide to long in the first place!). How can we do that? We use the melt function!
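A minimal sketch of that step, assuming a wide dataframe shaped like the one produced by cast (one rank column plus one column per class). toy_wide_dps and its numbers are hypothetical; id.vars and variable_name are arguments of reshape’s melt, and the sketch assumes the reshape package is installed.

```r
library(reshape)

# Hypothetical wide data: one row per rank, one column per class.
toy_wide_dps <- data.frame(
  rank  = c(1, 2),
  mage  = c(105, 85),
  rogue = c(120, 95)
)

# id.vars names the column(s) that identify each row; every other column
# is stacked into a pair of columns holding the old column name and its value.
toy_long_dps <- melt(toy_wide_dps, id.vars = "rank", variable_name = "class")

# melt calls the stacked values "value"; rename that column back to "dps".
names(toy_long_dps)[names(toy_long_dps) == "value"] <- "dps"

print(toy_long_dps)
```

Note that melting an aggregated table gives back one row per rank/class combination (here, the means), not the original individual rows.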
This takes us right back to the start, where our exciting journey began.