[This article was first published on Software for Exploratory Data Analysis and Statistical Modelling » R Environment, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There are many situations where data is presented in a format that is not ready to dive straight to exploratory data analysis or to use a desired statistical method. The reshape2 package for R provides useful functionality to avoid having to hack data around in a spreadsheet prior to import into R.

The melt function takes data in wide format and stacks a set of columns into a single column of data. To make use of the function we need to specify a data frame, the id variables (which will be left at their settings) and the measured variables (columns of data) to be stacked. The default assumption on measured variables is that it is all columns that are not specified as id variables.

Consider the following set of data:

 > dat FactorA FactorB Group1 Group2 Group3 Group4 1 Low Low -1.1616334 -0.5228371 -0.6587093 0.45064563 2 Medium Low -0.5991478 -1.0461138 -0.1942979 2.47985577 3 High Low 0.8420797 -1.5413266 0.6318852 -0.98948125 4 Low Medium 1.6225569 -1.2706469 -0.8026467 -0.32332181 5 Medium Medium -0.3450745 -1.3377985 1.4988363 0.36541918 6 High Medium 1.6025044 0.7631882 -0.5375833 0.85028148 7 Low High -1.2991011 -0.2223622 -0.6321478 -1.57284216 8 Medium High -0.4906400 -1.1802192 0.1235253 0.09891793 9 High High 0.3897769 -0.3832142 0.6671101 0.23407257

There four groups are to used as part of a statistical analysis so we want to stack them into a single column and create an factor variable to indicate which group the measurement corresponds to and the melt function does the trick:

 > melt(dat) Using FactorA, FactorB as id variables FactorA FactorB variable value 1 Low Low Group1 -1.16163338 2 Medium Low Group1 -0.59914783 3 High Low Group1 0.84207974 4 Low Medium Group1 1.62255690 5 Medium Medium Group1 -0.34507455 6 High Medium Group1 1.60250438   ... 36 High High Group4 0.23407257

Consider a second set of data where there are two groups but we only want to retain the FactorB variable in the molten data set:

 FactorA FactorB Group1 Group2 1 Low Very Low 6.851828 3.061329 2 Medium Very Low 7.352169 1.303077 3 High Very Low 6.918091 2.477875 4 Low Low 7.402351 2.450527 5 Medium Low 6.928385 4.334323 6 High Low 7.400626 3.074158 7 Low Medium 8.312145 5.725185 8 Medium Medium 8.251806 4.384492 9 High Medium 8.339398 3.443789 10 Low High 5.127386 2.868952 11 Medium High 8.561181 3.616898 12 High High 6.993838 3.450634 13 Low Very High 7.880877 2.950622 14 Medium Very High 9.439892 3.220295 15 High Very High 8.799447 3.106060

We now need to specify both the id.vars and measure.vars arguments in the melt function to get the desired output:

 > melt(dat, id.vars = "FactorB", measure.vars = c("Group1", "Group2")) FactorB variable value 1 Very Low Group1 6.851828 2 Very Low Group1 7.352169 3 Very Low Group1 6.918091 4 Low Group1 7.402351 5 Low Group1 6.928385 6 Low Group1 7.400626 ... 30 Very High Group2 3.106060

Other useful resources are provided on the Supplementary Material page.