Good riddance to Excel pivot tables

January 30, 2011
By

(This article was first published on Recology, and kindly contributed to R-bloggers)

Excel pivot tables have been how I have reorganized data...up until now. These are just a couple of examples why R is superior to Excel for reorganizing data:


################ Good riddance to pivot tables ############
library(reshape2)
library(plyr)
 
dataset <- data.frame(var1 = rep(c("a","b","c","d","e","f"), each = 4),
var2 = rep(c("level1","level1","level2","level2"), 6),
var3 = rep(c("h","m"), 12), meas = rep(1:12))
Created by Pretty R at inside-R.org


# simply pivot table
cast(dataset, var1 ~ var2 + var3)
Using meas as value column. Use the value argument to cast to override this choice
var1 level1_h level1_m level2_h level2_m
1 a 1 2 3 4
2 b 5 6 7 8
3 c 9 10 11 12
4 d 1 2 3 4
5 e 5 6 7 8
6 f 9 10 11 12
 
# mean by var1 and var2
cast(dataset, var1 ~ var2, mean)
Using meas as value column.  Use the value argument to cast to override this choice
var1 level1 level2
1 a 1.5 3.5
2 b 5.5 7.5
3 c 9.5 11.5
4 d 1.5 3.5
5 e 5.5 7.5
6 f 9.5 11.5

 
# mean by var1 and var3
cast(dataset, var1 ~ var3, mean)
Using meas as value column. Use the value argument to cast to override this choice
var1 h m
1 a 2 3
2 b 6 7
3 c 10 11
4 d 2 3
5 e 6 7
6 f 10 11
 
# mean by var1, var2 and var3 (version 1)
cast(dataset, var1 ~ var2 + var3, mean)
Using meas as value column.  Use the value argument to cast to override this choice
var1 level1_h level1_m level2_h level2_m
1 a 1 2 3 4
2 b 5 6 7 8
3 c 9 10 11 12
4 d 1 2 3 4
5 e 5 6 7 8
6 f 9 10 11 12
 
# mean by var1, var2 and var3 (version 2)
cast(dataset, var1 + var2 ~ var3, mean)
Using meas as value column.  Use the value argument to cast to override this choice
var1 var2 h m
1 a level1 1 2
2 a level2 3 4
3 b level1 5 6
4 b level2 7 8
5 c level1 9 10
6 c level2 11 12
7 d level1 1 2
8 d level2 3 4
9 e level1 5 6
10 e level2 7 8
11 f level1 9 10
12 f level2 11 12

 
# use package plyr to create flexible data frames...
dataset_plyr <- ddply(dataset, .(var1, var2), summarise,
mean = mean(meas),
se = sd(meas),
CV = sd(meas)/mean(meas)
)
> dataset_plyr
var1 var2 mean se CV
1 a level1 1.5 0.7071068 0.47140452
2 a level2 3.5 0.7071068 0.20203051
3 b level1 5.5 0.7071068 0.12856487
4 b level2 7.5 0.7071068 0.09428090
5 c level1 9.5 0.7071068 0.07443229
6 c level2 11.5 0.7071068 0.06148755
7 d level1 1.5 0.7071068 0.47140452
8 d level2 3.5 0.7071068 0.20203051
9 e level1 5.5 0.7071068 0.12856487
10 e level2 7.5 0.7071068 0.09428090
11 f level1 9.5 0.7071068 0.07443229
12 f level2 11.5 0.7071068 0.06148755
 
# ...to use for plotting
qplot(var1, mean, colour = var2, size = CV, data = dataset_plyr, geom = "point")



Created by Pretty R at inside-R.org

To leave a comment for the author, please follow the link and comment on his blog: Recology.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , ,

Comments are closed.