(This article was first published on

Recently I have calculated Banzhaf power index. I required generation of all subsets of a given set. The code given there was a bit complex and I have decided to write a simple function calculating it. As an example of its application I reproduce Figure 3.5 from Hastie et al. (2009).**R snippets**, and kindly contributed to R-bloggers)The figure shows RSS for all possible linear regressions for prostate cancer data on training subset. The standard approach for such a problem in R is to use leaps package, but I simply wanted to test my function generating all subsets of the set.

Here is the code with all.subsets function generating all subsets and its application to prostate cancer data:

library

**(**plyr**)**library

**(**ggplot2**)**all.subsets

**<-****function****(**set**)****{** n

**<-**length**(**set**)** bin

**<-**expand.grid**(**rlply**(**n, c**(**F, T**)))** mlply

**(**bin,**function****(**...**)****{**set**[**c**(**...**)]****})****}**

file.url

**<-**"http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data"data.set

**<-**read.table**(**file.url, sep**=**"\t", header**=****TRUE****)**varlist

**<-**all.subsets**(**names**(**data.set**)[**2**:**9**])**get.reg

**<-****function****(**vars**)****{****if**

**(**length

**(**vars

**)**

**==**0

**)**

**{**

vars

**=**"1"**}**

vars.form

**<-**paste**(**"lpsa ~", paste**(**vars, collapse**=**" + "**))** lm

**(**vars.form, data**=**data.set, subset**=**train**)****}**

models

**<-**llply**(**varlist, get.reg**)**models.RSS

**<-**ldply**(**models,**function****(**x**)****{** c

**(**RSS**=**sum**(**x**$**residuals**^**2**)**, k**=**length**(**x**$**coeff**))****})**min.models

**<-**ddply**(**models.RSS, .**(**k**)**,**function****(**x**)****{** x

**[**which.min**(**x**$**RSS**)**,**]})**qplot

**(**k, RSS, data**=**models.RSS, ylim**=**c**(**0,100**)**, xlab

**=**"Subset Size k", ylab**=**"ResidualSum-of-Squares",**)****+**geom_point

**(**data**=**min.models, aes**(**x**=**k, y**=**RSS**)**, colour**=**"red"**)****+**geom_line

**(**data**=**min.models, aes**(**x**=**k, y**=**RSS**)**, colour**=**"red"**)****+**theme_bw

**()**And here is the plot it generates:

To

**leave a comment**for the author, please follow the link and comment on his blog:**R snippets**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...