A word of warning about grep, which and the like

July 13, 2011

(This article was first published on Stat Bandit » R, and kindly contributed to R-bloggers)

I’ve often selected columns or rows of a data frame using grep or which, based on some property. That is inherently sound, but the trouble comes when you wish to remove rows or columns based on that grep or which call, e.g.,

dat <- dat[,-grep('\\.1', names(dat))]

which would remove columns with a .1 in the name. This is fine the first time around, but if you forget and re-run the code, grep('\\.1', names(dat)) finds no matches and returns a vector of length 0. Negating an empty vector still gives an empty vector, so the index selects nothing and dat becomes a data.frame with 0 columns. The function which has similar pitfalls, as demonstrated in a recent R-help posting by David Winsemius. I find a more reliable method is to do

dat <- dat[, setdiff(seq_len(ncol(dat)), grep('\\.1', names(dat)))]

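which gives the same result however many times it is run: when grep finds no matches, setdiff simply returns every column index, so nothing is dropped. Other suggestions for getting around this issue are welcomed in the comments.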
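
To make the failure concrete, here is a minimal sketch on a toy data frame. The data and the object names (once, twice, safe_once and so on) are made up for illustration. The last part uses a logical mask built with grepl, which is a different technique from the setdiff approach above and is offered only as one possible alternative.

```r
# Hypothetical toy data; column names chosen so that two of them contain ".1".
dat <- data.frame(a = 1:3, a.1 = 4:6, b = 7:9, b.1 = 10:12)

# Negative indexing with grep: fine on the first run, destructive on the second.
once  <- dat[, -grep('\\.1', names(dat))]
twice <- once[, -grep('\\.1', names(once))]  # grep returns integer(0) here
ncol(once)   # 2
ncol(twice)  # 0

# The same trap with which, this time on rows: no row has a > 10,
# so which() returns integer(0) and every row is lost.
rows_bad <- dat[-which(dat$a > 10), ]
nrow(rows_bad)  # 0

# The setdiff approach: re-running it changes nothing.
safe_once  <- dat[, setdiff(seq_len(ncol(dat)), grep('\\.1', names(dat)))]
safe_twice <- safe_once[, setdiff(seq_len(ncol(safe_once)),
                                  grep('\\.1', names(safe_once)))]
identical(safe_once, safe_twice)  # TRUE

# One alternative (an assumption, not from the original post): a logical mask.
# grepl always returns one TRUE/FALSE per name, so negating it is safe.
lgl_once  <- dat[, !grepl('\\.1', names(dat)), drop = FALSE]
lgl_twice <- lgl_once[, !grepl('\\.1', names(lgl_once)), drop = FALSE]
identical(lgl_once, lgl_twice)  # TRUE
```

The drop = FALSE in the last two calls keeps the result a data frame even when only one column survives. Without it, single-column selections collapse to a plain vector, which is a separate surprise from the one discussed here.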

