Data manipulations

August 23, 2011
(This article was first published on R Blog, and kindly contributed to R-bloggers)

At the last Utah R Users group meeting I gave a presentation on data manipulation in R. Today, through the plyr mailing list, I found two plyr functions that I was previously unaware of and that definitely deserve a mention.

arrange

I was very pleased to find arrange because it fills a nagging gap: sorting data frames. Calling

arrange(df, var1, var2)

is much better than calling

df[order(df$var1, df$var2),]

because practically anyone can read it, and readable code leaves less room for mistakes.
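As a minimal sketch of the comparison, the snippet below sorts a small made-up data frame both ways. The data frame df and its columns var1 and var2 are hypothetical and exist only for illustration, and the plyr package is assumed to be installed.

```r
library(plyr)  # provides arrange()

# Hypothetical data, invented for this illustration
df <- data.frame(
  var1 = c(2, 1, 2, 1),
  var2 = c("b", "d", "a", "c"),
  stringsAsFactors = FALSE
)

# Base R: build the row permutation with order(), then index the rows
by_order <- df[order(df$var1, df$var2), ]

# plyr: name the sort columns directly
by_arrange <- arrange(df, var1, var2)

# arrange() drops the original row names, while indexing keeps them,
# so reset them before comparing the two results
rownames(by_order) <- NULL
same_result <- isTRUE(all.equal(by_order, by_arrange))
print(same_result)
```

One difference worth knowing: the order() version keeps the original row names (here 4, 2, 3, 1), while arrange renumbers the rows from 1.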

mutate

mutate is not that different from transform, but I have to confess that while I was preparing my presentation I tried to see whether transform could do the things that mutate does. The difference is that transform evaluates all of its arguments against the original data frame, whereas mutate evaluates them in order, so a later variable can use a variable defined earlier in the same call. I quote from the mutate help file:

# Things transform can't do
mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

Notice that Temp is first redefined and then used in the definition of OzT. Usually when I need to do something like that I resort to within, but hopefully I will have to do that less often now.
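To make the difference concrete, here is a small sketch that runs the help-file example through mutate, transform, and within on the built-in airquality data set. The variable names m, tr, and w are introduced here for illustration, and plyr is assumed to be installed.

```r
library(plyr)  # provides mutate()

# mutate evaluates its arguments in order, so OzT is computed
# from the already converted (Celsius) Temp
m <- mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# transform evaluates every argument against the original data frame,
# so OzT is silently computed from the original (Fahrenheit) Temp
tr <- transform(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# within runs its statements one after another, like mutate
w <- within(airquality, {
  Temp <- (Temp - 32) / 1.8
  OzT <- Ozone / Temp
})

mutate_matches_within <- isTRUE(all.equal(m$OzT, w$OzT))
mutate_matches_transform <- isTRUE(all.equal(m$OzT, tr$OzT))
print(c(within = mutate_matches_within, transform = mutate_matches_transform))
```

Note that transform does not fail here; it returns a result without complaint, just not the intended one, which is arguably the more dangerous outcome.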
