Data manipulations

August 23, 2011

(This article was first published on R Blog, and kindly contributed to R-bloggers)

At the last Utah R Users group meeting I gave a presentation on data manipulation in R, and today I found, through the plyr mailing list, two commands I was previously unaware of that definitely deserve a mention: arrange and mutate.


I was very pleased to find arrange because it fills a nagging gap in sorting data frames. Calling

arrange(df, var1, var2)

is much better than calling

df[order(df$var1, df$var2),]

because it’s understandable by practically anyone, and when your code is understandable there is less chance of mistakes.
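As a minimal sketch of that equivalence, assuming plyr is installed: the toy data frame below is made up for illustration, with column names that simply mirror the var1 and var2 used above. One difference to be aware of is that arrange drops the original row names, so they are reset on the order() result before the two are compared.

```r
library(plyr)  # provides arrange() and desc()

# Hypothetical toy data; var1 and var2 mirror the names used in the text.
df <- data.frame(var1 = c(2, 1, 2, 1),
                 var2 = c("b", "d", "a", "c"),
                 stringsAsFactors = FALSE)

sorted_arrange <- arrange(df, var1, var2)
sorted_order   <- df[order(df$var1, df$var2), ]

# arrange() discards the original row names; reset them before comparing.
rownames(sorted_order) <- NULL
print(all.equal(sorted_arrange, sorted_order))

# desc() reverses the sort direction for a single column.
print(arrange(df, var1, desc(var2)))
```

The arrange call names each column once and never repeats df$, which is most of why it reads more easily.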


mutate is not that different from transform, but I must confess that while preparing my presentation I tried to see whether transform could do what mutate does. The difference is that mutate lets a later-defined variable refer to a variable defined earlier in the same call. I quote from the mutate help file:

# Things transform can't do
mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

Notice that Temp is first redefined and then used. Usually when I need to do something like that I resort to within, but hopefully I will have to do that less now.
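To make the difference concrete, here is a rough comparison of the three approaches on the same airquality example, assuming plyr is installed. The object names (aq_mutate, aq_within, aq_transform) are my own and only for illustration.

```r
library(plyr)  # provides mutate()

# mutate(): arguments are evaluated in order, so OzT uses the Celsius Temp.
aq_mutate <- mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# Base R with within(): statements also run in order, giving the same values.
aq_within <- within(airquality, {
  Temp <- (Temp - 32) / 1.8
  OzT  <- Ozone / Temp
})

# transform(): every argument is evaluated against the original columns,
# so OzT here is computed from the Fahrenheit Temp.
aq_transform <- transform(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# Same OzT from mutate() and within()?
print(isTRUE(all.equal(aq_mutate$OzT, aq_within$OzT)))
# Same OzT from mutate() and transform()?
print(isTRUE(all.equal(aq_mutate$OzT, aq_transform$OzT)))
```

Note that transform does not raise an error here because Temp already exists in airquality; it just quietly uses the old values, which is the kind of mistake mutate helps avoid.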
