Data manipulations

[This article was first published on R Blog, and kindly contributed to R-bloggers.]

At the last Utah R Users group meeting I gave a presentation on data manipulation in R. Today, through the plyr mailing list, I found two commands I was previously unaware of that definitely deserve a mention: arrange and mutate.


I was very pleased to find arrange because it fills a nagging gap: sorting data frames. Calling

arrange(df, var1, var2)

is much better than calling

df[order(df$var1, df$var2),]

because it’s understandable by practically anyone, and when your code is understandable there is less chance of mistakes.
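
To make the comparison concrete, here is a small sketch that runs both calls on a toy data frame. The data frame and its values are made up for illustration; only arrange and order come from the discussion above.

```r
library(plyr)  # provides arrange()

# Hypothetical toy data; the column names mirror var1/var2 used above
df <- data.frame(var1 = c(2, 1, 2, 1),
                 var2 = c("d", "d", "a", "c"),
                 stringsAsFactors = FALSE)

# Sort by var1, breaking ties with var2
by_arrange <- arrange(df, var1, var2)
by_order   <- df[order(df$var1, df$var2), ]

# The sorted values agree column by column
all.equal(by_arrange, by_order, check.attributes = FALSE)

# The row names differ: arrange() renumbers the rows,
# while indexing with order() keeps the original row names
rownames(by_arrange)
rownames(by_order)
```

One thing to keep in mind is the row-name difference: if later code relies on the original row names, the order version preserves them and arrange does not.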


mutate is not that different from transform, but I confess that while preparing my presentation I tried to get transform to do what mutate does, and it can't. mutate lets a variable defined earlier in the call be used in the definition of a later one. I quote from the mutate help file:

# Things transform can't do
mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

Notice that Temp is first redefined and then used to compute OzT. Usually when I need to do something like that I resort to within, but hopefully I will have to do that less now.
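
To see the difference in practice, the sketch below runs the help-file example through mutate, transform, and within on the built-in airquality data. The result names (m, tr, w) are mine, chosen only for illustration.

```r
library(plyr)  # provides mutate()

# mutate() evaluates its arguments in order, so OzT uses the Celsius Temp
m <- mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# transform() evaluates every argument against the original columns,
# so here OzT is computed from the Fahrenheit Temp
tr <- transform(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# within() also sees earlier assignments, but needs a braced code block
w <- within(airquality, {
  Temp <- (Temp - 32) / 1.8
  OzT  <- Ozone / Temp
})

# mutate() and within() agree; transform() gives a different OzT
all.equal(m$OzT, w$OzT)
head(cbind(mutate = m$OzT, transform = tr$OzT))
```

The transform call runs without error, which is the trap: it silently divides by the old Temp rather than failing.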
