Some new functions I’ve discovered in R

January 29, 2012

(This article was first published on Drunks&Lampposts » R, and kindly contributed to R-bloggers)

I’ve been writing a fair amount of R recently and have been going through a good learning period. Here are some functions that I’ve discovered (mainly plyr and reshape related) and thought I would share:

merge_all (from the reshape package) is a good way to merge several data frames in one call, rather than chaining multiple merge commands. The key thing is to put the data frames to merge within a list, e.g. merge_all(list(df1, df2, df3), by="key").
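
As a rough sketch of that call, assuming the reshape package is installed: the three data frames and their column names below are made up purely for illustration. The last line shows one way to get a similar result in base R by folding merge over the list.

```r
library(reshape)  # provides merge_all

# Hypothetical toy data frames that share a "key" column
df1 <- data.frame(key = 1:3, height = c(150, 160, 170))
df2 <- data.frame(key = 1:3, weight = c(50, 60, 70))
df3 <- data.frame(key = 1:3, age = c(30, 40, 50))

# One call instead of merge(merge(df1, df2), df3)
merged <- merge_all(list(df1, df2, df3), by = "key")

# A base R alternative: apply merge() pairwise across the list
merged.base <- Reduce(function(x, y) merge(x, y, by = "key"),
                      list(df1, df2, df3))
```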

mutate is a good data manipulation function which is similar to transform (both make for much cleaner code when creating a number of variables within a data frame). The key difference is the iterative nature of mutate: variables created earlier in the call can be used when defining later ones.

So, whilst transform(data.frame, variablex = 5, variabley = variablex + 1) won’t work (variablex does not exist yet when variabley is evaluated), mutate(data.frame, variablex = 5, variabley = variablex + 1) will work.
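
Here is a small sketch of that difference, assuming plyr is installed and that no object called variablex already exists in your workspace. The data frame df is a made-up example.

```r
library(plyr)  # provides mutate

df <- data.frame(id = 1:3)  # hypothetical example data

# transform evaluates every expression against the original columns,
# so variablex is not visible when variabley is computed
transform.result <- tryCatch(
  transform(df, variablex = 5, variabley = variablex + 1),
  error = function(e) NULL  # NULL signals that the call failed
)

# mutate adds columns one at a time, so variabley can use variablex
mutate.result <- mutate(df, variablex = 5, variabley = variablex + 1)
```

With these inputs, transform.result ends up NULL because the call errors, while mutate.result gains both new columns.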

colwise is a good function for data aggregation when working with wide files. For example, colwise(mean)(data.frame) will return the average of each column in data.frame (there are other ways of doing this, but this makes for quite nice syntax). This example only works if all columns in the data frame are numeric. To get around this, there are two options: use either numcolwise(mean)(data.frame) or colwise(mean, is.numeric)(data.frame). Both accomplish exactly the same purpose of subsetting the data frame to its numeric columns before applying the function.
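
To make the two options concrete, here is a minimal sketch, assuming plyr is installed. The data frame and its columns are invented for the example.

```r
library(plyr)  # provides colwise and numcolwise

# Hypothetical wide data with one non-numeric column
df <- data.frame(name = c("a", "b", "c"),
                 height = c(1, 2, 3),
                 weight = c(10, 20, 60),
                 stringsAsFactors = FALSE)

# Option 1: numcolwise keeps only the numeric columns
means.num <- numcolwise(mean)(df)

# Option 2: colwise with is.numeric as the column filter
means.col <- colwise(mean, is.numeric)(df)
```

Both calls return a one-row data frame holding the means of height and weight only; the name column is dropped before mean is applied.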

I’m still getting my head around Higher Order Functions in R and how to use them (John Myles White has written a very good intro to these), but they seem to be a nice way of writing elegant, easy-to-understand code:

small.even.numbers <- Filter(function (x) {x %% 2 == 0}, 1:10)
my.sum <- function (x) {Reduce(`+`, x)}
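
For anyone wanting to try these out, here is a short usage sketch in base R. It repeats the two definitions so it can be run on its own; the accumulate example is my own addition, not something from the original snippet.

```r
# Keep only the elements for which the predicate returns TRUE
small.even.numbers <- Filter(function(x) x %% 2 == 0, 1:10)

# Fold a vector down to a single value with `+`
my.sum <- function(x) Reduce(`+`, x)
total <- my.sum(1:10)

# accumulate = TRUE keeps every intermediate result (a running total)
running.total <- Reduce(`+`, 1:4, accumulate = TRUE)

print(small.even.numbers)  # 2 4 6 8 10
print(total)               # 55
print(running.total)       # 1 3 6 10
```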
