# Split, Apply, and Combine for ffdf

March 22, 2013
(This article was first published on Data and Analysis with R, at Work, and kindly contributed to R-bloggers)

Call me incompetent, but I just can’t get ffdfdply to work with my ffdf data frames. I’ve tried repeatedly and it just doesn’t seem to work! I’ve seen numerous examples on Stack Overflow, but maybe I’m applying them incorrectly. Wanting to do some split-apply-combine on an ffdf yet again, I finally broke down and wrote my own function that seems to do the job! It’s still crude, I think, and it will probably break when there are NA values in the vector that you want to split on, but here it is:

    mtapply = function (dvar, ivar, funlist) {
        # One row per level of ivar, one column per function named in funlist
        groups = names(table(ivar))
        outtable = matrix(NA, length(groups), length(funlist), dimnames=list(groups, funlist))
        for (i in seq_along(funlist)) {
            # match.fun() looks the function up by name, replacing eval(parse(text=f))
            outtable[,i] = tapply(dvar, ivar, match.fun(funlist[i]))
        }
        return (outtable)
    }

As you can see, I’ve made it so that the result is a bunch of tapply vectors inserted into a matrix, one column per function. “dvar”, unsurprisingly, is your dependent variable and “ivar” is your independent (grouping) variable. “funlist” is a vector of function names typed in as strings, e.g. c("median", "mean", "sd"). I’ve wasted so much of my time trying to get ddply or ffdfdply to work on an ffdf that I’m happy to now have anything that does the job for me.
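To make the calling convention concrete, here is a minimal usage sketch. It assumes an ordinary in-memory data frame (the built-in mtcars) standing in for an ffdf, and it restates the function with match.fun() for the name lookup so that the block runs on its own; the choice of mpg, cyl, and the three summary functions is purely illustrative.

```r
# Restatement of mtapply so this example is self-contained
mtapply <- function(dvar, ivar, funlist) {
  groups <- names(table(ivar))
  outtable <- matrix(NA, length(groups), length(funlist),
                     dimnames = list(groups, funlist))
  for (i in seq_along(funlist)) {
    # Each function must return exactly one value per group
    outtable[, i] <- tapply(dvar, ivar, match.fun(funlist[i]))
  }
  outtable
}

# mtcars ships with R: mpg plays the role of dvar, cyl the role of ivar
res <- mtapply(mtcars$mpg, mtcars$cyl, c("median", "mean", "sd"))
print(round(res, 2))
```

The result has one row per value of cyl and one column per function name, so a single cell can be read with something like res["4", "mean"].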

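Now that I think about it, this will fall short if you ask it to output more than one quantile for each of your split levels, because each function gets only one column in the matrix and so has to return a single value per level. If you can improve this function, please be my guest!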
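One possible way around the single-value limitation is sketched below. This is not the original function but a hypothetical variant, here called mtapply_multi, that takes a named list of functions instead of a vector of names and gives each function as many columns as it returns values. It is shown on the in-memory mtcars data and has not been tried on an ffdf.

```r
# Hypothetical variant: funlist is a *named list* of functions, each of which
# may return one or several values per group
mtapply_multi <- function(dvar, ivar, funlist) {
  pieces <- lapply(names(funlist), function(nm) {
    # split() yields one chunk of dvar per level of ivar
    per_group <- lapply(split(dvar, ivar), funlist[[nm]])
    # Stack the per-group results: rows are groups, columns are returned values
    block <- do.call(rbind, per_group)
    suffix <- colnames(block)
    colnames(block) <- if (!is.null(suffix)) {
      paste(nm, suffix, sep = ".")
    } else if (ncol(block) == 1) {
      nm
    } else {
      paste(nm, seq_len(ncol(block)), sep = ".")
    }
    block
  })
  # Bind the blocks for all functions side by side
  do.call(cbind, pieces)
}

res <- mtapply_multi(
  mtcars$mpg, mtcars$cyl,
  list(mean = mean,
       q    = function(x) quantile(x, c(0.25, 0.75)))
)
print(round(res, 2))
```

Passing the functions themselves rather than their names also makes it easy to supply anonymous functions with extra arguments, which is what the two-quantile case needs.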