I am amazed by the number of comments I received on my recent blog entry about “by”, “apply” and friends. I started that post by pointing out that R is a language. Indeed, I have come to the conclusion that it is a language with lots of irregular expressions and dialects. It feels a bit like German or French, where you have to learn and memorise the different articles. German has three singular definite articles: der (masculine), die (feminine) and das (neuter); French has two: le (masculine) and la (feminine). Of course there is no mapping between them, and how do you explain that a girl in German is neuter (das Mädchen), while manhood is feminine (die Männlichkeit)?
Back to R. As I found out, there are lots of different ways to calculate means on subsets of data. I begin to wonder why so many different interfaces and functions have been developed over the years, and also why I didn’t use the aggregate function more often in the past.
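To make the point about multiple interfaces concrete, here is a minimal sketch using only base R and the built-in iris data set. It computes the same group means three ways; the variable names (m_tapply, m_agg, m_split) are chosen just for this illustration.

```r
# Mean Sepal.Width per Species, computed three ways in base R.

# 1. tapply: a vector, a grouping factor, and a function
m_tapply <- tapply(iris$Sepal.Width, iris$Species, mean)

# 2. aggregate: formula interface, returns a data frame
m_agg <- aggregate(Sepal.Width ~ Species, data = iris, FUN = mean)

# 3. split + sapply: split the vector by group, then apply mean to each piece
m_split <- sapply(split(iris$Sepal.Width, iris$Species), mean)

print(m_tapply)
print(m_agg)
print(m_split)
```

All three return the same numbers, but in different containers: a named array, a data frame, and a named vector. That difference in return type is often what decides which one is most convenient.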
Can we blame internet search engines? Why should I learn a programming language properly when I can find approximate answers to my problem online? I may not end up with the best answer, but with something that works after all: “Don’t know why, but it works.”
And sometimes the help files can be more difficult to understand than the code in the examples. Hence, I end up playing around with the example code until it works, and only then do I try to figure out how it works. That was my experience with
Maybe this is a bit harsh. It is always up to the individual to improve their language skills, but you can get drunk in a pub even if all you can do is order beer. I think it was George Bernard Shaw who said: “R is the easiest language to speak badly.” No, actually he said: “English is the easiest language to speak badly.” Maybe that explains the success of English and R?
Reading helps. More and more books on R have been published over the last few years, and not only in English. But which should you pick? Xi’an’s review of The Art of R Programming suggests that it might be a good start.
aggregate. Has anyone noticed that the formula interface of aggregate is different from that of summaryBy?
aggregate(cbind(Sepal.Width, Petal.Width) ~ Species, data=iris, FUN=mean)
     Species Sepal.Width Petal.Width
1     setosa       3.428       0.246
2 versicolor       2.770       1.326
3  virginica       2.974       2.026
library(doBy)  # summaryBy comes from the doBy package
summaryBy(Sepal.Width + Petal.Width ~ Species, data=iris, FUN=mean)
     Species Sepal.Width.mean Petal.Width.mean
1     setosa            3.428            0.246
2 versicolor            2.770            1.326
3  virginica            2.974            2.026
And another slightly more complex example:
aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, FUN=sum)
summaryBy(ncases + ncontrols ~ alcgp + tobgp, data = esoph, FUN=sum)
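The difference matters because the same left-hand side means different things in the two functions. The sketch below uses base R only (no doBy needed) to show what happens if you carry the summaryBy habit of writing `+` over to aggregate: as far as I can tell, aggregate treats `+` on the left-hand side as ordinary arithmetic, so you get one column holding the mean of the sum rather than two columns of means. The names two_cols and one_col are just for this illustration.

```r
# cbind() on the left-hand side: one result column per variable
two_cols <- aggregate(cbind(Sepal.Width, Petal.Width) ~ Species,
                      data = iris, FUN = mean)

# "+" on the left-hand side: the two variables are added row by row first,
# so aggregate returns a single column with the mean of that sum
one_col <- aggregate(Sepal.Width + Petal.Width ~ Species,
                     data = iris, FUN = mean)

print(two_cols)
print(one_col)
```

So in aggregate, `cbind()` is the way to ask for several response variables, whereas summaryBy reads `+` on the left-hand side as "and also this variable".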