May 6, 2013

The reorder function in R 3.0.0 is behaving strangely (or I’m really not understanding something). Take the following simple data frame:

df = data.frame(a1 = c(4,1,1,3,2,4,2), a2 = c("h","j","j","e","c","h","c"))

I expect that if I call reorder on the a2 vector, using a1 as the vector to order it by, then any summary statistics I run on a2 will be ordered according to the numbers in a1. However, look what happens:

table(reorder(df$a2, df$a1))
c e h j 
2 1 2 2
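
Spelled out by hand, the ordering I expect is the following. This is only a sketch under one assumption: that reorder summarizes a1 within each a2 group using the mean, which is the default documented for stats::reorder. The names scores and expected_levels are mine, purely for illustration.

```r
df <- data.frame(a1 = c(4, 1, 1, 3, 2, 4, 2),
                 a2 = c("h", "j", "j", "e", "c", "h", "c"))

# Mean of a1 within each a2 group
scores <- tapply(df$a1, df$a2, mean)

# Group names sorted by their score, smallest first
expected_levels <- names(sort(scores))
expected_levels
```

Here expected_levels holds the level order I would like table() to follow.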

I found that, to get the levels into the order specified by the numbers in a1, the following code seems to work:

df$a2 = factor(df$a2, levels=unique(df$a2)[order(unique(df$a1))], ordered=TRUE)

Now look at the result:

table(df$a2)
j c e h 
2 2 1 2
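
Breaking the one-liner above into steps shows why it works on this data. It leans on an assumption specific to this example: every a2 value is paired with exactly one a1 value, so the two unique() results have the same length and line up position by position. The intermediate names (u1, u2, new_levels, a2_manual) are illustrative only.

```r
a1 <- c(4, 1, 1, 3, 2, 4, 2)
a2 <- c("h", "j", "j", "e", "c", "h", "c")

u2 <- unique(a2)   # distinct a2 values, in order of first appearance
u1 <- unique(a1)   # distinct a1 values, in order of first appearance

# Rearrange the a2 values by increasing a1
new_levels <- u2[order(u1)]

# Build the factor with those levels, as in the one-liner
a2_manual <- factor(a2, levels = new_levels, ordered = TRUE)
table(a2_manual)
```

If two different a2 values shared an a1 value, or one a2 value had several a1 values, the two unique() vectors would no longer match up and this approach would need rethinking.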

One thing I notice here is that R seems to keep factor levels in alphabetical order by default. When I specify the levels explicitly (here with the help of the “unique” function), R uses the order I give it instead of the alphabetical one.
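
A small check of that observation, using the same letters as above. The variable names are illustrative; the only functions involved are factor, levels and unique from base R.

```r
x <- c("h", "j", "j", "e", "c", "h", "c")

# No levels given: factor() sorts the distinct values
default_f <- factor(x)
levels(default_f)

# Levels given explicitly: the supplied order is kept
explicit_f <- factor(x, levels = unique(x))
levels(explicit_f)
```

The first call returns the levels sorted, the second in order of first appearance in x.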

Why won’t the reorder function work in this case?

