Basic R: rows that contain the maximum value of a variable

February 12, 2013

(This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers)

File under “I keep forgetting how to do this basic, frequently-required task, so I’m writing it down here.”

Let’s create a data frame with a grouping column, vars, holding five labels A – E, each of which appears twice, along with two columns of measurements, obs1 and obs2:

df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = 1:10, obs2 = 11:20)
#    vars obs1 obs2
# 1     A    1   11
# 2     B    2   12
# 3     C    3   13
# 4     D    4   14
# 5     E    5   15
# 6     A    6   16
# 7     B    7   17
# 8     C    8   18
# 9     D    9   19
# 10    E   10   20

Now, let’s say we want only the rows that contain the maximum value of obs1 for each of A – E. In bioinformatics, for example, we might want to select the microarray probeset with the highest sample variance from multiple probesets per gene. The answer is obvious in this trivial example (rows 6 – 10), but one procedure looks like this:

# use aggregate to create new data frame with the maxima
df.agg <- aggregate(obs1 ~ vars, df.orig, max)
# then simply merge with the original
df.max <- merge(df.agg, df.orig)
#   vars obs1 obs2
# 1    A    6   16
# 2    B    7   17
# 3    C    8   18
# 4    D    9   19
# 5    E   10   20
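
The merge() step works because, with no by argument, merge() joins on the columns the two data frames have in common, here vars and obs1. The sketch below spells that out. It also shows an alternative that is not from the original procedure: base R's ave() returns the group maximum aligned to every original row, so the rows can be selected in one step. The names df.max.explicit and df.ave are made up for illustration.

```r
df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = 1:10, obs2 = 11:20)

# per-group maxima, as in the procedure above
df.agg <- aggregate(obs1 ~ vars, df.orig, max)

# the same merge with the join columns made explicit
df.max.explicit <- merge(df.agg, df.orig, by = c("vars", "obs1"))

# alternative (an assumption, not the post's method): ave() repeats each
# group's maximum for every row of that group, so a comparison picks the rows
df.ave <- df.orig[df.orig$obs1 == ave(df.orig$obs1, df.orig$vars, FUN = max), ]
```

One difference to be aware of: the ave() version keeps the original row names and column order, while merge() renumbers the rows and sorts by the join columns.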

This also works using min() and, I guess, using any function that returns a single value per group, provided that value actually occurs in obs1 for that group in the original data frame (otherwise the merge finds nothing to match).
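
As a quick check of the min() case, and of one caveat worth knowing: because the merge matches on the value itself, every row tied for the extreme value is returned, not just one per group. The df.ties data frame below is hypothetical, made up only to show the tie behaviour.

```r
df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = 1:10, obs2 = 11:20)

# same procedure with min(): this selects original rows 1 - 5
df.min <- merge(aggregate(obs1 ~ vars, df.orig, min), df.orig)

# hypothetical data where group A reaches its maximum (5) in two rows
df.ties <- data.frame(vars = c("A", "A", "B"),
                      obs1 = c(5, 5, 2),
                      obs2 = c(1, 2, 3))
df.ties.max <- merge(aggregate(obs1 ~ vars, df.ties, max), df.ties)

# both tied rows for A are kept, plus the single row for B
nrow(df.ties.max)
```

If exactly one row per group is needed, the ties have to be broken explicitly, for example by dropping duplicates on vars after the merge.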

With thanks to this mailing list thread.

Filed under: programming, R, research diary, statistics
