Creating new data with max values for each subject

December 1, 2014

(This article was first published on Stat Bandit » R, and kindly contributed to R-bloggers)

We have a data set dat with multiple observations per subject. We want to create a subset of this data such that each subject (with ID giving the unique identifier for the subject) contributes the observation where the variable X takes it’s maximum value for that subject.

An R solution

Using the excellent R package dplyr, we can do this using windowing functions included in dplyr. The following solution is available on StackOverflow, by junkka, and gets around the real possibility that multiple observations might have the same maximum value of X by choosing one of them.

dat_max <- dat %>% group_by(ID) %>% filter(row_number(X)==n())

To be a bit more explicit, row_number is a wrapper around rank, using the option ties.method=“first”. So you can use the rank function explicitly here as well.

A solution using the plyr package might be

dat_max <- ddply(dat, .(ID), function(u) u[rank(u$X, ties.method=’first’)==nrow(u),])

A Python solution

In Python, the excellent pandas package allows us to do similar operations. The following example is from this thread on StackOverflow.

import pandas as pd
df = DataFrame({’Sp’:[‘a’,’b‘,’c’,’d‘,’e’,’f’], ‘Mt’:[‘s1’, ‘s1’, ‘s2’,’s2‘,’s2’,’s3’], ‘Value’:[1,2,3,4,5,6], ‘count’:[3,2,5,10,10,6]})
df.iloc[df.groupby([‘Mt’]).apply(lambda x: x[‘count’].idxmax())]

You could also do (from the same thread)

df.sort(‘count’, ascending=False).groupby(‘Mt’, as_index=False).first()

but it is about 50% slower.

To leave a comment for the author, please follow the link and comment on their blog: Stat Bandit » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)