Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have learned a lot from my last RGolf post. Therefore today I have another problem from practice.

You have a data set on values of contracts signed by ten salesmen. It has three columns: person id (p), contract value (v) and time (t).
Here is the code that generates the test data set.

n <- 4000
y <- 10
set.seed(1)
d <- data.frame(p = sample(letters[1:y], n, rep=T),
v = runif(n),
t = sample(seq(as.Date(“2000-1-1”),
by=”day”, len=n)))

#   p          v          t
# 1 c 0.18776846 2001-05-28
# 2 d 0.50475902 2001-05-25
# 3 f 0.02728685 2008-07-06
# 4 j 0.49629785 2004-08-06
# 5 c 0.94735171 2007-10-31
# 6 i 0.38118213 2001-09-04

For each salesmen we want to find the 90-day period in which she generated the highest sum of sales.
The task is to generate a data frame that has three columns: person id (p), maximal sum of sales (v), start of the 90-day period (t) and save it to a variable named r.
In situations where there are multiple such 90-day periods you are to report the date of the earliest contract that was signed during such a period.

The rules of engagement are exactly as last time:
(1) generate data frame r as few keystrokes as possible,
(2) one line of code may not be longer than 80 characters,
(3) the solution must be in base R only (no package loading is allowed).

Here is my attempt consisting of 225 characters:

x=max(d\$t);m=matrix(0,x+89,y)
m[cbind(d\$t, d\$p)]=d\$v
s=sapply(1:x,function(z)
colSums(m[z+0:89,]))*t(m>0)[,1:x]
u=apply(s,1,function(z)c(max(z),which.max(z)))
r=data.frame(p=letters[1:y],v=u[1,],t=u[2,]+as.Date(“1970-01-01”))

It produces the following output:

r
#    p         v          t
# 1  a 10.198044 2009-10-11
# 2  b  9.265335 2002-08-28
# 3  c  9.735401 2008-02-21
# 4  d  9.479942 2004-09-07
# 5  e 11.036041 2010-09-16
# 6  f 10.602446 2002-04-04
# 7  g  9.927153 2007-08-11
# 8  h 10.917856 2007-04-26
# 9  i 10.027341 2008-10-26
# 10 j  9.965738 2004-12-28

Here is a more verbose version that is easier to read:

d_o <- d[order(d\$t),] # order d by time
testr <- NULL
for (p in levels(d_o\$p)) {
d_o_p <- d_o[d_o\$p == p,] # subset ordered d by player
best_v_sum <- -Inf
best_i <- NA
for (i in 1:nrow(d_o_p)) {
d_o_p_t <- d_o_p[d_o_p\$t < d_o_p\$t[i] + 90 &
d_o_p\$t >= d_o_p\$t[i],]
# subset ordered d by player and rolling window
if (sum(d_o_p_t\$v) > best_v_sum) {
best_v_sum <- sum(d_o_p_t\$v)
best_i <- i
}
}
testr <- rbind(testr, data.frame(p = p, v = best_v_sum,
t = d_o_p\$t[best_i]))
}

all.equal(r,testr)

#  TRUE

Any competing solution should pass all.equal test.