RGolf: rolling window

[This article was first published on R snippets, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have learned a lot from my last RGolf post. Therefore today I have another problem from practice.

You have a data set on values of contracts signed by ten salesmen. It has three columns: person id (p), contract value (v) and time (t).
Here is the code that generates the test data set.

n <- 4000
y <- 10
set.seed(1)
d <- data.frame(p = sample(letters[1:y], n, rep=T),
                v = runif(n),
                t = sample(seq(as.Date(“2000-1-1”),
                           by=”day”, len=n)))

head(d)
#   p          v          t
# 1 c 0.18776846 2001-05-28
# 2 d 0.50475902 2001-05-25
# 3 f 0.02728685 2008-07-06
# 4 j 0.49629785 2004-08-06
# 5 c 0.94735171 2007-10-31
# 6 i 0.38118213 2001-09-04

For each salesmen we want to find the 90-day period in which she generated the highest sum of sales.
The task is to generate a data frame that has three columns: person id (p), maximal sum of sales (v), start of the 90-day period (t) and save it to a variable named r.
In situations where there are multiple such 90-day periods you are to report the date of the earliest contract that was signed during such a period.

The rules of engagement are exactly as last time:
(1) generate data frame r as few keystrokes as possible,
(2) one line of code may not be longer than 80 characters,
(3) the solution must be in base R only (no package loading is allowed).

Here is my attempt consisting of 225 characters:

x=max(d$t);m=matrix(0,x+89,y)
m[cbind(d$t, d$p)]=d$v
s=sapply(1:x,function(z)
colSums(m[z+0:89,]))*t(m>0)[,1:x]
u=apply(s,1,function(z)c(max(z),which.max(z)))
r=data.frame(p=letters[1:y],v=u[1,],t=u[2,]+as.Date(“1970-01-01”))

It produces the following output:

r
#    p         v          t
# 1  a 10.198044 2009-10-11
# 2  b  9.265335 2002-08-28
# 3  c  9.735401 2008-02-21
# 4  d  9.479942 2004-09-07
# 5  e 11.036041 2010-09-16
# 6  f 10.602446 2002-04-04
# 7  g  9.927153 2007-08-11
# 8  h 10.917856 2007-04-26
# 9  i 10.027341 2008-10-26
# 10 j  9.965738 2004-12-28

Here is a more verbose version that is easier to read:

d_o <- d[order(d$t),] # order d by time
testr <- NULL
for (p in levels(d_o$p)) {
    d_o_p <- d_o[d_o$p == p,] # subset ordered d by player
    best_v_sum <- -Inf
    best_i <- NA
    for (i in 1:nrow(d_o_p)) {
        d_o_p_t <- d_o_p[d_o_p$t < d_o_p$t[i] + 90 &
                         d_o_p$t >= d_o_p$t[i],]
        # subset ordered d by player and rolling window
        if (sum(d_o_p_t$v) > best_v_sum) {
            best_v_sum <- sum(d_o_p_t$v)
            best_i <- i
        }
    }
    testr <- rbind(testr, data.frame(p = p, v = best_v_sum,
                   t = d_o_p$t[best_i]))
}

all.equal(r,testr)

# [1] TRUE

Any competing solution should pass all.equal test.

As last time – please post your solutions in comments to the post.

To leave a comment for the author, please follow the link and comment on their blog: R snippets.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)