RGolf: rolling window

May 30, 2014
By

(This article was first published on R snippets, and kindly contributed to R-bloggers)

I have learned a lot from my last RGolf post. Therefore today I have another problem from practice.

You have a data set on values of contracts signed by ten salesmen. It has three columns: person id (p), contract value (v) and time (t).
Here is the code that generates the test data set.

n <- 4000
y <- 10
set.seed(1)
d <- data.frame(p = sample(letters[1:y], n, rep=T),
                v = runif(n),
                t = sample(seq(as.Date("2000-1-1"),
                           by="day", len=n)))

head(d)
#   p          v          t
# 1 c 0.18776846 2001-05-28
# 2 d 0.50475902 2001-05-25
# 3 f 0.02728685 2008-07-06
# 4 j 0.49629785 2004-08-06
# 5 c 0.94735171 2007-10-31
# 6 i 0.38118213 2001-09-04

For each salesmen we want to find the 90-day period in which she generated the highest sum of sales.
The task is to generate a data frame that has three columns: person id (p), maximal sum of sales (v), start of the 90-day period (t) and save it to a variable named r.
In situations where there are multiple such 90-day periods you are to report the date of the earliest contract that was signed during such a period.

The rules of engagement are exactly as last time:
(1) generate data frame r as few keystrokes as possible,
(2) one line of code may not be longer than 80 characters,
(3) the solution must be in base R only (no package loading is allowed).

Here is my attempt consisting of 225 characters:

x=max(d$t);m=matrix(0,x+89,y)
m[cbind(d$t, d$p)]=d$v
s=sapply(1:x,function(z)
colSums(m[z+0:89,]))*t(m>0)[,1:x]
u=apply(s,1,function(z)c(max(z),which.max(z)))
r=data.frame(p=letters[1:y],v=u[1,],t=u[2,]+as.Date("1970-01-01"))

It produces the following output:

r
#    p         v          t
# 1  a 10.198044 2009-10-11
# 2  b  9.265335 2002-08-28
# 3  c  9.735401 2008-02-21
# 4  d  9.479942 2004-09-07
# 5  e 11.036041 2010-09-16
# 6  f 10.602446 2002-04-04
# 7  g  9.927153 2007-08-11
# 8  h 10.917856 2007-04-26
# 9  i 10.027341 2008-10-26
# 10 j  9.965738 2004-12-28

Here is a more verbose version that is easier to read:

d_o <- d[order(d$t),] # order d by time
testr <- NULL
for (p in levels(d_o$p)) {
    d_o_p <- d_o[d_o$p == p,] # subset ordered d by player
    best_v_sum <- -Inf
    best_i <- NA
    for (i in 1:nrow(d_o_p)) {
        d_o_p_t <- d_o_p[d_o_p$t < d_o_p$t[i] + 90 &
                         d_o_p$t >= d_o_p$t[i],]
        # subset ordered d by player and rolling window
        if (sum(d_o_p_t$v) > best_v_sum) {
            best_v_sum <- sum(d_o_p_t$v)
            best_i <- i
        }
    }
    testr <- rbind(testr, data.frame(p = p, v = best_v_sum,
                   t = d_o_p$t[best_i]))
}

all.equal(r,testr)

# [1] TRUE

Any competing solution should pass all.equal test.

As last time - please post your solutions in comments to the post.

To leave a comment for the author, please follow the link and comment on his blog: R snippets.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.