(This article was first published on

I have learned a lot from my last RGolf post. Therefore today I have another problem from practice.**R snippets**, and kindly contributed to R-bloggers)You have a data set on values of contracts signed by ten salesmen. It has three columns: person id (p), contract value (v) and time (t).

Here is the code that generates the test data set.

n <- 4000

y <- 10

set.seed(1)

d <- data.frame(p = sample(letters[1:y], n, rep=T),

v = runif(n),

t = sample(seq(as.Date("2000-1-1"),

by="day", len=n)))

head(d)

# p v t

# 1 c 0.18776846 2001-05-28

# 2 d 0.50475902 2001-05-25

# 3 f 0.02728685 2008-07-06

# 4 j 0.49629785 2004-08-06

# 5 c 0.94735171 2007-10-31

# 6 i 0.38118213 2001-09-04

For each salesmen we want to find the 90-day period in which she generated the highest sum of sales.

The task is to generate a data frame that has three columns: person id (p), maximal sum of sales (v), start of the 90-day period (t) and save it to a variable named r.

In situations where there are multiple such 90-day periods you are to report the date of the earliest contract that was signed during such a period.

The rules of engagement are exactly as last time:

(1) generate data frame r as few keystrokes as possible,

(2) one line of code may not be longer than 80 characters,

(3) the solution must be in base R only (no package loading is allowed).

Here is my attempt consisting of 225 characters:

x=max(d$t);m=matrix(0,x+89,y)

m[cbind(d$t, d$p)]=d$v

s=sapply(1:x,function(z)

colSums(m[z+0:89,]))*t(m>0)[,1:x]

u=apply(s,1,function(z)c(max(z),which.max(z)))

r=data.frame(p=letters[1:y],v=u[1,],t=u[2,]+as.Date("1970-01-01"))

It produces the following output:

r

# p v t

# 1 a 10.198044 2009-10-11

# 2 b 9.265335 2002-08-28

# 3 c 9.735401 2008-02-21

# 4 d 9.479942 2004-09-07

# 5 e 11.036041 2010-09-16

# 6 f 10.602446 2002-04-04

# 7 g 9.927153 2007-08-11

# 8 h 10.917856 2007-04-26

# 9 i 10.027341 2008-10-26

# 10 j 9.965738 2004-12-28

Here is a more verbose version that is easier to read:

d_o <- d[order(d$t),] # order d by time

testr <- NULL

for (p in levels(d_o$p)) {

d_o_p <- d_o[d_o$p == p,] # subset ordered d by player

best_v_sum <- -Inf

best_i <- NA

for (i in 1:nrow(d_o_p)) {

d_o_p_t <- d_o_p[d_o_p$t < d_o_p$t[i] + 90 &

d_o_p$t >= d_o_p$t[i],]

# subset ordered d by player and rolling window

if (sum(d_o_p_t$v) > best_v_sum) {

best_v_sum <- sum(d_o_p_t$v)

best_i <- i

}

}

testr <- rbind(testr, data.frame(p = p, v = best_v_sum,

t = d_o_p$t[best_i]))

}

all.equal(r,testr)

# [1] TRUE

Any competing solution should pass all.equal test.

As last time - please post your solutions in comments to the post.

To

**leave a comment**for the author, please follow the link and comment on his blog:**R snippets**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...