Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

By Jinjing Xie – Consultant, Shanghai.

The challenge here is interesting but also very clear: for a given un-ordered vector of numbers, select from it a longest increasing (or decreasing) subsequence. This subsequence may not be unique.

For example, in the following sequence,

`62 49 42 54 59 39 35 48 57 60`

One obvious longest increasing subsequence is 48, 57, 60, which has length 3;

The (unique) longest decreasing subsequence is 62, 49, 42, 39, 35, which has length 5.

For a very long vector, say having 10000 numbers, is it the case that all increasing and decreasing sequences are very short? This question was actually answered 80 years ago, the longest increasing or decreasing subsequence has ( O(sqrt{n}) ) length.

Interested readers can find the proof in Erdos P. and Szekeres G., 1935 for following theorem: any sequence of ( n^2+1 ) distinct integers has either an increasing or a decreasing subsequence of length ( n+1 ).

That means for a general vector of length ( m ), there is either an increasing or decreasing subsequence of length at least ( sqrt{m-1} ).

Once we know that there is always a longest increasing or decreasing subsequence of at least a certain size, how do we find it? At a glance this may seem difficult. Actually there is a good dynamic programming algorithm using only ( O(nlog(n)) ) time.

## R version

The following code has been implemented following Wiki page.

```<span style="color: #0000ff;">library</span>(compiler)
longest_subseq.R = cmpfun(<span style="color: #0000ff;">function</span>(x) {
P = integer(length(x))
M = integer(length(x) + <span style="color: #0000ff;">1</span>)
L = newL = <span style="color: #0000ff;">0</span>
<span style="color: #0000ff;">for</span> (i <span style="color: #0000ff;">in</span> seq_along(x) - <span style="color: #0000ff;">1</span>) {
lo = <span style="color: #0000ff;">1</span>
hi = L
<span style="color: #0000ff;">while</span> (lo <= hi) {
mid = (lo + hi)%/%<span style="color: #0000ff;">2</span>
<span style="color: #0000ff;">if</span> (x[M[mid + <span style="color: #0000ff;">1</span>] + <span style="color: #0000ff;">1</span>] < x[i + <span style="color: #0000ff;">1</span>]) {
lo = mid + <span style="color: #0000ff;">1</span>
} <span style="color: #0000ff;">else</span> {
hi = mid - <span style="color: #0000ff;">1</span>
}
}
newL = lo
P[i + <span style="color: #0000ff;">1</span>] = M[newL]
<span style="color: #0000ff;">if</span> (newL > L) {
M[newL + <span style="color: #0000ff;">1</span>] = i
L = newL
}<span style="color: #0000ff;"> else if</span> (x[i + 1] < x[M[newL + <span style="color: #0000ff;">1</span>] + <span style="color: #0000ff;">1</span>]) {
M[newL + <span style="color: #0000ff;">1</span>] = i
}
}
k = M[L + <span style="color: #0000ff;">1</span>]
re = integer(L)
<span style="color: #0000ff;">for</span> (i <span style="color: #0000ff;">in</span> L:<span style="color: #0000ff;">1</span>) {
re[i] = k + <span style="color: #0000ff;">1</span>
k = P[k + <span style="color: #0000ff;">1</span>]
}
re
})
longest_subseq.R(c(<span style="color: #0000ff;">5</span>, <span style="color: #0000ff;">1</span>, <span style="color: #0000ff;">3</span>, <span style="color: #0000ff;">2</span>, <span style="color: #0000ff;">4</span>))```

`##  2 4 5`

Now let’s play with a larger vector.

```x = rnorm(<span style="color: #0000ff;">1000</span>)
plot(x, main = <span style="color: #008000;">"Longest Increasing (blue) and Decreasing (red) Subsequencesnfor 1000 random numbers"</span>,
ylab = <span style="color: #008000;">"Random Value"</span>, pch = <span style="color: #0000ff;">16</span>, col = "<span style="color: #008000;">#0000004D</span>")
<span style="color: #008000;"># Longest Increasing Sequence</span>
ind = longest_subseq.R(x)
points(ind, x[ind], pch = <span style="color: #0000ff;">16</span>, col = "<span style="color: #008000;">blue</span>")
<span style="color: #008000;"># Longest Decreasing Sequence</span>
rind = longest_subseq.R(-x)
points(rind, x[rind], pch = <span style="color: #0000ff;">16</span>, col = "<span style="color: #008000;">red</span>")``` ```y = rnorm(<span style="color: #0000ff;">250000</span>)
system.time(ind1 <- longest_subseq.R(y))```

```##    user  system elapsed
##    3.36    0.01    3.48```

A bit slow, right?

But 250,000 is typically not long for a time series.

## Rcpp version

`<span style="color: #0000ff;">library</span>(Rcpp)`

`## Warning: package 'Rcpp' was built under R version 3.0.3`

```cppcode <span style="color: #008000;"><- "n#include <vector>n#include <Rcpp.h>nusing namespace std;ntypedef int index_type;nn// [[Rcpp::export]]nvector<index_type> longest_subseq(SEXP X){n    Rcpp::NumericVector x(X);ntvector<index_type> P(x.size(), 0);ntvector<index_type> M(x.size()+1, 0);ntindex_type L(0), newL;ntfor(index_type i=0; i < x.size(); ++i) {nttindex_type lo(1), hi(L), mid;nttwhile( lo <= hi) {ntttmid = (lo + hi) / 2;ntttif (x[M[mid]] < x[i] ) {nttttlo = mid + 1;nttt} else {ntttthi = mid - 1;nttt}ntt}nttnewL = lo;nttP[i] = M[newL - 1];nttif (newL > L) {ntttM[newL] = i;ntttL = newL;ntt} else if (x[i] < x[M[newL]]) {ntttM[newL] = i;ntt}nt}n    vector<index_type> re(L);ntindex_type k(M[L]);ntfor(index_type i=L-1; i>=0; --i){nttre[i] = k + 1;nttk = P[k];nt}n    return(re);n}"
</span>
sourceCpp(code = cppcode)```

The `sourceCpp` will export the cpp function `longest_subseq` as R function `longest_subseq`, now we can test it.

`system.time(ind2 <- longest_subseq(y))`

```##    user  system elapsed
##    0.03    0.00    0.03```

`all.equal(ind1, ind2)`

`##  TRUE`

That’s a big improvement.

Now we can play with some time series data sets.

```data(EuStockMarkets)
matplot(EuStockMarkets, main = <span style="color: #008000;">"Longest Increasing SubsequencenEU Stock Market Data (1991–1998)"</span>,
col = <span style="color: #0000ff;">2:5</span>, pch = "o", cex = <span style="color: #0000ff;">0.3</span>)
<span style="color: #0000ff;">for</span> (i <span style="color: #0000ff;">in 1:4</span>) {
ind = longest_subseq(EuStockMarkets[, i])
points(ind, EuStockMarkets[ind, i], type = <span style="color: #008000;">"l"</span>, col = i + <span style="color: #0000ff;">1</span>)
}
i = <span style="color: #0000ff;">3</span>
ind = longest_subseq(EuStockMarkets[, i])
<span style="color: #0000ff;">for</span> (j <span style="color: #0000ff;">in</span> ind) lines(c(j, j), c(<span style="color: #0000ff;">0</span>, EuStockMarkets[j, i]), col = i + <span style="color: #0000ff;">1</span>, lty = <span style="color: #0000ff;">3</span>)
legend("<span style="color: #008000;">topleft</span>", legend = colnames(EuStockMarkets), col = <span style="color: #0000ff;">2:5</span>, pch = "o")``` 