**Mango Solutions Shop**, and kindly contributed to R-bloggers)

By Jinjing Xie – Consultant, Shanghai.

The challenge here is interesting but also very clear: for a given un-ordered vector of numbers, select from it a longest increasing (or decreasing) subsequence. This subsequence may not be unique.

For example, in the following sequence,

```
62 49 42 54 59 39 35 48 57 60
```

One obvious longest increasing subsequence is 48, 57, 60, which has length 3;

The (unique) longest decreasing subsequence is 62, 49, 42, 39, 35, which has length 5.

For a very long vector, say having 10000 numbers, is it the case that all increasing and decreasing sequences are very short? This question was actually answered 80 years ago, the longest increasing or decreasing subsequence has ( O(sqrt{n}) ) length.

Interested readers can find the proof in Erdos P. and Szekeres G., 1935 for following theorem: any sequence of ( n^2+1 ) distinct integers has either an increasing or a decreasing subsequence of length ( n+1 ).

That means for a general vector of length ( m ), there is either an increasing or decreasing subsequence of length at least ( sqrt{m-1} ).

Once we know that there is always a longest increasing or decreasing subsequence of at least a certain size, how do we find it? At a glance this may seem difficult. Actually there is a good dynamic programming algorithm using only ( O(nlog(n)) ) time.

## R version

The following code has been implemented following Wiki page.

```
library(compiler)
longest_subseq.R = cmpfun(function(x) {
P = integer(length(x))
M = integer(length(x) + 1)
L = newL = 0
for (i in seq_along(x) - 1) {
lo = 1
hi = L
while (lo <= hi) {
mid = (lo + hi)%/%2
if (x[M[mid + 1] + 1] < x[i + 1]) {
lo = mid + 1
} else {
hi = mid - 1
}
}
newL = lo
P[i + 1] = M[newL]
if (newL > L) {
M[newL + 1] = i
L = newL
} else if (x[i + 1] < x[M[newL + 1] + 1]) {
M[newL + 1] = i
}
}
k = M[L + 1]
re = integer(L)
for (i in L:1) {
re[i] = k + 1
k = P[k + 1]
}
re
})
longest_subseq.R(c(5, 1, 3, 2, 4))
```

```
## [1] 2 4 5
```

Now let’s play with a larger vector.

```
x = rnorm(1000)
plot(x, main = "Longest Increasing (blue) and Decreasing (red) Subsequencesnfor 1000 random numbers",
ylab = "Random Value", pch = 16, col = "#0000004D")
# Longest Increasing Sequence
ind = longest_subseq.R(x)
points(ind, x[ind], pch = 16, col = "blue")
# Longest Decreasing Sequence
rind = longest_subseq.R(-x)
points(rind, x[rind], pch = 16, col = "red")
```

How about 250,000 length vectors?

```
y = rnorm(250000)
system.time(ind1 <- longest_subseq.R(y))
```

```
## user system elapsed
## 3.36 0.01 3.48
```

A bit slow, right?

But 250,000 is typically not long for a time series.

## Rcpp version

```
library(Rcpp)
```

```
## Warning: package 'Rcpp' was built under R version 3.0.3
```

`cppcode <- "n#include `n#include nusing namespace std;ntypedef int index_type;nn// [[Rcpp::export]]nvector longest_subseq(SEXP X){n Rcpp::NumericVector x(X);ntvector P(x.size(), 0);ntvector M(x.size()+1, 0);ntindex_type L(0), newL;ntfor(index_type i=0; i < x.size(); ++i) {nttindex_type lo(1), hi(L), mid;nttwhile( lo <= hi) {ntttmid = (lo + hi) / 2;ntttif (x[M[mid]] < x[i] ) {nttttlo = mid + 1;nttt} else {ntttthi = mid - 1;nttt}ntt}nttnewL = lo;nttP[i] = M[newL - 1];nttif (newL > L) {ntttM[newL] = i;ntttL = newL;ntt} else if (x[i] < x[M[newL]]) {ntttM[newL] = i;ntt}nt}n vector re(L);ntindex_type k(M[L]);ntfor(index_type i=L-1; i>=0; --i){nttre[i] = k + 1;nttk = P[k];nt}n return(re);n}"
sourceCpp(code = cppcode)

The `sourceCpp`

will export the cpp function `longest_subseq`

as R function `longest_subseq`

, now we can test it.

```
system.time(ind2 <- longest_subseq(y))
```

```
## user system elapsed
## 0.03 0.00 0.03
```

```
all.equal(ind1, ind2)
```

```
## [1] TRUE
```

That’s a big improvement.

Now we can play with some time series data sets.

```
data(EuStockMarkets)
matplot(EuStockMarkets, main = "Longest Increasing SubsequencenEU Stock Market Data (1991–1998)",
col = 2:5, pch = "o", cex = 0.3)
for (i in 1:4) {
ind = longest_subseq(EuStockMarkets[, i])
points(ind, EuStockMarkets[ind, i], type = "l", col = i + 1)
}
i = 3
ind = longest_subseq(EuStockMarkets[, i])
for (j in ind) lines(c(j, j), c(0, EuStockMarkets[j, i]), col = i + 1, lty = 3)
legend("topleft", legend = colnames(EuStockMarkets), col = 2:5, pch = "o")
```

**leave a comment**for the author, please follow the link and comment on their blog:

**Mango Solutions Shop**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...