Sample Entropy with Rcpp

February 6, 2018
By

(This article was first published on schochastics, and kindly contributed to R-bloggers)

Entropy. I still shiver when I hear that word, since I never fully understood that
concept. Today marks the first time I was kind of forced to look into it in more detail.
And by “in detail”, I mean I found a StackOverflow question that had something to do with a
problem I am having (sound familiar?). The problem was is about complexity of time series and
one of the suggested methods was Sample Entropy.

#used packages
library(tidyverse)  # for data wrangling
library(pracma) # for Sample Entropy code
library(Rcpp) # integrate C++ in R

Sample Entropy

Sample entropy is similar to Approximate Entropy
and used for assessing the complexity of time-series. The less “complex” the time series is
the easier it may be to forecast it.

Sample Entropy in R

I found two packages that implement sample entropy,
pracma and nonlinearTimeSeries. I looked into
nonlinearTimeSeries first but the data structure seemed a bit too complex on first glance (for me!).
So I decided to go for pracma. When you are ok with the default parameters, then you can simple call
sample_entropy().

set.seed(1886)
ts <- rnorm(200)
sample_entropy(ts)
## [1] 2.302585

Simple. Problem is, I need to calculate the sample entropy of 150,000 time series. Can the function handle that in reasonable time?

#calculate sample entropy for 500 time series
set.seed(1886)
A <- matrix(runif(500*200),500,200)
system.time(apply(A,1,function(x)sample_entropy(x)))
##    user  system elapsed 
##  40.775   0.004  40.782

This translates to several hours for 150,000 time series, which is kind of not ok.
I would prefer it a little faster.

Sample Entropy with Rcpp

Sample Entropy is actually super easy to implement. So I used my rusty c++ skills
and implemented the function myself with the help of Rcpp.

cppFunction(
  "double SampleEntropy(NumericVector data, int m, double r, int N, double sd)
{
  int Cm = 0, Cm1 = 0;
  double err = 0.0, sum = 0.0;
  
  err = sd * r;
  
  for (unsigned int i = 0; i < N - (m + 1) + 1; i++) {
    for (unsigned int j = i + 1; j < N - (m + 1) + 1; j++) {      
      bool eq = true;
      //m - length series
      for (unsigned int k = 0; k < m; k++) {
        if (std::abs(data[i+k] - data[j+k]) > err) {
          eq = false;
          break;
        }
      }
      if (eq) Cm++;
      
      //m+1 - length series
      int k = m;
      if (eq && std::abs(data[i+k] - data[j+k]) <= err)
        Cm1++;
    }
  }
  
  if (Cm > 0 && Cm1 > 0)
    return std::log((double)Cm / (double)Cm1);
  else
    return 0.0; 
  
}"
)

The code can also be found on github.

Let’s see if it produces the same output as the pracma version.

set.seed(1886)
ts <- rnorm(200)
sample_entropy(ts)
## [1] 2.302585
SampleEntropy(ts,2L,0.2,length(ts),sd(ts))
## [1] 2.302585

Perfect. Now let’s check if we gained some speed up.

system.time(apply(A,1,function(x)SampleEntropy(x,2L,0.2,length(ts),sd(ts))))
##    user  system elapsed 
##   0.084   0.000   0.084

The speed up is actually ridiculous. Remember that the pracma code ran 40 seconds.
The Rcpp code not even a tenth of a second. This is definitely good enough for 150,000
time series.

To leave a comment for the author, please follow the link and comment on their blog: schochastics.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)