Sample Entropy with Rcpp

[This article was first published on schochastics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Entropy. I still shiver when I hear that word, since I never fully understood that concept. Today marks the first time I was kind of forced to look into it in more detail. And by “in detail”, I mean I found a StackOverflow question that had something to do with a problem I am having (sound familiar?). The problem was is about complexity of time series and one of the suggested methods was Sample Entropy.

#used packages
library(tidyverse)  # for data wrangling
library(pracma) # for Sample Entropy code
library(Rcpp) # integrate C++ in R

Sample Entropy

Sample entropy is similar to Approximate Entropy and used for assessing the complexity of time-series. The less “complex” the time series is the easier it may be to forecast it.

Sample Entropy in R

I found two packages that implement sample entropy, pracma and nonlinearTimeSeries. I looked into nonlinearTimeSeries first but the data structure seemed a bit too complex on first glance (for me!). So I decided to go for pracma. When you are ok with the default parameters, then you can simple call sample_entropy().

ts <- rnorm(200)
## [1] 2.302585

Simple. Problem is, I need to calculate the sample entropy of 150,000 time series. Can the function handle that in reasonable time?

#calculate sample entropy for 500 time series
A <- matrix(runif(500*200),500,200)
##    user  system elapsed 
##  40.775   0.004  40.782

This translates to several hours for 150,000 time series, which is kind of not ok. I would prefer it a little faster.

Sample Entropy with Rcpp

Sample Entropy is actually super easy to implement. So I used my rusty c++ skills and implemented the function myself with the help of Rcpp.

  "double SampleEntropy(NumericVector data, int m, double r, int N, double sd)
  int Cm = 0, Cm1 = 0;
  double err = 0.0, sum = 0.0;
  err = sd * r;
  for (unsigned int i = 0; i < N - (m + 1) + 1; i++) {
    for (unsigned int j = i + 1; j < N - (m + 1) + 1; j++) {      
      bool eq = true;
      //m - length series
      for (unsigned int k = 0; k < m; k++) {
        if (std::abs(data[i+k] - data[j+k]) > err) {
          eq = false;
      if (eq) Cm++;
      //m+1 - length series
      int k = m;
      if (eq && std::abs(data[i+k] - data[j+k]) <= err)
  if (Cm > 0 && Cm1 > 0)
    return std::log((double)Cm / (double)Cm1);
    return 0.0; 

The code can also be found on github.

Let’s see if it produces the same output as the pracma version.

ts <- rnorm(200)
## [1] 2.302585
## [1] 2.302585

Perfect. Now let’s check if we gained some speed up.

##    user  system elapsed 
##   0.084   0.000   0.084

The speed up is actually ridiculous. Remember that the pracma code ran 40 seconds. The Rcpp code not even a tenth of a second. This is definitely good enough for 150,000 time series.

To leave a comment for the author, please follow the link and comment on their blog: schochastics. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)