Why RcppDynProg is Written in C++

April 5, 2019

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

The (matter of opinion) claim:

“When the use of C++ is very limited and easy to avoid, perhaps it is the best option to do that […]”

(source discussed here)

got me thinking: does our own RcppDynProg package actually use C++ in a significant way? Could/should I port it to C? Am I informed enough to use something as complicated as C++ correctly?

RcppDynProg implements a nifty concise dynamic programming solution to a segmentation problem. It can automatically partition graphs such as the following:

[Figure: input graph (README r1 1)]

into the following:

[Figure: segmented graph (README r2 1)]

(details found here).

But is the package really using C++ in any significant way? The implementation is just the usual sort of index chasing needed to fill in a dynamic programming table. Looking at it superficially, the package is not doing anything deep, nor is it using any C++ libraries in a fundamentally interesting manner.
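To make "index chasing to fill in a dynamic programming table" concrete, here is a minimal sketch of a segmentation-style dynamic program. It is not RcppDynProg's code: it assumes piecewise-constant fits, a hypothetical per-segment penalty, and plain std::vector in place of Rcpp types, and the names segment_cost and best_partition_cost are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cost of fitting one constant to y[i..j] (inclusive):
// the sum of squared deviations from the segment mean.
double segment_cost(const std::vector<double>& y, std::size_t i, std::size_t j) {
  assert(i <= j);  // a segment must contain at least one point
  double sum = 0.0;
  for (std::size_t k = i; k <= j; ++k) sum += y.at(k);
  const double mean = sum / static_cast<double>(j - i + 1);
  double cost = 0.0;
  for (std::size_t k = i; k <= j; ++k) {
    const double d = y.at(k) - mean;
    cost += d * d;
  }
  return cost;
}

// Minimum total cost of splitting y into contiguous segments, where each
// segment pays its fit cost plus a fixed (assumed) penalty.
double best_partition_cost(const std::vector<double>& y, double penalty) {
  const std::size_t n = y.size();
  // best.at(j) = cheapest cost of partitioning the first j points
  std::vector<double> best(n + 1, std::numeric_limits<double>::infinity());
  best.at(0) = 0.0;
  for (std::size_t j = 1; j <= n; ++j) {
    for (std::size_t i = 0; i < j; ++i) {
      // the last segment covers points i..j-1
      const double c = best.at(i) + segment_cost(y, i, j - 1) + penalty;
      if (c < best.at(j)) best.at(j) = c;
    }
  }
  return best.at(n);
}
```

Every table access goes through at(), so an off-by-one in the loop bounds would surface as an exception rather than as a silent read or write outside the table.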

But then it hit me: the package is indexing into arrays. With native C pointer types we would not have any bounds checking on the indexing. With the C++ classes we can get bounds checking. This may seem like a small thing, but it is huge. With C pointer types, an out of bounds indexing error when writing a value may corrupt memory, and that can have fairly unbounded consequences. With checked C++ indexing, an out of bounds indexing error causes an exception; code that executes without an exception is then a proof that the execution didn’t attempt out of bounds indexing.
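The same idea can be shown in standard C++ without Rcpp. The sketch below uses std::vector::at(), which throws std::out_of_range on a bad index, as a stand-in for checked indexing on Rcpp vectors; the helper names checked_write and demo are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Attempts a checked write; returns true if the index was in bounds,
// false if the write was rejected.
bool checked_write(std::vector<double>& x, std::size_t i, double value) {
  try {
    x.at(i) = value;  // at() throws std::out_of_range when i >= x.size()
    return true;
  } catch (const std::out_of_range&) {
    return false;  // x is left unmodified
  }
}

// Small self-check showing both outcomes.
void demo() {
  std::vector<double> x{1.0, 2.0};
  assert(checked_write(x, 0, 5.0));    // in bounds: write succeeds
  assert(!checked_write(x, 12, 5.0));  // out of bounds: rejected, no corruption
  assert(x.at(0) == 5.0 && x.at(1) == 2.0);
}
```

By contrast, writing x[12] on the same vector is undefined behavior: it may appear to work, corrupt unrelated memory, or crash.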

So RcppDynProg is using C++ in a significant way: it is using it for safety guarantees on array indexing. R users expect safety guarantees on array indexing, as it is a service R supplies. So an extension package that incorporates index bounds checking can be “more R like.” This simple point makes me think many “doesn’t seem to be using C++ in any deep way” packages are also acquiring deep benefits in using C++.

Are there risks in using something as involved as a combination of R, C++, and Rcpp all at the same time for a small new project?

Yes.

But I have tried to mitigate them. I have not used new/delete (I use only stack-allocated C++ objects), have used reference arguments (to try to minimize object construction/destruction), have not defined classes with non-trivial destructors, have not knowingly called back into R functions (though I am using some Rcpp adapted data structures), and have generally tried to stay in a generic tame sub-dialect of C++.
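As a rough illustration of that tame sub-dialect (not code from the package), the sketch below takes its input by const reference, builds its result as a local object, and never calls new or delete. The function name running_sum is hypothetical, and std::vector stands in for the Rcpp types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Running totals of x. The input is taken by const reference (no copy).
// The result is a local object returned by value; its storage is managed
// by std::vector, so there is no explicit new/delete anywhere.
std::vector<double> running_sum(const std::vector<double>& x) {
  std::vector<double> out(x.size(), 0.0);  // released automatically
  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    total += x.at(i);   // checked read
    out.at(i) = total;  // checked write
  }
  assert(out.size() == x.size());
  return out;
}
```

Because no raw pointers are owned by hand, an exception from at() unwinds cleanly without leaking memory.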

I would be happy to incorporate any polite critiques/improvements of the C++ code (found here). If there is something that is obviously wrong to an expert, I would be happy to move to what is obviously right to the experts. (Frankly the thing that most concerns me is: correctly modeling class lifetime and interaction-with/protection-from R’s garbage collector. I think I coded in a style that allows Rcpp to control these issues correctly, but I may stand to be corrected.)


Note: Rcpp structures such as NumericVector do in fact bounds check indices if you use () notation instead of [] notation. RcppDynProg tries to use () throughout to get the index bounds checking. Below is a quick example of the difference.

library("Rcpp")

f_good <- cppFunction('NumericVector oob(NumericVector x) {
  int n = x.size();
  if(n>0) {
    x(0) = 5.0; // in bounds
  }
  return x;
}')

f_good(c(1, 2))
# [1] 5 2

f_bad1 <- cppFunction('NumericVector oob(NumericVector x) {
  int n = x.size();
  x(n+10) = 5.0; // out of bounds, checked
  return x;
}')

f_bad1(c(1, 2))
# Error in f_bad1(c(1, 2)) : Index out of bounds: [index=12; extent=2].

f_bad2 <- cppFunction('NumericVector oob(NumericVector x) {
  int n = x.size();
  x[n+10] = 5.0; // out of bounds, not checked: memory corruption
  return x;
}')

f_bad2(c(1, 2))
# [1] 1 2
# and R crashes out (undefined behavior, so results may vary)
