
If your native code takes more than a few seconds to finish, it is a nice courtesy to the user to check for user interrupts (Ctrl-C) once in a while, say, every 1,000th or 1,000,000th iteration. The C-level API of R provides R_CheckUserInterrupt() for this (see 'Writing R Extensions' for more information on this function). Here is what the code would typically look like:

```c
for (int ii = 0; ii < n; ii++) {
  /* Some computationally expensive code */
  if (ii % 1000 == 0) R_CheckUserInterrupt();
}
```


This uses the modulo operator % and tests whether the result is zero, which happens every 1,000th iteration. When it is, R_CheckUserInterrupt() is called, which will interrupt the processing and “return to R” whenever an interrupt is detected.

Interestingly, it turns out that it is significantly faster to do this check every k = 2^m iterations. For example, instead of doing it every 1,000th iteration, it is faster to do it every 1,024th iteration. Similarly, instead of, say, doing it every 1,000,000th iteration, do it every 1,048,576th - not one less (1,048,575) or one more (1,048,577). The difference is so large that it is even 2-3 times faster to call R_CheckUserInterrupt() every 256th iteration rather than, say, every 1,000,000th iteration, which, at least to me, was a bit counterintuitive the first time I observed it.

Below are some benchmark statistics supporting the claim that testing / calculating ii % k == 0 is faster for k = 2^m (blue) than for other choices of k (red).

Note that the times are on the log scale (the results are also tabulated at the end of this post). Now, will it make a big difference to the overall performance of your code if you choose, say, 1,048,576 instead of 1,000,000? Probably not, but on the other hand, it does not hurt to pick an interval that is a 2^m integer. This observation may also be useful in algorithms that make heavy use of the modulo operator.

So why is ii % k == 0 a faster test when k = 2^m? I can only speculate. For instance, the integer 2^m is a binary number with all bits but one set to zero. It might be that this is faster to test for than other bit patterns, but I don't know if this is because of how the native code is optimized by the compiler and/or if it goes down to the hardware/CPU level. I'd be interested in feedback and in hearing your thoughts on this.
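One plausible explanation (my speculation, stated as an assumption rather than a confirmed cause) is compiler strength reduction: for a constant power-of-two divisor, `x % k` on a non-negative `x` is equivalent to masking the low bits, which typically compiles to a single AND instruction, whereas a divisor like 1,000 requires a multiply-and-shift sequence or an actual division. The following sketch verifies the equivalence that makes this optimization legal:

```c
#include <assert.h>

int main(void) {
    /* For k = 2^m, the low m bits of i determine i % k, so testing
     * `i % k == 0` on a non-negative i reduces to one bit mask. */
    const int k = 1024;      /* 2^10 */
    const int mask = k - 1;  /* 0x3FF: the low 10 bits set */

    for (int i = 0; i < 100000; i++) {
        assert((i % k == 0) == ((i & mask) == 0));
    }

    /* No such mask exists for a non-power-of-two constant like 1000,
     * so the compiler must fall back to a costlier instruction
     * sequence for `i % 1000 == 0`. */
    return 0;
}
```

Inspecting the generated assembly (e.g. with `gcc -O2 -S`) for the two variants would be one way to confirm or refute this hypothesis.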

## Details on how the benchmarking was done

I used the inline package to generate a set of C-level functions with varying interrupt intervals k. I did not pass k as a parameter to these functions; instead, it is hardcoded as a constant, so that the compiler can optimize as far as possible, but also in order to imitate how most code is written. This is why I generate multiple C functions. I benchmarked across a wide range of interval choices using the microbenchmark package. The C functions (with corresponding R functions calling them) and the corresponding benchmark expressions were generated as follows:

```r
## The interrupt intervals to benchmark
## (a) Classical values
ks <- c(1, 10, 100, 1000, 10e3, 100e3, 1e6)

## (b) 2^m values and the ones before and after
ms <- c(2, 5, 8, 10, 16, 20)
as <- c(-1, 0, +1) + rep(2^ms, each = 3)

## List of unevaluated expressions to benchmark
mbexpr <- list()

for (k in sort(c(ks, as))) {
  name <- sprintf("every_%d", k)

  ## The C function
  assign(name, inline::cfunction(c(length = "integer"), body = sprintf("
    int i, n = asInteger(length);
    for (i = 0; i < n; i++) {
      if (i %% %d == 0) R_CheckUserInterrupt();
    }
    return ScalarInteger(n);
  ", k)))

  ## The corresponding expression to benchmark
  mbexpr <- c(mbexpr, substitute(every(n), list(every = as.symbol(name))))
}
```


The actual benchmarking of the 25 cases was then done by calling:

```r
n <- 10e6  ## Number of iterations
stats <- microbenchmark::microbenchmark(list = mbexpr)
```

```
expr                    min     lq   mean median     uq    max
every_1(n)           174.05 178.77 184.68 180.76 183.97 262.69
every_3(n)            66.78  69.16  72.10  70.20  72.42 114.75
every_4(n)            53.80  55.31  56.98  56.32  57.26  69.71
every_5(n)            46.17  47.52  49.42  48.83  49.99  66.98
every_10(n)           33.31  34.32  36.58  35.12  36.66  54.83
every_31(n)           23.78  24.45  25.74  25.10  25.83  58.10
every_32(n)           17.81  18.25  18.91  18.82  19.22  25.25
every_33(n)           22.90  23.58  24.90  24.59  25.26  34.45
every_100(n)          18.14  18.55  19.47  19.15  19.63  27.42
every_255(n)          19.96  20.56  21.67  21.16  21.98  42.53
every_256(n)           7.07   7.18   7.54   7.40   7.63  10.73
every_257(n)          19.32  19.72  20.60  20.36  20.85  29.66
every_1000(n)         16.37  16.98  17.81  17.53  18.08  24.24
every_1023(n)         19.54  20.16  20.94  20.50  21.25  28.20
every_1024(n)          6.32   6.40   6.81   6.60   6.83  13.32
every_1025(n)         18.58  19.05  19.91  19.74  20.08  30.51
every_10000(n)        15.92  16.76  17.40  17.38  17.82  24.10
every_65535(n)        18.92  19.60  20.41  20.10  20.80  27.69
every_65536(n)         6.08   6.16   6.62   6.39   6.57  13.40
every_65537(n)        22.08  22.70  23.79  23.69  24.35  31.57
every_100000(n)       16.16  16.55  17.20  17.05  17.61  24.54
every_1000000(n)      16.02  16.42  17.17  16.85  17.42  21.84
every_1048575(n)      18.88  19.23  20.27  19.85  20.52  30.21
every_1048576(n)       6.08   6.18   6.53   6.47   6.58  12.64
every_1048577(n)      22.88  23.23  24.28  23.83  24.63  31.84
```

I get similar results across various operating systems (Windows, OS X, and Linux), all using the GNU Compiler Collection (GCC).
