[This article was first published on The Shape of Code » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I was rummaging around in the source of R looking for trouble, as one does, when I came across what I believed to be a less than optimally accurate floating-point algorithm (function R_pos_di in src/main/arithemtic.c). Analyzing the accuracy of floating-point code is notoriously difficult and those having the required skills tend to concentrate their efforts on what are considered to be important questions. I recently discovered RangeLab, a tool that seemed to be offering painless floating-point code accuracy analysis; here was an opportunity to try it out.

Installation went as smoothly as newly released personal tools usually do (i.e., some minor manual editing of Makefiles and a couple of tweaks to the source to comment out function calls causing link errors {mpfr_random and mpfr_random2}).

RangeLab works by analyzing the flow of values through a program to produce the set of output values and the error bounds on those values. Input values can be specified as a range, e.g., f = [1.0, 10.0] says f can contain any value between 1.0 and 10.0.

My first RangeLab program mimicked the behavior of the existing code in R_pos_di:

n=-10;
f=[1.0, 10.0];

res = 1.0;

if n < 0,
n = -n;
f = 1 / f;
end

while n ~= 0,

if (n / 2)*2 ~= n,
res = res * f;
end
n =  n / 2;
if n ~= 0,
f = f*f;
end
end

and told me that the possible range of values of res was:

res

ans =
float64: [1.000000000000001E-10,1.000000000000000E0]
error: [-2.109423746787798E-15,2.109423746787799E-15]

Changing the code to perform the divide last, rather than first, when the exponent is negative:

n=-10;
f=[1.0, 10.0];

res = 1.0;
is_neg = 0;

if n < 0,
n = -n;
is_neg = 1
end

while n ~= 0,

if (n / 2)*2 ~= n,
res = res * f
end
n =  n / 2;
if n ~= 0,
f = f*f
end

if is_neg == 1, res = 1 / res end
end

and the error in res is now:

res

ans =
float64: [1.000000000000000E-10,1.000000000000000E0]
error: [-1.110223024625156E-16,1.110223024625157E-16]

Yea! My hunch was correct, moving the divide from first to last reduces the error in the result. I have reported this code as a bug in R and wait to see what the R team think.

Was the analysis really that painless? The Rangelab language is somewhat quirky for no obvious reason (e.g., why use ~= when everybody uses != these days and if conditionals must be followed by a character why not use the colon like Python does?) It would be real useful to be able to cut and paste C/C++/etc and only have to make minor changes.

I get the impression that all the effort went into getting the analysis correct and user interface stuff was a very distant second. This is the right approach to take on a research project. For some software to make the leap from interesting research idea to useful tool it is important to pay some attention to the user interface.

The current release does not deserve to be called 1.0 and unless you have an urgent need I would suggest waiting until the usability has been improved (e.g., error messages give some hint about what is wrong and a rough indication of which line the problem occurs on).

RangeLab has shown that there is simpler method of performing useful floating-point error analysis. With some usability improvements RangeLab would be an essential tool for any developer writing code involving floating-point types. 