Floating-point summation and the M1 processor

[This article was first published on Blog: John C. Nash, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

John C. Nash, Retired Professor, University of Ottawa

While I have had little to do with Apple since the early 1980s when they abandoned several products in which a colleague and I had invested time, effort and money, I recently had to take note of the introduction of the M1 chip in some of their products. This was not a choice. I had made some updates to my R package nlsr, then did the appropriate tests and submitted it to CRAN (the Consolidated R Archive Network). The package very smoothly passed the checks and was upgraded on the repository servers as fast as I can remember.

A few days later, however, I received a message containing the text

Please see the problems shown on

Please correct before 2023-02-23 to safely retain your package on CRAN.

There is a check service for M1mac issues: see
https://www.stats.ox.ac.uk/pub/bdr/M1mac/README.txt .

It turned out that the issue was not in my code, but in an example given in a vignette, a form of package documentation useful to present a longer exposition than typical man pages. Moreover, the example did not use any of my own code, but referred to another package.

It was easy to resolve the immediate issue with my package by wrapping the offending example lines with the R try() function to catch the “errors”. However, my curiosity was raised.

Here is what I believe was going on. The particular “error” was that a nonlinear least squares code that called Fortran routines returned a so-called “singular gradient” (actually Jacobian) and failed. On non-M1 platforms, this does not happen, but truthfully, the example problem does present near-singular Jacobians. Because no other platform raised the issue, the example was not protected with try().

IEEE Arithmetic

Prior to the mid 1980s, computers used a frightening variety of different ways of performing arithmetic. Besides the choice of radix – 2 (binary), 10 (decimal), 8 (octal) or 16 (hexadecimal) – we had the possibility of no radix (“decimal”) point for integers, which could be signed or unsigned, we could use fixed point arithmetic, rather like the old mechanical calculators, or we could have floating-point, which is what scientific calculations really need so that different scales can be accommodated.

However, we also had various, sometimes crazy, extensions to offer guard digits, “fuzzy” comparison and a whole lot of other obstacles to the straightforward comparison of results output by different systems.

The Intel 8087 and IEEE 754 standard were parallel developments. I was a rather minor member of the 1985 standardization committee, though an ardent supporter. This introduced what we now more or less expect as the arithmetic of modern computing. Likely for historical reasons, most modern calculations take what we used to call “double precision” as the common form of real numbers.

Summarizing severely, this form, now called binary64 has a mantissa of 53 binary digits, 11 exponent bits and a sign. But that is 65 bits! Yes, because in scientific notation, there is always a non-zero first digit. In binary, this is a 1, so we can assume it is there and get a slightly extended precision “for free”. We make the electronic engineers work hard to deal with an assumed binary digit in their chips.

80 bit extended precision

The Intel 8087 development was influenced by a number of numerical analysts, notably W. Kahan. During the 60s and 70s, we had to deal with a lot of machines that had very limited arithmetic. I did a lot of the development of the codes in my book Compact numerical methods for computers (1979) on a Data General Nova with a 24 bit mantissa and no guard digit. One recognized improvement was the use of a longer precision for an accumulator register for summation. If you have a situation where you need to add up a lot of very small numbers, say a thousand of around 0.001 and two of 10 thousand, but your calculator carries only 6 digits, your answer will be 20000, not 20001. The reason is that 10000.001 cannot be represented. However, if you have an 8-digit accumulator, you get the right answer.

That’s the motivation for extended precision registers. Of course, handling extended accumulators requires extra electrical engineering and extra software to provide access. Two notes:

  • It appears that in the M1 chip, there are no extended registers
  • A lot of software doesn’t bother to provide access anyway.

A little example in R

Consider the following little snippet of R code that adds numbers in a vector one at a time.

loopsum <- function(vec){
n <- length(vec)
for (i in 1:n) { vsum <- vsum + vec[i]}

This forces vector summation to be one element at a time, and the intermediate quantity vsum will be forced into the binary64 representation, so no extended bits. However, internal functions in R can use such extended arithmetic, and the underlying C code seems to use it. Let us try the following example:

small<-.Machine$double.eps/4 # 1/4 of the machine precision
vsmall <- rep(small, 1e4) # a long vector of small numbers
vv1 <- c(1.0, vsmall) # 1 at the front of this vector
vv2 <- c(vsmall, 1.0) # 1 at the end
(sum(vv1) – 1.0e0)
(sum(vv2) - 1.0e0)
(loopsum(vv1) – 1.0e0) # should be zero
(loopsum(vv2) – 1.0e0) # should be greater than zero

Though I don’t have an M1 Macintosh, some very community-minded R workers have provided a test environment at https://mac.r-project.org/macbuilder/submit.html . This is set up to run the R standard ”R CMD check” on a tarball package that is uploaded to the site via the web-page linked. Creating a small package with the example code in a file of type .Rd in the man directory of the package allows testing in this environment. Results are returned as a compressed archive from which the results of examples can be obtained. The results are shown below side by side for comparison. Note particularly the first summation, where the M1 results suggest no extended precision.

      Intel I5 results                     R macOS builder results

> (sum(vv1) - 1.0e0)                > (sum(vv1) - 1.0e0)
[1] 5.551115e-13      <----->       [1] 0
> (sum(vv2) - 1.0e0)                > (sum(vv2) - 1.0e0)
[1] 5.551115e-13                    [1] 5.551115e-13
> (loopsum(vv1) - 1.0e0)            > (loopsum(vv1) - 1.0e0)
[1] 0                               [1] 0
> (loopsum(vv2) - 1.0e0)            > (loopsum(vv2) - 1.0e0)
[1] 5.551115e-13                    [1] 5.551115e-13


Should the above be a worry that we may be getting poor results? Probably not. However, it would be good to have better documentation than I could find easily on the M1 arithmetic. It also makes sense to keep in mind that the M1 may be blindingly fast, but that it will get slightly weaker results on some vector, matrix and signal processing operations than chips that do have extended precision for accumulation.

To leave a comment for the author, please follow the link and comment on their blog: Blog: John C. Nash.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)