**Taking the Pith Out of Performance**, and kindly contributed to R-bloggers)

The other day, I stumbled upon the `signif` function in R, so I thought I’d take a look at what it does and compare it with some results discussed in Chap. 3 “Damaging Digits in Capacity Calculations” of my GCaP book, viz., Example 3.5 on page 31. The measured numbers in that example are reproduced here in Table 1 using `read.table` in R.

Number Pdd SD

1 50 0 1

2 0.00341 5 3

3 1.0040 4 5

4 50.0005 4 6

5 6.751 3 4

6 0.157 3 3

7 28.0 1 3

8 40300 0 3

9 0.070 3 2

10 30.07 2 4

11 65000 0 2

12 0.0067 4 2

13 6.023*10^23 3 4

The last column, labeled SD, shows the number of significant digits as determined by applying the following recipe (Algorithm 3.1) from Chap. 3:

Scan the number from left to right.

Is there an explicit decimal point?Yes:

- Scan and locate the first nonzero digit.
- Count it and all digits to its right. (including zeros)
No:

- Append a decimal point.
- Count the position of the last non-zero digit prior to the decimal point.
- All zeros in between should be ignored.

Let’s check this recipe against the number 314000.000314000000 with all those trailing zeros kept intentionally. Since it has a decimal point, the first non-zero digit (scanning from the left) is the ‘3’ that starts the number. Counting it and all the digits to its right, including all zeros, we get SD = 18 significant digits. If the number was just the integer 314000 with no decimal fraction then, according to the second part of Algorithm 3.1, we would get SD = 3, i.e., the ‘4’ is in position 3 and we ignore the three zeros between it and the decimal point.

Applying Algorithm 3.1 to the numbers in Table 1, we get the corresponding values in the SD column. Since that’s a pain to do manually, let’s see what R produces using

`signif()`using the following R code:

sigdigs<-function(n) {

d<-0

while(signif(n,digits=d) != n) {

d<-d+1

next

}

return(d)

}

The results are shown in the column labeled Rsn of Table 2.

Idx Number Pdd SD Rsn

1 50 0 1 0

2 0.00341 5 3 3

3 1.0040 4 5 4

4 50.0005 4 6 6

5 6.751 3 4 4

6 0.157 3 3 3

7 28.0 1 3 2

8 40300 0 3 3

9 0.070 3 2 0

10 30.07 2 4 4

11 65000 0 2 2

12 0.0067 4 2 2

13 602299999999999975882752 0 4 4

**Figure 1**

With this as background, let’s examine the number in row 13 of Table 1, 6.023 × 10

^{23}, which is Avogadro’s number (known in Germany as Loschmidt constant). It’s roughly the number of water molecules in an ice cube. But that number is only a crude approximation, which we learn in chemistry as a simple mnemonic (the two 23s). The current best measured value is written more accurately as:

6.02214179(30) × 10 ^{23}

It’s an inconceivably large number, although by no means the largest number known in physics or mathematics. To get some idea of its size, it’s been said that Intel manufactured on the order of 10^{18} transistors last year. Even at that impressive rate, however, it would take another million years to reach Avogadro’s number of transistors! As written above, you will immediately notice the strange use of parentheses. You never see digits enclosed in parens (in this way) in mathematical numbers, e.g., 3.1415926 representing π as a decimal fraction. That’s because those parens indicate measurement uncertainty or lack of resolution in the 9th and 10th decimal places. It’s analogous to trying to resolve more and more detail in a microscope image. Even at the highest magnification, the smallest details will remain unclear. No such thing occurs in the decimal representation of π. If you want to know the 9th and 10th decimal digits, you simply calculate them; you don’t measure them. The number in row 13 of Table 1 is encoded in scientific notation, which is shorthand for:

6.02214179(30) × 1.00 000 000 000 000 000 000 000

What happens to the significant digits when Avogadro’s number is expressed as an integer? Despite its resemblance to a decimal fraction, it must be an integer since it counts things like molecules. Writing it out in full and ignoring the imprecise digits (for simplicity), it looks like this:

602 214 179 000 000 000 000 000

According to Algorithm 3.1, all the zeros beyond the ‘9’ do not count as significant digits. As shown in Figure 1, they are merely scale zeros, so all is well and SD = 9 in this approximation. But wait! Look at row 13 in Table 2. There, I forced R to express Avogadro’s number as an integer and it produced:

602 299 999 999 999 975 882 752

which is not the same integer we’ve just been discussing. Where did all those non-zero digits come from? And according to Algorithm 3.1, if they’re not zeros they must be significant, in which case SD = 24. Once again, the problem resides in the distinction between numerical precision and measurement precision. The numbers in Row 13 of Tables 1 and 2 are correct to 4 significant digits. In Table 2, however, R didn’t know what to put in the remaining 20 places of Avogadro’s integer (it can’t compute it, like π), so it essentially inserts random junk that rounds up correctly to

602 300 000 000 000 000 000 000

in the chemistry-mnemonic approximation, i.e., SD = 4. Obviously, you can’t get extra precision out of the vacuum, even though R makes it seem like you can.

This is a good reminder that you have to remain vigilant when using sophisticated tools like R.

It’s not that R is wrong, it’s that you might draw the wrong conclusion from R’s output.

We see now that it’s the numerical truncation of trailing zeros that accounts for all the discrepancies between the SD and Rsn in Table 2. To avoid having your measurement zeros truncated or rounded away, those numbers need to be represented as literal strings of characters. In R, this can be accomplished using

`format(number, nsmall)`with

`nsmall`set to the number of digits to the right of the decimal point that you want preserved. That’s the role of the Pdd (post-decimal digits) in Tables 1 and 2. Here’s the R function I wrote to generate Table 2:

sigdigss<-function(n) {

i <- 0

# Check for decimal point is present

if(length(grep("\.", numstr[n])) > 0) { # real number

# Separate integer and fractional parts

intfrac <- unlist(strsplit(numstr[n], "\."))

digstring <- paste(intfrac[1], intfrac[2], sep = "")

numfigs <- nchar(digstring)

while(i < numfigs) {

# Find index of 1st non-zero digit from LEFT

if(substr(digstring,i+1,i+1) == "0") {

i <- i + 1

next

} else {

sigfigs <- numfigs - i

break

}

}

} else { # must be an integer

digstring <- numstr[n]

numfigs <- nchar(digstring)

while(i < numfigs) {

# Find index of 1st non-zero digit from RIGHT

if(substr(digstring, numfigs-i, numfigs-i) == "0") {

i <- i + 1

next

} else {

sigfigs <- numfigs - i

break

}

}

}

return(sigfigs)

}

You can also compare this R code with the corresponding Mathematica and Perl codes.

**leave a comment**for the author, please follow the link and comment on his blog:

**Taking the Pith Out of Performance**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...