**The Pith of Performance**, and kindly contributed to R-bloggers)

Here we are in Part III. Wait!? What happened to Parts I and II? Well, I started to write an article about Amdahl’s law, parallelism and prime numbers, but found myself buried three levels deep trying to resolve problems with prime numbers in R. My normal inclination is to use Mathematica for such things, but I happened to already be using R for another reason so, I thought I’d see what it had to offer for calculating with primes. It now looks like that might have been a mistake.

**Putting primes in R**

The first problem is that prime functions are not installed with R base packages by default (because they’re not often used in statistical calculations, I suppose) so, you have to find and install a package that has them. Googling around, I landed on primeFactors in the **R.basic** package, but that failed to install. Grrr!

**Added Note:** On Mon, Jun 14, 2010, David Smith (VP of Marketing, Revolution Analytics) pointed me at the GMP package on CRAN. Although it does have an isprime() tester, there is no prime number generator. Also, since `isprime` returns a 2, 1 or 0 depending on whether the number is, might be, or is not a prime, it’s probably easier to use it with an `ifelse` function:

> isprime(29)

[1] 2

> ifelse(isprime(29)==2,"yes","no")

[1] "yes"

Undaunted, I finally found the package called **schoolmath**, mentioned over at R-bloggers, and it appeared to have just the functions I needed. For example, it has a convenient function called *primes* that lists all the prime numbers within a specified range.

Good. Let’s try listing the primes between 2 and 10:

> library(schoolmath)

> primes(2,10)

[1] 1 2 3 5 7 11 13 17

Huh!? That’s not right. It turns out that this result occurs because the underlying R code says:

if (start < 18) { primzahlen <- c(1, 2, 3, 5, 7, 11, 13, 17)

which is bloody annoying! The other glaring problem is the 1. Who ordered that? (see Muon) The number 1 was once considered prime, but not these days. The usage note states: `primes(start = 12, end = 9999)`. Sigh!

**Counting primes**

Ignoring that strangeness (I can always decrement by 1), it appears to do better if you ask for a bigger range. Let’s look at primes less than 500:

> rsm500 <- primes(2,500)

> rsm500

[1] 1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71

[22] 73 79 83 89 97 101 103 107 109 113 127 131 133 137 139 149 151 157 163 167 173

[43] 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 259 263 269 271 277 281

[64] 283 293 301 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409

[85] 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499

>length(rsm500)

[1] 99

Hang on, that’s wrong too! There are only 95 primes less than 500. You can check against this correct list on the web, and here they are from Mathematica:

> mma500

[1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73

[22] 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181

[43] 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307

[64] 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433

[85] 439 443 449 457 461 463 467 479 487 491 499

> length(mma500)

[1] 95

Now, I have to find out where the inconsistencies reside.

**Debugging the count**

To find the wrong additional entries, I wrote an R script and here’s the output. The first column is just the row number, which starts at zero (because of the incorrect inclusion of 1 in the R package) so that successive entries line up. The second column (MMA) contains the Mathematica primes and the third column (Rpkg) is the primes produced by the **schoolmath** package.

Comparison with Mathematica list of primes

MMA Rpkg

0 NaN 1 *

1 2 2

2 3 3

3 5 5

4 7 7

5 11 11

6 13 13

7 17 17

8 19 19

9 23 23

10 29 29

11 31 31

12 37 37

13 41 41

14 43 43

15 47 47

16 53 53

17 59 59

18 61 61

19 67 67

20 71 71

21 73 73

22 79 79

23 83 83

24 89 89

25 97 97

26 101 101

27 103 103

28 107 107

29 109 109

30 113 113

31 127 127

32 131 131

33 137 133 *

34 139 137 *

35 149 139 *

36 151 149 *

37 157 151 *

38 163 157 *

39 167 163 *

40 173 167 *

41 179 173 *

42 181 179 *

43 191 181 *

44 193 191 *

45 197 193 *

46 199 197 *

47 211 199 *

48 223 211 *

49 227 223 *

50 229 227 *

51 233 229 *

52 239 233 *

53 241 239 *

54 251 241 *

55 257 251 *

56 263 257 *

57 269 259 *

58 271 263 *

59 277 269 *

60 281 271 *

61 283 277 *

62 293 281 *

63 307 283 *

64 311 293 *

65 313 301 *

66 317 307 *

67 331 311 *

68 337 313 *

69 347 317 *

70 349 331 *

71 353 337 *

72 359 347 *

73 367 349 *

74 373 353 *

75 379 359 *

76 383 367 *

77 389 373 *

78 397 379 *

79 401 383 *

80 409 389 *

81 419 397 *

82 421 401 *

83 431 409 *

84 433 419 *

85 439 421 *

86 443 431 *

87 449 433 *

88 457 439 *

89 461 443 *

90 463 449 *

91 467 457 *

92 479 461 *

93 487 463 *

94 491 467 *

95 499 479 *

96 NaN 487 *

97 NaN 491 *

98 NaN 499 *

Bad R pkg values: 1 133 259 301

The end of this output (scroll down) shows 4 incorrectly included composite numbers (integers that are not prime).

In row 0, Mathematica (correctly) does not produce 1 as a prime, so its value is shown as NaN and an asterisk is appended to that row to indicate a discrepancy. Rows 1 through 32 are consistent with each other. The trouble starts at line 33 where R incorrectly inserts 133 as a prime in its list. We can easily check that 133 is not prime using the **schoolmath** function `is.prim()`:

> is.prim(133)

[1] FALSE

With 4 composites wrongly listed as primes you get a count of 99 instead of the correct 95.

This is rather troubling. My experience with R is that it has surprisingly few bugs. This **schoolmath** package is the worst case of bugs I’ve seen on CRAN. Perhaps I should have been warned by the name of the package, but it was recommended by a mathematician. **Schoolmath** needs to go back to school. I hope the authors will enroll it and soon.

**leave a comment**for the author, please follow the link and comment on his blog:

**The Pith of Performance**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...