Brute force approach to solving a problem is usually the way of running through all possible combinations and repeating a certain calculation. Until a more elegant solution is devised – for example finding a numerical formula – brute forcing is the only tool at hand. Provided that the nature of the calculation is not unfeasibly inefficient and the universe of combinations / possibilities to search through is not unfeasibly large, brute forcing may be sufficient. However, in many instances, the brute force algorithm can be enhanced so that it bears more “elegance” and “intelligence” to some extent.
The methods to enhance a brute force algorithm may be to restrict the set of values to be searched, to rewrite the code in order to choose more efficient ways that run with fewer instructions at the hardware level or to incorporate multicore or GPU parallelism.
In this post, I will illustrate three “brute force” methods to solve a problem, enhancing in each successive method so that we experience a performance gain of more than 100x times in the third method as compared to the first one. A knowledge of the low level mechanics of the R interpreter is important in selecting the optimal method. The problem I choose is the 124th problem in Project Euler. Since Project Euler requests participants not to disclose the solutions publicly, I will just share portions of the algorithm in order to emphasize the efficiency gains from choosing the right tools and methods:
The radical of n, rad(n), is the product of the distinct prime factors of n. For example, 504 = 23 × 32 × 7, so rad(504) = 2 × 3 × 7 = 42.
If we calculate rad(n) for 1 ≤ n ≤ 10, then sort them on rad(n), and sorting on n if the radical values are equal, we get:
1 1 1 1 1 2 2 2 2 2 3 3 4 2 3 4 2 8 2 4 5 5 3 3 5 6 6 9 3 6 7 7 5 5 7 8 2 6 6 8 9 3 7 7 9 10 10 10 10 10
Let E(k) be the kth element in the sorted n column; for example, E(4) = 8 and E(6) = 9.
If rad(n) is sorted for 1 ≤ n ≤ 100000, find E(10000).
So the main point is to calculate the product of the distinct or unique prime factors of each number up to 1e5. Prime sieving is the trivial part, since it is a common task in many Project Euler problems. Although I have my own sieve algorithms, I personally prefer to use the “Primes” function in “numbers” package. Another method is making a system call to open-source “primesieve” tool. The last part (ordering and getting the number at certain rank) is also trivial.
The common method in all three versions is to check the divisibility of all numbers <= 1e5 with sieved primes. In each version, we track a vector of boolean values with a length of 1e5 to record divisibility and a vector of 1’s with a length of 1e5 to track multiplied distinct factors (in order to yield radicals).
In the naivest first version, we check the divisibility of each number <= 1e5 with a certain prime by the modulo operator “%%”. Note that the numbers vector will be probed against all primes below 1e5:
# whether numbers are divisible by the prime factor, by modulo bool.new <- nums %% prime == 0 # update the radicals vector with the prime factor radicals[bool.new] <<- radicals[bool.new] * prime
In this code snippet, “nums” is the global vector holding the sequence of numbers 1 to 1e5. Although this option is the “brutest” of the three, it still enjoys the efficiency benefit of indexing a vector with another boolean vector of the same length – which takes linear time – instead of searching for the numbers. As long as the vector is not too large and the numbers to be indexed are not too sparse, we may not need to have an efficient hash function without collision.
The performance of the first version that runs the above snippet for radical updating is 12.8 seconds:
> microbenchmark(ordered.radicals.3(), times = 10) Unit: seconds expr min lq mean median uq max ordered.radicals.3() 12.61865 12.64935 12.80029 12.75633 12.85513 13.39739 neval 10
However, we do not need to probe the divisibility of the numbers in the sequence with the prime using the modulo operator “%%”. We know that the numbers divisible by a prime are the multiples of that prime! Using the vector indexing approach, we can get a sequence of the multiples of the prime upto 1e5, and toggle those indices in a clean boolean vector of FALSE values to TRUE, yielding the same result in a much faster manner:
# empty F vector bool.new <- bool.new1 # the indices that are divisible by the prime, as a sequence, without modulo divisible.inds <- (1:floor(lim / prime)) * prime # whether numbers are divisible by the prime factor bool.new[divisible.inds] <- T # update the radicals vector with the prime factor radicals[bool.new] <<- radicals[bool.new] * prime
In the above snippet, first of all an empty boolean vector of 1e5 FALSE values (bool.new) is copied from a template vector (bool.new1) in line 2. In the most critical part, a sequence of the multiples of the prime below 1e5 is formed (line 5). And then the boolean vector of FALSE values is indexed with the multiples and those values are toggled to TRUE (line 8). The last line for updating the radicals is the same line in the first option (line 11).
The performance of this second option is 3 seconds, recording more than 4x gain on top of the first option:
> microbenchmark(ordered.radicals.2(), times = 10) Unit: seconds expr min lq mean median uq max ordered.radicals.2() 2.933848 2.982204 3.022639 3.016654 3.066471 3.118391 neval 10
However, this performance is not good enough. There are 9592 primes below 1e5. And both codes check for all those primes. To enhance the performance we should first review sieving algorithms and reflect on the problem at hand:
A number which is not prime is a composite number: It has at least two prime factors. A composite number must have at least one prime factor less than or equal to its square root. We can arrive at this conclusion by induction: If the smallest prime factor is larger than the square root, the other factor(s) must be larger further. However, this is a contradiction since the product of those factors would be larger than the number itself then. In prime sieving up to some limit, the prime factors <= the square root of the limit are probed. Following this logic, we can deduce that a composite number can have at most one distinct prime factor larger than its square root. Using the same reasoning, we can probe the primes <= sqrt(1e5). If we divide all numbers <= 1e5 with all powers of the primes <= sqrt(1e5) less than 1e5 and are divisible by the numbers, we will have the last remaining prime factors >= sqrt(1e5) for all numbers (and for the case of prime numbers, we will have the primes themselves). Square root of 1e5 is 316 and there are only 65 primes less than 316. So instead of 9592 primes, we will probe for only 65 primes. The maximum power of a prime that is below 1e5 is 16 for the case of prime number 2. So the gains from probing 147 times fewer primes justify the extra work for probing powers of those prime factors.
# empty F vector bool.new <- bool.new1 # empty vector of 1's radicals.new <<- radicals.new1 # maximum power of the prime to be checked max.power <- floor(log(lim, prime)) # the numbers whic are divisible by the prime divisible.inds <- (1:floor(lim / prime)) * prime # whether numbers are divisible by the prime factor bool.new[divisible.inds] <- T # update the radicals vector with the prime factor radicals[bool.new] <<- radicals[bool.new] * prime # find the maximum power of the prime that all numbers are divisible with sapply(1:max.power, update.radicals, prime, lim) # divide all numbers with the maximum power of the prime if it is divisible nums <<- nums / radicals.new
In this last version, we have an additional data structure: an interim vector of 1’s (radicals.new) to collect the max powers of a prime that numbers are divisible with (line 5). We get the maximum power of the the prime that is below 1e5 using logarithm (line 8). After updating the radicals as before, we repeat the procedure for each power of the prime to populate the interim vector of prime powers (radicals.new) using a ply formula (line 20). After the vector is populated, numbers vector is divided by this prime powers vector (line 23).
After all 65 primes are probed and numbers vector is updated for all primes, the remaining numbers vector is the last prime factors to be multiplied with the vector for collecting distinct prime factors (radicals) to get the resulting radicals.
The performance is 109 milliseconds for this last version:
> microbenchmark(ordered.radicals.1(), times = 10) Unit: milliseconds expr min lq mean median uq max ordered.radicals.1() 106.1128 107.258 110.9804 109.0543 110.4271 129.5838 neval 10
We sped up the code nearly 30 times as compared with the second version and nearly 120 times as compared with the first “raw” version. Though we are still in the realm of “brute forcing”, by incorporating a little number theory and a better way to check divisibility, our force is now much more elegant and intelligent and manipulates the low level mechanics of the higher level R language better.
Note: The code snippets given are just critical excerpts from the original code given in order to illustrate the progression in algorithm enhancement.