Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It’s Monday today! It’s work day! And I’ve already worked on computer for two hours. Time for a break, which is the 14th problem of Project Euler.

The following iterative sequence is defined for the set of positive integers: n n/2 (n is even); n 3n + 1 (n is odd)
Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain?

On the first glance of this problem, the first thought in my mind is that I would try to solve this without a brute force solution, which is very straightforward. It looks there is some underlying pattern between the starting number and the length of the chain (see the following figure and the wikipedia). According the problem, we have the function: F(n-1) = F(n)/2 if F(n) is even; 3*F(n)+1 if F(n) is odd. My intuitive idea is to get the function backwards, i.e. what is F(n+1) if I know F(n)? Then I get this: F(n+1) = (F(n)-1)/3 (when F(n) %% 2 == 0 & the resulting F(n+1) is a natural number) or 2*F(n) (no limit on F(n)). As long as all numbers would finish at 1, I should be able to generate all natural numbers using the latter function. So,

• F(1) = 1
• F(2) = 2, (or non-valid F(2) = (1-1)/3 = 0)
• F(3) = 4, (or non-valid F(3) = (2-1)/3 = 1/3)
• F(4) = 8, (or non-valid F(4) = (4-1)/3 = 1)
• F(5) = 16, (or non-valid F(5) = (8-1)/3 = 7/3)
• F(6) = 32, or valid F(6) = (16-1)/3 = 5
• F(7) = 64 (or non-valid 31/3), F(7) = 10 (or non-valid 4/3)
• F(8) = 128, F(8) = 21 and F(8) = 20, F(8) = 3 …

Although from the figure there seems some pattern, it turns out too complicated. I can’t work it out. Indeed, neither do the mathematicians as described in Wikipedia. This is still a so-called Collatz conjecture. I’d better stop now. The code is very simple although it will take a long time to perform. Two optimizations could be adopted: one is to save the intermediate results into the memory, such as a harsh table, which could save a lot of time; the other is to only calculate the bigger numbers as the bigger numbers tends to have longer chains. The latter one could lead to some sub-optimal results.

?View Code RSPLUS

 ```1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ``` ```collatz <- function(x) { count <- 0 while (x > 1) { if (x %% 2 == 0) { x <- x / 2 count <- count + 1 } else { x <- 3 * x + 1 count <- count + 1 } } return(count) } chain.len <- numeric(1e6) for (i in 1:(1e6)) { len.terms[i] <- collatz(i) } result <- which.max(chain.len) cat("The result is:", result, "\n")   # for the figure presented here steps <- numeric(1:1000) starting.number <- 1:1000 for (i in starting.number) { steps[i] <- collatz(i) } png(width = 600, height = 600) plot(starting.number, steps, pch = 16, main = "Collatz sequence with starting numbers no more than 1000") dev.off()```