[This article was first published on The Devil is in the Data, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Euler Problem 14 looks at the Collatz Conjecture. These playful sequences, named after German mathematician Lothar Collatz (1910–1990), cause mathematicians a lot of headaches. This video introduces the problem much better than I can describe it.

## Euler Problem 14 Definition

The following iterative sequence is defined for the set of positive integers: $n \rightarrow n/2$ ( $n$ is even) $n \rightarrow 3n + 1$ ( $n$ is odd)

Using the rule above and starting with 13, we generate the following sequence: $13 \rightarrow 40 \rightarrow 20 \rightarrow 10 \rightarrow 5 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1$

This sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain? Note: Once the chain starts the terms are allowed to go above one million.

## Solution

This problem is highly computationally intensive and it highlights R’s lack of speed. Generating one million Collatz sequences and finding the longest one requires a lot more than a minute of processing time allowed for in Project Euler.

collatz.chain <- function(n) {
chain <- vector()
i <- 1
while (n! = 1) {
if (n%%2 == 0)
n <- n / 2
else
n <- 3 * n + 1
chain[i] <- n
i <- i + 1
}
return(chain)
}
collatz.max <- 0
for (n in 1:1E6) {
collatz.length <- length(collatz.chain(n))
if (collatz.length > collatz.max) {
collatz.max <- collatz.length
}
}


The second version of the code contains some optimisations. The code stores the length of all sequences in an array. When the code generates a sequence and lands on a number already analysed, then it adds that previous number to the current one and moves on. This approach requires more memory but saves a lot of computation time. A minor tweak to the code optimises the rule for uneven numbers. Tripling an uneven number and adding one will always result in an even number so we can skip one step. This solution is more than twice as fast as the first version.

collatz.length <- vector(length=1e6)
collatz.length <- 0
for (n in 2:1e6) {
x <- n
count <- 0 while (x != 1 & x >= n) {
if (x %% 2 == 0) {
x <- x / 2
count <- count + 1
}
else {
x <- (3 * x + 1) / 2
count <- count + 2
}
}
count <- count + collatz.length[x]
collatz.length[n] <- count
}


## Visualising Collatz Sequences

The Collatz sequence is an example of a simple mathematical rule that can create an unpredictable pattern. The number of steps required to reach 1 is listed in A006577 of the Online Encyclopedia of Integer Sequences.

The image below visualises the number of steps for the first 1000 positive numbers. The scatterplot shows some interesting patterns. Does this visualisation show that the Collatz Sequence does have a pattern after all? ## Collatz Chains

The Collatz sequences can also be visualised using networks. Each step between two numbers is an edge and the numbers are the vertices. For example, the network for the Collatz sequence for number 10 is 5–16, 16–8, 8–4, 4–2, 2–1. When generating subsequent sequences the network will start to overlap and a tree of sequences appears. The tree below combines the Collatz sequences for the numbers 2 to 26. Number 27 has a very long sequence, making the tree much harder to read.

edgelist <- data.frame(a = 2, b = 1)
for (n in 3:26) {
chain <- as.character(c(n, collatz.chain(n)))
chain <- data.frame(a = chain[-length(chain)], b = chain[-1])
edgelist <- rbind(edgelist, chain)
}
library(igraph)
g <- graph.edgelist(as.matrix(edgelist))
g <- simplify(g)
par(mar=rep(0,4))
V(g)$color <- degree(g, mode = "out") + 1 plot(g, layout=layout.kamada.kawai, vertex.color=V(g)$color,
vertex.size=6,
vertex.label.cex=.7,
vertex.label.color="black",
edge.arrow.size=.1,
edge.color="black"
)


The post Euler Problem 14: Longest Collatz Sequence appeared first on The Devil is in the Data.