# Using Occam’s Razor to solve genius math puzzles

March 4, 2016

(This article was first published on R – Cartesian Faith, and kindly contributed to R-bloggers)

Math puzzles always tickle the brain, and this one has tickled quite a few on LinkedIn. Why are these puzzles so popular, and what’s the right answer? I sampled 610 responses to find out.

The range of answers among those 610 responses was surprisingly large, although there were two clear candidates, 98 and 99, followed by a distant third, 101. The full table of counts looks like this:

```
x
2  27  40  71  81  82  88  97  98  99 100 101 107 108 113 119 263
1   1   1   1   1   1   1   4 268 293   1  17   1   4   1  10   4
```

According to the data, 48.0% of respondents said the answer is 99, while 43.9% said it is 98. If we trust the wisdom of the crowds, 99 is the correct answer, yes? Not so fast. Puzzles like this are popular precisely because multiple plausible answers seem to exist. If there were a clear majority for a particular answer, the puzzle wouldn’t stoke the embers of our emotions. (Other puzzle constructions exist that use different strategies to draw people in.)
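As a quick check on those shares, the counts in the table can be turned into percentages. This is only a sketch: the vectors `answers` and `counts` are retyped by hand from the table above rather than read from the original response file, and the variable names are my own.

```r
# Answers and their counts, retyped from the table above (not read from the raw file)
answers <- c(2, 27, 40, 71, 81, 82, 88, 97, 98, 99, 100, 101, 107, 108, 113, 119, 263)
counts  <- c(1, 1, 1, 1, 1, 1, 1, 4, 268, 293, 1, 17, 1, 4, 1, 10, 4)
names(counts) <- answers

# Total number of sampled responses
total <- sum(counts)

# Share of each answer, in percent, rounded to one decimal place
share <- round(100 * counts / total, 1)

# The three most common answers and their shares
head(sort(share, decreasing = TRUE), 3)
```

Sorting the shares makes it easy to see how close the top two answers are and how far behind every other answer falls.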

Why is 99 incorrect? People used a few different approaches to arrive at this answer. Let’s define a matrix $A$ that represents the table, with columns $x$, $y$, $z$. One common solution is to multiply the elements in each row, $A_{i,x} * A_{i,y}$, and notice that the answer is that product plus the $x$ value in the previous row, $A_{i-1,x}$. For rows 2 through 5 this yields $[ 23, 47, 79, 99 ]$, so the answer is 99. What’s wrong with this approach? There are two main problems that all quantitative people should be wary of. First is the dependency across rows, which relies on poor assumptions. Notice that for this approach to work, the first row needs an initial value of 1. This fits the sequence of the $x$ column, so why not? But what about the other columns in this initial row? According to the pattern, $y_0 = 0$, which presents problems for $z_0$. Second, when we get to the last row, the entries don’t follow the sequence, which invalidates the model. And yet, more people chose this answer than any other! I suspect that it is due to Daniel Kahneman’s System 1 rearing its ugly head.
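To make the two objections concrete, here is a minimal sketch of the row-dependent rule, computed the same way as in the R code at the end of the post (product of $x$ and $y$ plus the previous row’s $x$). The seed `x0 <- 1` is exactly the assumption discussed above, and the names `x0`, `prev_x` and `z_hat` are mine.

```r
# The puzzle table, as defined in the R code at the end of the post
a <- matrix(c(3,2,7, 5,4,23, 7,6,47, 9,8,79, 10,9,NA), ncol=3, byrow=TRUE,
            dimnames=list(NULL, c('x','y','z')))

# Assumed seed: the x value of a hypothetical row before the first one.
# Without it, the rule cannot reproduce z in row 1.
x0 <- 1

# x value of the previous row, for every row
prev_x <- c(x0, head(a[, 'x'], -1))

# Row-dependent rule: product of x and y plus the previous row's x
z_hat <- a[, 'x'] * a[, 'y'] + prev_x
z_hat

# Step size between consecutive rows; the last step differs from the others
diff(a[, 'x'])
diff(a[, 'y'])
```

The `diff` calls show that the final row breaks the step pattern the seed value was chosen to match, which is why carrying the rule over to that row is questionable.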

What about 98 then? Is this the correct answer? I saw two main approaches taken to arrive at this solution. The first defines the relationship $z = x*y + y - 1$. This results in the sequence $z = [ 7, 23, 47, 79, 98 ]$. What’s nice about this solution is that there are no row dependencies, so there is no assumption about initial values and no issue with the pattern of the sequence changing. However, there’s an even more concise solution, which is simply $z = x^2 - 2$, which also results in $[ 7, 23, 47, 79, 98 ]$. Why is this a better solution? From a statistics perspective, we might say that we can explain the response variable with one explanatory variable instead of two. This is essentially Occam’s Razor. Anecdotally, this latter approach was less popular than the former. Again, I suspect that this is a consequence of System 1 attempting to fit rules to data sub-optimally. By this I mean that since two columns of data are provided, our brains look for a solution that uses both variables. Once this is satisfied, we essentially stop looking, even if there is a better solution.
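It may help to see why the two formulas give identical sequences. The following short derivation is my own addition and rests on one observation about the table: in every row, including the last, $y = x - 1$. Substituting that into the two-variable formula gives

$$
z = x y + y - 1 = x(x - 1) + (x - 1) - 1 = x^2 - x + x - 2 = x^2 - 2.
$$

So on this data the second variable carries no extra information, which is why the one-variable model loses nothing by dropping it.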

So the lesson is that model builders and quantitative folks need to be ever vigilant of our own biases. Even if we come up with a solution that seems to fit the data, we need to think critically about the assumptions we are making and whether better, simpler models are out there.

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

### R Code

```r
x <- read.table('linkedin_math.txt')[,1]
table(x)

a <- matrix(c(3,2,7, 5,4,23, 7,6,47, 9,8,79, 10,9,NA), ncol=3, byrow=TRUE)
colnames(a) <- c('x','y','z')

# Yields 99 as final answer
sapply(2:nrow(a), function(i) a[i,1] * a[i,2] + a[i-1,1])

# Yields 98 as final answer
sapply(1:nrow(a), function(i) a[i,1] * a[i,2] + a[i,2] - 1)

# Yields 98 as final answer
sapply(1:nrow(a), function(i) a[i,1]^2 - 2)
```
