There’s a charming little brain teaser that’s going around the Interwebs. It’s got various forms, but they all look something like this:
This problem can be solved by pre-school children in 5-10 minutes, by programer – in 1 hour, by people with higher education … well, check it yourself!
8809=6
7111=0
2172=0
6666=4
1111=0
3213=0
7662=2
9313=1
0000=4
2222=0
3333=0
5555=0
8193=3
8096=5
7777=0
9999=4
7756=1
6855=3
9881=5
5531=0
2581=?
SPOILER ALERT…
The answer has to do with how many circles are in each number. So the number 8 has two circles in its shape so it counts as two. And 0 is one big circle, so it counts as 1. So 2581=2. Ok, that’s cute, it’s an alternative mapping of values with implied addition.
What bugged me was how might I solve this if the mapping of values was not based on shape. So how could I program a computer to solve this puzzle? I gave it a little thought and since I like to pretend I’m an econometrician, this looked a LOT like a series of equations that could be solved with an OLS regression. So how can I refactor the problem and data into a trivial OLS? I really need to convert each row of the training data into a frequency of occurrence chart. So instead of 8809=6 I need to refactor that into something like:
1,0,0,0,0,0,0,0,2,1 = 6
In this format the independent variables are the digits 0-9 and their value is the number of times they occur in each row of the training data. I couldn’t figure out how to do the freq table so, as is my custom, I created a concise simplification of the problem and put it on StackOverflow.com which yielded a great solution. Once I had the frequency table built, it was simple a matter of a linear regression with 10 independent variables and a dependent with no intercept term.
My whole script, which you should be able to cut and paste into R, if you are so inclined, is the following:
## read in the training data ## more lines than it should be because of the https requirement in Github temporaryFile <- tempfile() download.file("https://raw.github.com/gist/2061284/44a4dc9b304249e7ab3add86bc245b6be64d2cdd/problem.csv",destfile=temporaryFile, method="curl") series <- read.csv(temporaryFile) ## munge the data to create a frequency table freqTable <- as.data.frame( t(apply(series[,1:4], 1, function(X) table(c(X, 0:9))-1)) ) names(freqTable) <- c("zero","one","two","three","four","five","six","seven","eight","nine") freqTable$dep <- series[,5] ## now a simple OLS regression with no intercept myModel <- lm(dep ~ 0 + zero + one + two + three + four + five + six + seven + eight + nine, data=freqTable) round(myModel$coefficients)
Created by Pretty R at inside-R.org
The final result looks like this:
> round(myModel$coefficients)zero one two three four five six seven eight nine1 0 0 0 NA 0 1 0 2 1
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).