Solving easy problems the hard way

March 17, 2012
By

(This article was first published on Cerebral Mastication » R, and kindly contributed to R-bloggers)

There’s a charming little brain teaser that’s going around the Interwebs. It’s got various forms, but they all look something like this:

This problem can be solved by pre-school children in 5-10 minutes, by programer – in 1 hour, by people with higher education … well, check it yourself! :)

8809=6
7111=0
2172=0
6666=4
1111=0
3213=0
7662=2
9313=1
0000=4
2222=0
3333=0
5555=0
8193=3
8096=5
7777=0
9999=4
7756=1
6855=3
9881=5
5531=0
2581=?

SPOILER ALERT…

The answer has to do with how many circles are in each number. So the number 8 has two circles in its shape so it counts as two. And 0 is one big circle, so it counts as 1. So 2581=2. Ok, that’s cute, it’s an alternative mapping of values with implied addition.

What bugged me was how might I solve this if the mapping of values was not based on shape. So how could I program a computer to solve this puzzle? I gave it a little thought and since I like to pretend I’m an econometrician, this looked a LOT like a series of equations that could be solved with an OLS regression. So how can I refactor the problem and data into a trivial OLS? I really need to convert each row of the training data into a frequency of occurrence chart. So instead of 8809=6 I need to refactor that into something like:

1,0,0,0,0,0,0,0,2,1 = 6

In this format the independent variables are the digits 0-9 and their value is the number of times they occur in each row of the training data. I couldn’t figure out how to do the freq table so, as is my custom, I created a concise simplification of the problem and put it on StackOverflow.com which  yielded a great solution. Once I had the frequency table built, it was simple a matter of a linear regression with 10 independent variables and a dependent with no intercept term.

My whole script, which you should be able to cut and paste into R, if you are so inclined, is the following:

## read in the training data
## more lines than it should be because of the https requirement in Github
temporaryFile <- tempfile()
download.file("https://raw.github.com/gist/2061284/44a4dc9b304249e7ab3add86bc245b6be64d2cdd/problem.csv",destfile=temporaryFile, method="curl")
series <- read.csv(temporaryFile)

## munge the data to create a frequency table
freqTable <- as.data.frame( t(apply(series[,1:4], 1, function(X) table(c(X, 0:9))-1)) )
names(freqTable) <- c("zero","one","two","three","four","five","six","seven","eight","nine")
freqTable$dep <- series[,5]

## now a simple OLS regression with no intercept
myModel <- lm(dep ~ 0 + zero + one + two + three + four + five + six + seven + eight + nine, data=freqTable)
round(myModel$coefficients)

Created by Pretty R at inside-R.org

The final result looks like this:

> round(myModel$coefficients)
zero   one   two three  four  five   six seven eight  nine
   1     0     0     0    NA     0     1     0     2     1
So we can see that zero, six, and nine all get mapped to 1 and eight gets mapped to 2. Everything else is zero. And four is NA because there were no fours in the training data.
There. I’m as smart as a preschooler. And I have code to prove it.

To leave a comment for the author, please follow the link and comment on his blog: Cerebral Mastication » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.