# Random sudokus

May 16, 2010
By

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers)

After thinking about random sudokus for a few more weeks, I eventually came to read the paper by Newton and DeSalvo about the entropy of sudoku matrices. As written earlier, if we consider (as Newton and DeSakvo) a uniform distribution where the sudokus are drawn uniformly over the set of all sudokus, the entropy of this distribution is $log(N)$, where N is the size of this set of all sudokus. The (Shannon) entropy considered in the paper is completely different: it is

$H = -sum_{i=1}^9 hat sigma_i log(hat sigma_i)$

where the $hat sigma_i$‘s are the normalised transforms of the singular values of the sudoku matrix. The result that the average H is about 1.733, being larger than the same average over a collection of purely random matrices, 1.651, is then much less spectacular…

Another entry in the same paper is worth pondering about: the authors consider a random selection mechanism, working one entry at a time from the first row downwards, with a backward step in case of impossible entries. An R rendering of this generator is

s=matrix(0,ncol=9,nrow=9)
pool=rep(TRUE,9)
s[1,]=sample(1:9,9)
i=2;
while (i<10){
boxa=3*trunc((i-1)/3)+1;boxa=boxa:(boxa+2)
for (j in 1:9){
boxb=3*trunc((j-1)/3)+1;boxb=boxb:(boxb+2)
for (u in (1:9))
pool[u]=(sum(u==s[i,])+sum(u==s[,j]) +sum(u==s[boxa,boxb]))==0
if (sum(pool)>1){
s[i,j]=sample((1:9)[pool],1)
}else{
if (sum(pool)==1)
s[i,j]=(1:9)[pool]
if (sum(pool)==0){
rmrk=sample(0:(i-1),1,prob=1/(1:i)^9)
s[(i-rmrk):i, ]=0
i=i-rmrk-1
break()

}}}
i=i+1
}

and it produces valid sudokus! The question is rather to test how uniform this generator is (the authors replace uniformity with unbiasedness, resulting in a simplistic necessary condition on the generator!) and the paper is only considering an answer based on the average of the s[i,j]‘s, which are naturally equidistributed over 1,2,…,9! While this answer is unsatisfactory, I have no clear idea on how to attack the problem. For instance, I ran an Rperiment looking for the sequence 1,2,…,9 to appear in either a row or a column: I think the probability of occurrence of this sequence is 1/8! (since there is always one row and one column starting with 1), i.e. 2.5 10-5, while 36000 simulations from the above showed a frequency of 5.5 10-5 both for rows and columns (i.e., 2 occurences over the 36000 replicas!). This only shows the need for a much larger experiment (and presumably a move from R to C  if I want the answer quickly!). Another test that is permutation invariant would be to check for the distribution of the number of different digits on the diagonal of the sudoku matrix, but I am not certain about the theoretical distribution. For instance, running 10,000 simulations, the average number of different digits is 6.3, with no occurrence of 1, 2 or 3, and the binomial fit is poor.

Filed under: R, Statistics Tagged: entropy, Kullback, Monte Carlo, simulation, sudoku, uniformity

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...