(This article was first published on **Xi'an's Og » R**, and kindly contributed to R-bloggers)

Robin Ryder pointed out to me that 3 is indeed the absolute minimum number of distinct digits one could observe, because of the block constraint (*bon sang, mais c’est bien sûr !*). Since the series of 3 digits are independent across blocks, the theoretical distribution under uniformity can easily be simulated:

#uniform distribution on the block diagonal

sheik=rep(0,9)

for (t in 1:10^6){

group=length(unique(c(sample(1:9,3),sample(1:9,3),sample(1:9,3))))

sheik[group]=sheik[group]+1

}

and it produces a result that is close enough to the one observed with the random sudoku generator. Actually, the exact distribution is available as *(corrected on May 19!)*

pdiag=c(1, #k=3

(3*6+3*6*4), #k=4

(3*choose(6,2)+3*6*5*choose(4,2)+3*choose(5,3)*choose(6,2)), #k=5

(choose(6,3)+3*6*choose(5,2)*4+3*choose(6,2)*choose(5,2)*4+

choose(6,3)*choose(6,3)),#k=6

(choose(3,2)*6*choose(5,3)+3*choose(6,2)*choose(4,2)*5+

choose(6,3)*choose(6,2)*3), #k=7

(3*choose(6,2)*4+choose(6,3)*6*choose(3,2)), #k=8

choose(6,3))/choose(9,3)^2 #k=9

hence a better qq-plot:
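As a quick sanity check on the exact distribution above, here is a minimal sketch (not part of the original post) that verifies the counts add up to choose(9,3)^2 = 7056 and compares the resulting probabilities with simulated frequencies. The names `counts`, `nsim`, `draws` and `freq`, the seed, and the number of simulations are my own arbitrary choices for illustration.

```r
# Exact counts for k = 3, ..., 9 distinct digits, conditioning on the
# triplet of the first block (hence the choose(9,3)^2 normalisation)
counts <- c(
  1,                                                        # k = 3
  3*6 + 3*6*4,                                              # k = 4
  3*choose(6,2) + 3*6*5*choose(4,2) +
    3*choose(5,3)*choose(6,2),                              # k = 5
  choose(6,3) + 3*6*choose(5,2)*4 +
    3*choose(6,2)*choose(5,2)*4 + choose(6,3)*choose(6,3),  # k = 6
  choose(3,2)*6*choose(5,3) + 3*choose(6,2)*choose(4,2)*5 +
    choose(6,3)*choose(6,2)*3,                              # k = 7
  3*choose(6,2)*4 + choose(6,3)*6*choose(3,2),              # k = 8
  choose(6,3)                                               # k = 9
)
pdiag <- counts / choose(9, 3)^2
names(pdiag) <- 3:9

# The probabilities must sum to one
stopifnot(isTRUE(all.equal(sum(pdiag), 1)))

# Monte Carlo frequencies under the same uniform model
set.seed(42)   # arbitrary seed, only for reproducibility
nsim <- 1e5    # assumed simulation size, smaller than the 10^6 used above
draws <- replicate(nsim,
  length(unique(c(sample(9, 3), sample(9, 3), sample(9, 3)))))
freq <- tabulate(draws, nbins = 9)[3:9] / nsim

# Side-by-side comparison of exact and simulated probabilities
print(round(rbind(exact = pdiag, simulated = freq), 4))
```

The most likely outcome is k=6, with exact probability 2940/7056 = 5/12. Using `tabulate` with `nbins = 9` keeps the rarely visited values (k=3 in particular) in the table even when they never show up in the simulation.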

