Robin Ryder pointed out to me that 3 is indeed the absolute minimum one could observe, because of the block constraint (*good grief, of course!*): each of the three diagonal blocks contains every digit exactly once, so the three diagonal entries within a block are necessarily distinct. Since these triplets of digits are independent across blocks, the theoretical distribution under uniformity can easily be simulated:

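```r
#uniform distribution on the block diagonal:
#each block contributes 3 distinct digits, blocks being independent
sheik=rep(0,9)   #sheik[k] counts draws with k distinct digits
for (t in 1:10^6){
  group=length(unique(c(sample(1:9,3),sample(1:9,3),sample(1:9,3))))
  sheik[group]=sheik[group]+1
}
```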

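and it produces a result close enough to the one observed with the random sudoku generator. In fact, the exact distribution is available in closed form *(corrected on May 19!)*; by symmetry the first triplet can be fixed, so each probability is a count of pairs of triplets divided by choose(9,3)^2: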

```r
#exact probabilities of k distinct digits, k=3,...,9:
#numerators count pairs of triplets given the first one
pdiag=c(1, #k=3
  (3*6+3*6*4), #k=4
  (3*choose(6,2)+3*6*5*choose(4,2)+3*choose(5,3)*choose(6,2)), #k=5
  (choose(6,3)+3*6*choose(5,2)*4+3*choose(6,2)*choose(5,2)*4+
   choose(6,3)*choose(6,3)), #k=6
  (choose(3,2)*6*choose(5,3)+3*choose(6,2)*choose(4,2)*5+
   choose(6,3)*choose(6,2)*3), #k=7
  (3*choose(6,2)*4+choose(6,3)*6*choose(3,2)), #k=8
  choose(6,3))/choose(9,3)^2 #k=9
```
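Since case-by-case counts like these are easy to get wrong, here is a minimal sketch of two independent cross-checks, under the same assumption as the simulation (three independent uniform triplets, the first one fixed to {1,2,3} by symmetry). The first is a general inclusion-exclusion formula: k=3+m distinct digits means picking m new digits among the remaining 6, then counting the pairs of triplets drawn within those 3+m digits that cover all m new ones. The second is a brute-force enumeration of all 7056 pairs via `combn`. The variable names are illustrative, not taken from the original post.

```r
#cross-checks of the exact counts, assuming three independent uniform
#triplets with the first one fixed to 1:3 by symmetry
#(incl_excl, trips and brute are illustrative names)
#(1) inclusion-exclusion: k=3+m distinct digits requires choosing m new
#digits among 6, then pairs of triplets drawn within these 3+m digits
#that cover all m new ones
incl_excl=sapply(0:6,function(m)
  choose(6,m)*sum(sapply(0:m,function(j)
    (-1)^j*choose(m,j)*choose(3+m-j,3)^2)))
#(2) brute-force enumeration of all choose(9,3)^2=7056 pairs
trips=combn(9,3)   #3 x 84 matrix, one triplet per column
brute=rep(0,9)     #brute[k] counts pairs giving k distinct digits
for (i in 1:ncol(trips)) for (j in 1:ncol(trips)){
  k=length(unique(c(1:3,trips[,i],trips[,j])))
  brute[k]=brute[k]+1
}
rbind(incl_excl,brute=brute[3:9])
#both rows: 1 90 1035 2940 2430 540 20
```

Both rows coincide with the numerators of `pdiag`, which add up to choose(9,3)^2=7056.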

hence a better qq-plot.
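One way to compare Monte Carlo output with the exact probabilities numerically is sketched below: it tabulates simulated counts, prints them next to the exact values and runs a chi-square goodness-of-fit test with `chisq.test`. This is an assumed complement to the plot, not the author's code; the seed, the sample size N and the variable names are arbitrary illustrative choices.

```r
#Monte Carlo frequencies versus exact probabilities, k=3,...,9
#(N, draws, counts, freq and exact are illustrative names; N is
#smaller than the 10^6 of the simulation above to keep this quick)
set.seed(1)
N=10^5
draws=replicate(N,length(unique(c(sample(1:9,3),sample(1:9,3),sample(1:9,3)))))
counts=tabulate(draws,nbins=9)[3:9]
freq=counts/N
exact=c(1,90,1035,2940,2430,540,20)/choose(9,3)^2  #same values as pdiag
round(rbind(simulated=freq,exact=exact),4)
#chi-square goodness-of-fit test of the simulated counts to the exact law
chisq.test(counts,p=exact)
```

With N=10^5 the smallest expected count (k=3) is about 14, so the chi-square approximation is reasonable here.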

Filed under: R, Statistics Tagged: combinatorics, entropy, Kullback, Monte Carlo, simulation, sudoku, uniformity


To **leave a comment** for the author, please follow the link and comment on their blog: **Xi'an's Og » R**.

