**Xi'an's Og » R**, and kindly contributed to R-bloggers)

**D**arren Wraith pointed out this column about sudokus to me. It analyses the paper by Newton and De Salvo published in the * Proceedings of the Royal Academy of Sciences A* that I cannot access from home. The discussion contains this absurd sentence “

*Sudoku matrices are actually more random than randomly-generated matrices*” which shows how mistreated the word “random” can be! The point is absurd because any random sequence is random, no more, no less than others. It is only the distribution behind the randomness that changes. I presume the author of the column interpreted a large entropy value as a sign of higher randomness and I would guess the authors did not make this claim. Although we are talking about distributions over a finite set, the uniform distribution over the set of all 1,2,…,9 valued matrices has simply nothing to do with the uniform distribution of the set of all sudoku matrices. Up to a factor , the later set is negligible within the first one. (It is impossible to create a sudoku by pulling 81 integers at random.) Therefore, comparing the entropies of both distributions hardly makes sense… This other remark in the column

“Newton and DeSalvo predict that this greater understanding of Sudoku could lead to better Sudoku-generating algorithms that create more difficult puzzles. Currently, Sudoku puzzles require at least 17 numbers to be given in their correct boxes in order for the puzzle solver to find a unique solution. The new study could decrease that number, making it more difficult to solve the puzzles.”

does not seem more relevant. Why should the randomness of sudokus be related to their difficulty or to the minimal number of entries for a unique solution?

Filed under: R, Statistics Tagged: entropy, Monte Carlo, simulated annealing, simulation, sudoku

**leave a comment**for the author, please follow the link and comment on his blog:

**Xi'an's Og » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...