Illuminating the Illuminated Part Two: Ipsa Scientia Potestas Est

[This article was first published on Weird Data Science, and kindly contributed to R-bloggers.]

In the previous post in this series we coyly unveiled the tantalising mysteries of the Voynich Manuscript: an early 15th-century text written in an unknown alphabet, filled with compelling illustrations of plants, humans, astronomical charts, and less easily identifiable entities.

Stretching back into the murky history of the Voynich Manuscript, however, is the lurking suspicion that it is a fraud; either a modern fabrication or, perhaps, a hoax by a contemporary scribe.

One of the better-known arguments for the authenticity of the manuscript, in addition to its manufacture with period parchment and inks, is that the text appears to follow certain statistical properties that are associated with human language and that were unknown at the time of its creation.

The best known of these claims is that word frequencies in the Voynich Manuscript follow Zipf’s Law, whereby the frequency of a word’s occurrence in the text is inversely proportional to its rank in the list of words ordered by frequency.

In this post, we will scrutinise the extent to which the expected statistical properties of natural languages hold for the arcane glyphs presented by the Voynich Manuscript.

Unnatural Laws

Zipf’s Law is an example of a discrete power law probability distribution. Power laws have been found to lurk beneath a sinister variety of ostensibly natural phenomena, from the relative size of human settlements to the diversity of species descended from a particular ancestral freshwater fish.

In its original context of human language, Zipf’s Law states that the most common word in a given corpus is likely to be roughly twice as common as the second most common word, and three times as common as the third most common word. More precisely, the law holds for the bulk of the corpus but tends to break down somewhat for both the most frequent and the least frequent words.
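To make the relationship concrete, the following is a minimal sketch in Python of how one might check a word list against Zipf’s Law. It is not the analysis used for the manuscript: the function names `rank_frequency` and `zipf_exponent` are hypothetical, the toy corpus is invented purely for illustration, and the ordinary least-squares fit on log-transformed values is only one simple way to estimate the exponent.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return (rank, count) pairs, with rank 1 for the most frequent word."""
    counts = Counter(words)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


def zipf_exponent(pairs):
    """Negated least-squares slope of log(count) against log(rank).

    The idealised form of Zipf's Law predicts a value close to 1.
    Assumes at least two distinct ranks are present.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Toy corpus (invented for illustration): counts of 12, 6, 4 and 3,
# so each count is the top count divided by the word's rank.
toy_words = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
pairs = rank_frequency(toy_words)
exponent = zipf_exponent(pairs)
print(pairs)
print(round(exponent, 6))
```

On a real transcription the fitted exponent would be expected to drift from 1 at the extremes of the ranking, for the reasons given above, so a fit restricted to the middle ranks may be more informative than one over the whole list.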

Twisting Paths

These analyses can only present a dim outline of the text itself, and we resist the awful temptation to attempt any form of decipherment. Certainly, the evidence here seems convincing enough that the Voynich Manuscript does represent a human language, but the statistics presented here are of little use in such an effort. It is likely, of course, that the most frequent words in the manuscript correspond, under certain assumptions, to the most common words or particles in many languages: the definite article, the indefinite article, conjunctions, pronouns, and similar. Without deeper knowledge of the language, however, and with the range of scribing conventions and shortcuts commonplace in texts of the period, these techniques are too limited to do more than tantalise us with what we may never know.

Credible Conclusions

Subjecting the text of the Voynich Manuscript to the crude frequency analyses presented here can support, although not prove, the view that the manuscript, regardless of its true content, is not simply random gibberish. Nor is the text likely to be the result of a simple mechanical process designed without knowledge of the statistical patterns of human languages. Neither is it likely to be any form of cryptogram more sophisticated than the simplest ciphers, as these would have tended to compromise the statistical properties that we have observed.

The demonstrable adherence to Zipf’s Law, and to a Gamma distribution of similar shape to that of known languages, strongly suggests that the text is a representation of some natural language.

In the next post we will attempt blindly to wrench more secrets from the text itself through application of modern textual analysis techniques. Until then the Voynich Manuscript remains, silently obscure, beyond the reach of our faltering science.

