Over the Christmas holidays, I read “Maths Meets Myths: Quantitative Approaches to Ancient Narratives,” from the Springer Understanding Complex Systems collection.

The authors present their application of “hard” science techniques to datasets coming from the humanities, mostly large corpora of texts, legends and myths.

One paper in particular uses bioinformatics and phylogenetics to study the spread of a popular folk tale: Little Red Riding Hood. The story that I knew from Perrault and Grimm has patterns that are also found in African and East Asian tales.

### The Tractatus Logico-Philosophicus viewed as a phylogenetic tree

Inspired by this, I've had a look at Wittgenstein's Tractatus Logico-Philosophicus (available on Project Gutenberg), which is presented as hierarchically numbered statements and sub-statements.

We start by scraping the book into a dataframe with one row per statement:

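library(rvest)

# `book_url` is assumed to hold the address of the HTML edition being scraped
# (the original post does not give it)
page <- read_html(book_url)
root <- page %>% html_node("#root")

# Every <li> is one statement; its number lives in the data-name attribute
items <- root %>% html_nodes("li")

# One row per statement: its number and its text
df <- data.frame(
  label = items %>% html_attr("data-name"),
  content = items %>% html_text(trim = TRUE),
  stringsAsFactors = FALSE
)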


We then run a hierarchical cluster analysis on the distances between the rows of df (dist() compares rows, and we have one row per statement), hoping that the hierarchical numbering of statements will yield something interesting.

We adopt the single linkage method, which the hclust documentation describes like so:

The single linkage method (which is closely related to the minimal spanning tree) adopts a ‘friends of friends’ clustering strategy.

clusters <- hclust(dist(df), method = "single")
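
To make the ‘friends of friends’ idea concrete, here is a minimal sketch on a handful of made-up statement numbers rather than on the scraped book. It assumes each number is read as a plain decimal; the names `toy`, `toy_dist` and `toy_clusters` are mine and purely illustrative.

```r
# Toy statement numbers, kept as text the way a scraper would return them
labels <- c("1", "1.1", "1.11", "2", "2.01", "2.1")

# Assumption: each number is read as a plain decimal so dist() has numeric input
toy <- data.frame(value = as.numeric(labels), row.names = labels)

# dist() measures the distance between every pair of rows
toy_dist <- dist(toy)

# Single linkage: the distance between two clusters is the distance
# between their two closest members
toy_clusters <- hclust(toy_dist, method = "single")

# The last merge joins the "1.x" and "2.x" groups at height 2 - 1.11
print(toy_clusters$height)
```

Neighbouring sub-statements merge early and at low heights, while the jump from one top-level statement to the next only shows up in the final merge.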


### Dendrograms galore

From these clusters, we can represent the book as dendrograms, which are used in phylogenetics to represent evolutionary splits and genetic relationships in a tree.

plot(clusters, labels = clusters$labels)


d <- as.dendrogram(clusters)
plot(d, horiz = TRUE, type = "triangle")


library(ape)
plot(as.phylo(clusters), type = "fan")


The diagrams above show how our clusters have correctly grouped together the hierarchical statements of the Tractatus.
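
One way to check this kind of grouping without squinting at a plot is to cut the tree and compare each statement's cluster with its top-level number. The sketch below does that on made-up statement numbers, not on the scraped book, and again assumes the numbers are read as plain decimals; `toy`, `top` and `groups` are illustrative names.

```r
# Toy statement numbers spanning three top-level statements
labels <- c("1", "1.1", "1.2", "2", "2.01", "2.1", "3", "3.001")
toy <- data.frame(value = as.numeric(labels), row.names = labels)

# Same recipe as in the post: row distances, then single linkage
toy_clusters <- hclust(dist(toy), method = "single")

# Cut the tree into as many groups as there are top-level statements
top <- sub("\\..*$", "", labels)   # "1.11" -> "1"
groups <- cutree(toy_clusters, k = length(unique(top)))

# Cross-tabulate: a clean grouping puts each top-level statement in one cluster
print(table(top, groups))
```

If every row of the table has a single non-zero cell, the clusters line up with the top-level statements.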

Starting from Mike Bostock's Tree of Life, and helped by Jason Davies' work on parsing the Newick text format (a standard for tree representations) in JavaScript, I re-implemented the above with d3-jetpack and ES6: https://bl.ocks.org/basilesimon/66db4338c15099f6e8d62f236db2ef2d.
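
For anyone who has not met Newick before: it writes a tree as nested parentheses ending in a semicolon. Here is a rough sketch of how an hclust result could be turned into such a string in base R. `hclust_to_newick` is a hypothetical helper written for this illustration, not something from a package, and it only keeps the shape of the tree (no branch lengths).

```r
# Hypothetical helper: topology-only Newick string from an hclust object
hclust_to_newick <- function(hc) {
  labs <- hc$labels
  if (is.null(labs)) labs <- as.character(seq_len(nrow(hc$merge) + 1))

  # In hc$merge, a negative entry is an original observation and a
  # positive entry refers to the cluster formed at that earlier step
  node <- function(i) {
    if (i < 0) return(labs[-i])
    paste0("(", node(hc$merge[i, 1]), ",", node(hc$merge[i, 2]), ")")
  }

  # The last row of hc$merge is the root of the tree
  paste0(node(nrow(hc$merge)), ";")
}

# Toy statement numbers, read as plain decimals (an assumption of this sketch)
labels <- c("1", "1.1", "2", "2.01")
toy <- data.frame(value = as.numeric(labels), row.names = labels)
newick <- hclust_to_newick(hclust(dist(toy), method = "single"))
print(newick)
```

In practice, calling ape's write.tree() on as.phylo(clusters) should give the same kind of string with branch lengths included, which is likely the more convenient route.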

I love how simple the result looks and how little we end up knowing about the book itself. The only thing I'll leave you with is the final, chapter-seven put-down of this book about language, facts and truths of the world:

What we cannot speak about we must pass over in silence.

Precisely what I didn't do in this blog about phylogenetics and a book I never finished.