# The Eye of the World as word cloud

December 16, 2012

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

The Eye of the World is the first book in Robert Jordan's Wheel of Time series. As the last of these books will be published soon, I wondered whether natural language processing could be used to examine books like these. For this purpose I downloaded a copy from somewhere undisclosed and analyzed it.

During my experiments with this file I found that the wordcloud package was actually a good way to look at it. My first attempts, using correspondence analysis, did not give anything useful: with everything plotted on top of each other, the plot was not interesting. Clustering of chapters did not reveal anything nice either. The wordcloud package has comparison clouds, which can be used to differentiate between chapters.
I am sure readers can make their own interpretation of the result. Personally, I am surprised by the sheer number of names of places and persons in this first book, even though I know the number of persons in the series is large.
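
The idea behind a comparison cloud can be sketched in base R. As I understand the wordcloud documentation, each word is sized by how far its rate in one document exceeds its average rate over all documents, and it is placed with the document where that excess is largest. The counts below are made up for illustration and all variable names are hypothetical; this is a sketch of the calculation under those assumptions, not the package's own code.

```r
# Toy term-by-chapter counts (hypothetical numbers, not taken from the book)
counts <- matrix(c(10, 2, 1,
                    1, 8, 1,
                    5, 5, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("rand", "moiraine", "said"),
                                 c("ch1", "ch2", "ch3")))

# Rate of each term within each chapter (every column sums to 1)
rates <- sweep(counts, 2, colSums(counts), "/")

# Deviation of each rate from the term's average rate over all chapters
deviation <- rates - rowMeans(rates)

# Chapter in which each term is most over-represented, and by how much
best_chapter  <- colnames(counts)[apply(deviation, 1, which.max)]
max_deviation <- apply(deviation, 1, max)

print(data.frame(term = rownames(counts), best_chapter, max_deviation))
```

In this toy example "said" has the same count in every chapter, yet it is still assigned to ch3, because ch3 is short and the word's rate there is high. This is why a comparison cloud works with rates rather than raw counts.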

## R code
r1 <- readLines("Robert Jordan - Wheel Of Time 01 - The Eye Of The World.txt")
#remove text page xxx
pagina <- grep('^Page [[:digit:]]+$',r1)
r1 <- r1[-pagina]
r1 <- sub('Page [[:digit:]]+$','',r1)
# remove empty lines
r1 <- r1[r1!='']
chapterrow <- grep('^(CHAPTER [[:digit:]]+)|(PROLOGUE)$',r1)
chapterrow <- c(chapterrow,length(r1)+1)
#extract chapters
# collapse (rather than sep) glues the lines of a chapter into one string
chapters <- sapply(1:(length(chapterrow)-1),function(i)
    paste(r1[(chapterrow[i]+2):(chapterrow[i+1]-1)],collapse=' '))
chapterrow <- chapterrow[-length(chapterrow)]
#name the chapters
chapternames <- paste(sub('CHAPTER ','',r1[chapterrow]),r1[chapterrow+1])
names(chapters) <- chapternames
# use example processing from tm
library(tm)
EotW <- Corpus(VectorSource(chapters))
EotW <- tm_map(EotW,stripWhitespace)
# tm 0.6 and later need content_transformer() around plain functions such as tolower
EotW <- tm_map(EotW,content_transformer(tolower))
EotW <- tm_map(EotW,removeWords,stopwords("english"))
EotW <- tm_map(EotW,stemDocument)
EotW <- tm_map(EotW,removePunctuation)
library(wordcloud)
tdmEotW <- TermDocumentMatrix(EotW)
# hclust to put related chapters together
# (method 'ward' is called 'ward.D' from R 3.1.0 onwards)
h1 <- hclust(dist(t(sqrt(as.matrix(tdmEotW)))),method='ward.D')
# and make a cloud
library(colorspace)
tdmEotW2 <- as.matrix(tdmEotW)[,h1$order]

comparison.cloud(tdmEotW2,random.order=FALSE,scale=c(1.4,.6),title.size=.7,
colors=rainbow_hcl(n=57))
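
The text-cleaning and chapter-splitting steps can be tried without the book file. The sketch below runs the same logic in base R on a few made-up lines, so the input vector is an assumption and not a quote from the book. It differs from the listing in two labeled ways: the page removal is guarded against the case where no page-only line exists, and the heading pattern anchors both alternatives.

```r
# Made-up lines standing in for the book text (not quoted from the book)
r1 <- c("PROLOGUE", "A Title", "first line Page 1", "Page 2", "second line",
        "CHAPTER 1", "Another Title", "", "third line", "fourth line")

# Drop lines that hold only a page marker. The guard matters because
# x[-integer(0)] returns an empty vector rather than x itself.
pagina <- grep('^Page [[:digit:]]+$', r1)
if (length(pagina) > 0) r1 <- r1[-pagina]

# Strip page markers glued to the end of a text line, then drop empty lines
r1 <- sub(' *Page [[:digit:]]+$', '', r1)
r1 <- r1[r1 != '']

# Rows holding a chapter heading, plus a sentinel one past the last line
chapterrow <- grep('^(CHAPTER [[:digit:]]+|PROLOGUE)$', r1)
bounds <- c(chapterrow, length(r1) + 1)

# Text of each chapter: skip the heading and title rows, glue the rest together
chapters <- sapply(seq_along(chapterrow), function(i)
  paste(r1[(bounds[i] + 2):(bounds[i + 1] - 1)], collapse = ' '))

# Name each chapter by its number (or PROLOGUE) followed by its title
names(chapters) <- paste(sub('CHAPTER ', '', r1[chapterrow]), r1[chapterrow + 1])

print(chapters)
```

Each element of `chapters` is then a single string per chapter, which is the shape `VectorSource()` expects when every chapter should become one document.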