In general, the TOEFL (Test of English as a Foreign Language) is not an easy test for Chinese students, including me. Relatively speaking, the reading section is a little easier than the other sections (listening, speaking, writing). Interestingly, while preparing for my TOEFL test, I noticed that some important words appeared frequently in the mock examinations. So I ran a simple experiment tonight, just out of curiosity. First I collected some relevant materials from the Internet (found through Google). Then I applied some basic transformations: converting everything to plain text documents, eliminating extra whitespace, converting to lower case, removing stopwords and so on. All of this is easy to do in R with the tm package, which is an excellent and important package for text manipulation. After this step, the wordcloud package lets us plot a word cloud almost effortlessly. The result is as follows:
The main code is shown below:
library(tm)
library(wordcloud)

# Folder holding the plain-text TOEFL materials
txt <- "E:\\TOEFL"
b <- Corpus(DirSource(txt), readerControl = list(language = "eng"))

# Basic clean-up
b <- tm_map(b, stripWhitespace)
b <- tm_map(b, removePunctuation)
# tm 0.6 and later need content_transformer() around base functions such as tolower
b <- tm_map(b, content_transformer(tolower))
b <- tm_map(b, removeWords, c("and", "the", "may", "can", "also", "often", "one"))
b <- tm_map(b, removeWords, stopwords("english"))

# Word frequencies summed over all documents
tdm <- TermDocumentMatrix(b)
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)

# Draw the word cloud
par(bg = "lightyellow")
set.seed(10)
wordcloud(d1$word, d1$freq, scale = c(4, 0.8), min.freq = 6, max.words = 100,
          colors = rainbow(length(d1$freq)), font = 2)
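To make the counting step more concrete, here is a minimal base-R sketch of roughly what the clean-up and frequency table amount to. It is only an illustration: the sample sentence and the two-word stopword list are made up for this example, and it does not reproduce every detail of how tm tokenizes text.

```r
# Made-up sample text (not taken from the TOEFL materials)
sample_text <- "The students read the passage. Students often read passages; the words repeat."

# Hypothetical mini stopword list, just for this sketch
stop_words <- c("the", "often")

clean  <- gsub("[[:punct:]]", "", tolower(sample_text))  # lower-case, strip punctuation
tokens <- strsplit(clean, "[[:space:]]+")[[1]]           # split on whitespace
tokens <- tokens[!tokens %in% stop_words]                # drop stopwords

# Count each remaining word and sort by frequency
freq <- sort(table(tokens), decreasing = TRUE)
print(freq)
```

Note that "passage" and "passages" are counted as two different words here, which is the same thing that happens in the word cloud above.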
By the way, this article is just for fun. Please do not rely on it when you prepare for your test. The result is not really satisfactory either, because I skipped some more advanced processing, such as handling tense and singular versus plural forms. Finally, I hope all the students who are dying to study abroad get a satisfying TOEFL score.
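P.S. For anyone who wants to try the missing step, one possible approach is stemming, which maps inflected forms to a common stem. The sketch below is not what I ran for the cloud above; it assumes the SnowballC package is installed and uses a small made-up frequency table (d_toy) in place of d1. Stemming is crude, so some stems will not be real words.

```r
library(SnowballC)  # provides wordStem(), a Porter-style stemmer

# Hypothetical toy counts standing in for d1 from the code above
d_toy <- data.frame(word = c("student", "students", "word", "words"),
                    freq = c(2, 1, 4, 3),
                    stringsAsFactors = FALSE)

# Map each word to its stem, then add up the frequencies per stem
d_toy$stem <- wordStem(d_toy$word, language = "english")
stem_freq <- vapply(split(d_toy$freq, d_toy$stem), sum, numeric(1))
stem_freq <- sort(stem_freq, decreasing = TRUE)
print(stem_freq)
```

The merged counts could then be passed to wordcloud() in place of d1$word and d1$freq. Within the tm pipeline itself, tm_map(b, stemDocument) applies the same kind of stemming to the whole corpus.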