Painting a picture of statistical packages

April 4, 2011

(This article was first published on eKonometrics, and kindly contributed to R-bloggers)

Imagine you have to analyze text comprising 18,000 words. You have to identify the most commonly cited ideas or words in the text and then present the analysis in a graphic format. There are sophisticated tools out there to help you with this task, but then again there is a tight deadline. You have fewer than five minutes to accomplish the task.

Generating a word cloud from the text may be one option. It is fast, and the resulting output is appealing as well as informative. See the word cloud below, which I generated from the descriptions of 2,948 R packages. The one-line descriptions of these packages ran to 18,000-plus words. Using the free word cloud tool Wordle, I accomplished the task in less than two minutes.


Based on the cloud, we can see that the most frequently recurring themes in R packages are data, functions, models, estimation, regression, and Bayesian.
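For readers who want to check such a ranking by hand, here is a minimal Python sketch of the word counting that sits behind any word cloud. It is not Wordle's actual method, which this post does not describe. The function name `top_words`, its parameters, and the sample text are all made up for illustration, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def top_words(text, n=150, exclude=()):
    """Return the n most frequent words in text, skipping any in exclude.

    Hypothetical helper for illustration; not part of Wordle or any library.
    """
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Words to leave out of the count (filler words, or a dominant term).
    skip = {w.lower() for w in exclude}
    counts = Counter(w for w in words if w not in skip)
    return counts.most_common(n)


# Invented sample text standing in for the package descriptions.
sample = (
    "Analysis of data. Bayesian analysis of regression models. "
    "Functions for data analysis and model estimation."
)
print(top_words(sample, n=3, exclude=["analysis", "of", "for", "and"]))
```

Lowering `n` or adding a word to `exclude` gives roughly the kind of control over the output that a word cloud tool exposes through its menus.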

Wordle offers some control over the output. The cloud above was generated using the 150 most common words in the text. I eliminated ‘Analysis’ from the text since it was the most frequently repeated word. Later, I restricted the cloud to the 100 most repeated words, removed the restriction on the word ‘Analysis’, and randomly generated a word cloud. See the output below.


Notice the two variants of the word ‘data’ in the cloud. Wordle allows the user to eliminate any word in the generated cloud with a click of a mouse and retain the cleaned version of the cloud.
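Duplicate entries like these can also be merged before the cloud is drawn. The sketch below assumes the two variants differ only in capitalization, which the cloud itself does not confirm. The function name `merge_case_variants` and the counts are invented for illustration.

```python
from collections import Counter, defaultdict


def merge_case_variants(counts):
    """Fold words that differ only in case into one entry.

    Hypothetical helper: keeps the most frequent spelling as the label
    and sums the frequencies of all variants.
    """
    groups = defaultdict(Counter)
    for word, freq in counts.items():
        # Group every spelling under its lowercase form.
        groups[word.lower()][word] += freq
    merged = {}
    for variants in groups.values():
        label = variants.most_common(1)[0][0]  # most frequent spelling
        merged[label] = sum(variants.values())
    return merged


# Invented counts, not taken from the actual package descriptions.
raw = Counter({"data": 40, "Data": 12, "models": 9})
print(merge_case_variants(raw))
```

Whether merging is the right choice depends on the text; in some corpora a capitalized form is a distinct proper noun and should stay separate.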

Also, don’t miss Drew Conway’s blog post on building more intelligent word clouds.
