Painting a picture of statistical packages

[This article was first published on eKonometrics, and kindly contributed to R-bloggers.]

Imagine you have to analyze text comprising 18,000 words. You have to identify the most commonly cited ideas or words in the text and then present the analysis in a graphic format. There are sophisticated tools out there to help you with this task, but then again there is a tight deadline. You have fewer than five minutes to accomplish the task.

Generating a word cloud from the text is one option. It is fast, and the resulting output is appealing as well as informative. See the word cloud below, which I generated from the descriptions of 2,948 R packages listed at http://cran.r-project.org/web/packages. The one-line descriptions of these packages ran to 18,000-plus words. Using the free word cloud tool Wordle (http://www.wordle.net/), I accomplished the task in less than two minutes.

image

Based on the cloud, we can see that the most frequently recurring themes in R packages are data, functions, models, estimation, regression, and Bayesian.

Wordle offers some control over the output. The cloud above was generated from the 150 most common words in the text. I eliminated ‘Analysis’ from the text since it was the most frequently repeated word. Later, I restricted the cloud to the 100 most repeated words, removed the restriction on the word ‘Analysis’, and randomly generated a new word cloud. See the output below.
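For readers who want to reproduce the counting step outside Wordle, here is a minimal Python sketch of the two controls described above: capping the cloud at the N most common words and excluding a chosen word. The function name `top_words`, the letters-only tokenizer, the lowercasing, and the sample sentence are all assumptions made for illustration. They are not Wordle's actual method, and the sample is not the CRAN text.

```python
import re
from collections import Counter


def top_words(text, n, exclude=()):
    """Return the n most frequent words in text, skipping any in exclude.

    Words are lowercased and split on non-letters, which is a
    simplification chosen for this sketch.
    """
    excluded = {w.lower() for w in exclude}
    words = re.findall(r"[A-Za-z]+", text.lower())
    counts = Counter(w for w in words if w not in excluded)
    return counts.most_common(n)


# Made-up sample text, not the actual CRAN package descriptions.
sample = (
    "Analysis of data. Bayesian analysis of regression models. "
    "Functions for data analysis and model estimation."
)

print(top_words(sample, 3))                        # 'analysis' dominates
print(top_words(sample, 3, exclude=["Analysis"]))  # same text, word removed
```

Dropping the dominant word, as I did with ‘Analysis’, lets the remaining words compete for size in the cloud instead of being dwarfed by a single term.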

image

Notice the two variants of the word ‘data’ in the cloud. Wordle allows the user to eliminate any word in the generated cloud with a click of a mouse and retain the cleaned version of the cloud.
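If the two variants differ only in capitalization (an assumption on my part; they could differ in some other way), they can also be merged before the cloud is drawn. The sketch below uses a hypothetical helper, `merge_case_variants`, and made-up counts to show the idea: spellings that match after lowercasing are pooled under whichever spelling is most frequent.

```python
from collections import Counter


def merge_case_variants(counts):
    """Merge counts of words that differ only in capitalization.

    The merged entry is labelled with the most frequent spelling.
    """
    groups = {}
    for word, count in counts.items():
        groups.setdefault(word.lower(), []).append((count, word))

    merged = Counter()
    for variants in groups.values():
        total = sum(count for count, _ in variants)
        label = max(variants)[1]  # spelling with the highest count
        merged[label] = total
    return merged


# Hypothetical counts, for illustration only.
raw = Counter({"data": 5, "Data": 2, "models": 3})
print(merge_case_variants(raw))
```

With these made-up counts, ‘data’ and ‘Data’ collapse into a single entry with a combined count of 7, so the word would appear once in the cloud at its full weight.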

Also, don’t miss Drew Conway’s blog post on building more intelligent word clouds at http://www.drewconway.com/zia/?p=2624.
