Word Clouds in R

September 13, 2012

(This article was first published on Mollie's Research Blog, and kindly contributed to R-bloggers)

Thanks to the wordcloud package, it’s super easy to make a word cloud or tag cloud in R.

In this case, the words have been counted already. If you are starting with plain text, you can use the text mining package tm to obtain the counts. Other bloggers have provided good examples of this. I’ll just be covering the simple case where we already have the frequencies.
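If you just want a rough count without installing anything, here is a minimal base R sketch of the counting step. It is a swapped-in alternative, not the tm approach mentioned above (tm adds things like stop-word removal), and the sample sentence and the names `text`, `tokens`, and `word_freq` are made up for illustration.

```r
# Hypothetical sample text; substitute your own document.
text <- "the quick brown fox jumps over the lazy dog and the fox runs"

# Lowercase, then split on anything that is not a letter or apostrophe.
tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
tokens <- tokens[tokens != ""]  # drop empty strings left by leading separators

# Tabulate the tokens and sort by descending frequency.
counts <- sort(table(tokens), decreasing = TRUE)

# One row per word, in the words-plus-frequencies shape that wordcloud() takes.
word_freq <- data.frame(word = names(counts),
                        freq = as.vector(counts),
                        stringsAsFactors = FALSE)
print(word_freq)
```

The resulting `word_freq$word` and `word_freq$freq` columns can be passed to wordcloud() the same way the pre-counted columns are used in the rest of this post.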

Let’s look at some commonly used words from this year’s (2012) Democratic and Republican National Conventions. The New York Times produced a cool infographic that we’ll use as our data source. The data in CSV format (and the R code too) are available in a gist.

First we need to load up the packages and our data:


library(wordcloud)     # provides wordcloud()
library(RColorBrewer)  # provides brewer.pal()
conventions <- read.table("conventions.csv",
                          header = TRUE,
                          sep = ",")
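As a side note, read.table() with header = TRUE and sep = "," does much the same job as read.csv() with its defaults. Here is a small sketch of that, using a temporary file and invented placeholder rows (not the real convention data). The names `demo`, `path`, `via_table`, and `via_csv` are hypothetical.

```r
# Placeholder rows, invented for illustration; not the New York Times data.
demo <- data.frame(wordper25k = c("alpha", "beta", "gamma"),
                   democrats  = c(10, 20, 30))

# Write the placeholder data to a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE)

# The call style used in this post ...
via_table <- read.table(path, header = TRUE, sep = ",")
# ... and the shorthand, which sets header = TRUE and sep = "," by default.
via_csv <- read.csv(path)

# Quick checks worth running on any newly loaded file.
str(via_table)
print(all.equal(via_table, via_csv))
```

Running str() on the loaded data frame is a cheap way to confirm that the word column and the frequency columns came through with the names and types you expect.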

Then we can use the wordcloud package to draw a cloud for each party:

wordcloud(conventions$wordper25k,          # words
          conventions$democrats,           # frequencies
          scale = c(4, 1),                 # size of largest and smallest words
          colors = brewer.pal(9, "Blues"), # number of colors, palette
          rot.per = 0)                     # proportion of words to rotate 90 degrees

wordcloud(conventions$wordper25k,
          conventions$republicans,         # column name assumed by analogy with `democrats`
          scale = c(4, 1),
          colors = brewer.pal(9, "Reds"),
          rot.per = 0)

DNC word cloud

RNC word cloud
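A small aside on the palette: brewer.pal(9, "Blues") returns nine shades running from very light to dark, and wordcloud() colors words from least to most frequent. The palest shades can be hard to read on a white background, so one possible tweak is to keep only the darker ones. This is a suggestion rather than part of the original code, and the names `blues` and `dark_blues` are just for illustration.

```r
library(RColorBrewer)

blues <- brewer.pal(9, "Blues")  # nine hex colors, light to dark
dark_blues <- blues[4:9]         # drop the three palest shades

# Pass the trimmed vector wherever the full palette was used, e.g.
# wordcloud(words, freq, colors = dark_blues, rot.per = 0)
print(dark_blues)
```

The same indexing works for "Reds" or any other sequential RColorBrewer palette.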

The default word cloud has some words rotated 90 degrees, but I prefer to use rot.per = 0 to make them all horizontal for readability.
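To see the difference for yourself, here is a rough sketch comparing the package default (rot.per = 0.1) with rot.per = 0. The words and frequencies are invented placeholders. Because wordcloud() places words randomly, set.seed() is used so that a layout can be reproduced.

```r
library(wordcloud)

# Placeholder words and frequencies, invented for illustration only.
words <- c("alpha", "beta", "gamma", "delta", "epsilon")
freqs <- c(50, 30, 20, 10, 5)

set.seed(42)  # fix the random layout so the plot can be reproduced
wordcloud(words, freqs, rot.per = 0.1)  # package default: about 10% of words vertical

set.seed(42)
wordcloud(words, freqs, rot.per = 0)    # every word horizontal
```

With only a handful of words the default may not rotate anything, so try it on your real data to see the effect.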

Since the size of each word already shows its frequency, you may prefer to use just one color. To do that, set the colors argument to a single color name, such as "red3":

RNC single color

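wordcloud(conventions$wordper25k,
          conventions$republicans, # column name assumed by analogy with `democrats`
          scale = c(4, 1),
          colors = "red3",
          rot.per = 0)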

DNC single color
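If you are not sure which color names R accepts, base R can tell you. This short sketch uses only built-in functions (colors() and col2rgb() from grDevices). The variable names and the choice of "blue3" as a second example are mine.

```r
# colors() lists every built-in color name that R accepts as a string.
known <- c("red3", "blue3") %in% colors()
print(known)

# col2rgb() shows the red/green/blue components behind each name.
rgb_values <- col2rgb(c("red3", "blue3"))
print(rgb_values)

# Browse the other named reds that could be passed as colors = "...".
print(head(grep("^red", colors(), value = TRUE)))
```

Any name that appears in colors(), or a hex string such as "#CD0000", should work as the colors argument.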

And there you have it, a simple way to generate a word cloud from frequency data using R.

