Word Clouds in R

Posted on September 13, 2012 by Mollie in Uncategorized | 0 Comments

[This article was first published on Mollie's Research Blog, and kindly contributed to R-bloggers].


Thanks to the wordcloud package, it’s super easy to make a word cloud or tag cloud in R.

In this case, the words have been counted already. If you are starting with plain text, you can use the text mining package tm to obtain the counts. Other bloggers have provided good examples of this. I’ll just be covering the simple case where we already have the frequencies.
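If you only need a rough idea of what the counting step looks like, here is a minimal sketch in base R. It is not the tm workflow mentioned above, just a plain split-and-tabulate approach, and the sample sentence is made up for illustration.

```r
# Toy input text, invented for illustration
text <- "the quick brown fox jumps over the lazy dog the fox"

# Lowercase, then split on anything that is not a letter
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[words != ""]   # drop empty strings left by leading separators

# Tabulate and convert to a data frame of word/frequency pairs
counts <- as.data.frame(table(words), stringsAsFactors = FALSE)
names(counts) <- c("word", "freq")

# Most frequent words first
counts <- counts[order(-counts$freq), ]
print(counts)
```

The resulting word and freq columns are the two vectors that wordcloud() expects. A real text would also need stop-word removal and similar cleanup, which is where tm becomes useful.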

Let’s look at some of the words most commonly used during the National Conventions this year. The New York Times produced a cool infographic that we’ll use as our data source. The data in CSV format (and the R code too) are available in a gist.

First we need to load up the packages and our data:

library(wordcloud)
library(RColorBrewer)

conventions <- read.table("conventions.csv",
 header = TRUE,
 sep = ",")
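If you don't have the gist handy, the sketch below shows the layout the rest of the code assumes: one column of words and one numeric column per party. The column names are taken from the code in this post; the rows are placeholders I made up, not the real New York Times numbers.

```r
# Placeholder rows, invented for illustration (not the real data)
toy <- read.csv(text = "wordper25k,democrats,republicans
alpha,10,4
beta,3,12
gamma,7,7", stringsAsFactors = FALSE)

# Check the structure: words as character, frequencies as numbers
str(toy)
stopifnot(is.character(toy$wordper25k),
          is.numeric(toy$democrats),
          is.numeric(toy$republicans))
```

Running str(conventions) on the real file is a quick way to confirm it was read in with the same shape.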

And then we can use the wordcloud() function from the wordcloud package to produce our clouds in R:

png("dnc.png")
wordcloud(conventions$wordper25k, # words
 conventions$democrats, # frequencies
 scale = c(4,1), # size of largest and smallest words
 colors = brewer.pal(9,"Blues"), # number of colors, palette
 rot.per = 0) # proportion of words to rotate 90 degrees
dev.off()

png("rnc.png")
wordcloud(conventions$wordper25k,
 conventions$republicans,
 scale = c(4,1),
 colors = brewer.pal(9,"Reds"),
 rot.per = 0)
dev.off()

DNC word cloud

RNC word cloud
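The two calls above differ only in the frequency column, the file name and the palette, so they can be folded into one small function. This is just a sketch; save_cloud is a hypothetical helper name of mine, not something the wordcloud package provides.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical helper (not part of the wordcloud package):
# draw one word cloud into a PNG file using a 9-color Brewer palette
save_cloud <- function(words, freq, file, palette) {
  png(file)
  on.exit(dev.off())   # close the device even if wordcloud() fails
  wordcloud(words, freq,
            scale = c(4, 1),
            colors = brewer.pal(9, palette),
            rot.per = 0)
  invisible(file)
}
```

With the data loaded as above, save_cloud(conventions$wordper25k, conventions$democrats, "dnc.png", "Blues") should produce the same file as the first block, and swapping in conventions$republicans, "rnc.png" and "Reds" gives the second.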

The default word cloud has some words rotated 90 degrees, but I prefer to use rot.per = 0 to make them all horizontal for readability.
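To see what rotation looks like, here is a small sketch that turns it back on. The words and frequencies are placeholders invented for the example. Since wordcloud() places words with some randomness, I also set a seed so the layout comes out the same on every run.

```r
library(wordcloud)

# Placeholder words and frequencies, invented for illustration
words <- c("alpha", "beta", "gamma", "delta", "epsilon")
freq  <- c(20, 15, 10, 8, 5)

set.seed(42)                      # layout is randomized; fix the seed to reproduce it
wordcloud(words, freq,
          scale = c(4, 1),
          rot.per = 0.25,         # roughly a quarter of the words drawn vertically
          random.order = FALSE)   # plot words in decreasing order of frequency
```

Note that wordcloud() drops words with a frequency below min.freq, which defaults to 3, so very rare words will not appear unless you lower it.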

Since the size of a word already denotes its frequency, you may prefer to use just one color. To do that, set the colors argument to a single color such as "red3":

RNC single color

png("rncalt.png")
wordcloud(conventions$wordper25k,
 conventions$republicans,
 scale = c(4,1),
 colors = "red3",
 rot.per = 0)
dev.off()

DNC single color
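A middle ground between the full palette and a single color is to keep the palette but drop its lightest shades, which can be hard to read on a white background. A minimal sketch, assuming the six darkest shades are the ones you want to keep:

```r
library(RColorBrewer)

# brewer.pal() returns the sequential "Blues" palette ordered light to dark;
# keep only the six darkest of the nine shades
blues <- brewer.pal(9, "Blues")[4:9]
print(blues)
```

Passing colors = blues to wordcloud() colors the least frequent words with the first shade and the most frequent with the last.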

And there you have it, a simple way to generate a word cloud from frequency data using R.
