A plea for less word clouds

April 25, 2013

(This article was first published on me nugget, and kindly contributed to R-bloggers)

Word cloud of DOMA hearing transcripts

I must admit, there is something appealing about the word cloud – that is, until you try to understand what it actually means…

Word clouds are pervasive – even in the science world. I was somewhat spurred to write this given the incredibly wasteful summaries of EGU General Assembly survey results that include several useless word clouds (link to document). Capitalization isn’t even normalized; e.g. “Nice” and “nice” are counted as separate words. I have been hesitant to equate word clouds to the hilariously labeled “mullets of the internet” but, on second thought, it is entirely appropriate. They were once a fad, but seem reluctant to die…
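The capitalization problem is easy to see in a few lines. The following is a minimal sketch in Python, not the method used for the survey summary: the function name `word_counts` and the sample responses are invented for illustration. It counts words with and without lowercasing first.

```python
import re
from collections import Counter


def word_counts(text, fold_case=True):
    """Count words, optionally lowercasing first so 'Nice' and 'nice' merge."""
    if fold_case:
        text = text.lower()
    # Treat runs of letters and apostrophes as words; punctuation is dropped.
    return Counter(re.findall(r"[A-Za-z']+", text))


# Hypothetical survey answers, invented for this example.
responses = "Nice venue. nice posters. Nice coffee."

raw = word_counts(responses, fold_case=False)  # "Nice" and "nice" kept apart
folded = word_counts(responses)                # both counted as "nice"

print(raw.most_common(2))
print(folded.most_common(1))
```

Without case folding, the same word is split across two entries, so a word cloud built on the raw counts would draw it twice at misleadingly small sizes.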

Oh, and yes, a “tag cloud” is a type of word cloud – I have fallen into the trap myself by including such a thing on this blog! I honestly didn’t make the connection at first because it at least had the function of showing the relative importance of terms that I personally defined as topics – not an arbitrary puking up of all the words that I have ever written here. Nevertheless, I think it must be removed now – I can’t tell you how many times I have wanted to go to a specific blog post by clicking on a tag, only to be forced to search the nether regions of (extremely) small font size. Simple alphabetical arrangement probably makes more sense.

There are some attempts at making word clouds with R (most notably the “wordcloud” package), but they don’t seem to be as visually appealing as those easily produced by sites such as Wordle. Nevertheless, you continue to see such things produced – just do a search for “word cloud” on R-bloggers for many examples.

I decided to give Wordle a try, and chose the Defense of Marriage Act (DOMA) hearing transcripts as a source for text. The above word cloud shows the results (with some beautiful patriotic colonial-looking font to boot!). It doesn’t reveal much to me. An initial attempt caught me off guard in that the dominant word was “justice” (below), which might have been insightful if it hadn’t been an artifact of how often the speakers’ titles appear (e.g. “Justice Kagan”):

An even more worthless word cloud of DOMA hearing transcripts
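The dominance of “justice” described above can be reproduced, and removed, with a small preprocessing step. This is only a sketch under stated assumptions: the transcript lines are invented, the pattern `SPEAKER_TITLE` and the function `count_words` are hypothetical names, and the pattern assumes titles appear capitalized and followed by a surname, as in “Justice Kagan” or “JUSTICE KAGAN”.

```python
import re
from collections import Counter

# Assumption: a title is "Justice"/"JUSTICE" (optionally preceded by "Chief")
# followed by a capitalized surname. Lowercase "justice" is left untouched.
SPEAKER_TITLE = re.compile(
    r"\b(?:CHIEF\s+|Chief\s+)?(?:JUSTICE|Justice)\s+[A-Z][A-Za-z]+"
)


def count_words(text, strip_titles=True):
    """Count lowercased words, optionally removing speaker titles first."""
    if strip_titles:
        text = SPEAKER_TITLE.sub(" ", text)
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Hypothetical transcript excerpt, invented for this example.
transcript = (
    "JUSTICE KAGAN: Is the statute about uniformity?\n"
    "MR. CLEMENT: Justice Kagan, the statute defines marriage.\n"
    "JUSTICE KENNEDY: Does the statute serve justice?\n"
)

with_titles = count_words(transcript, strip_titles=False)
without_titles = count_words(transcript)

print(with_titles.most_common(1))
print(without_titles["justice"])
```

With the titles left in, “justice” is the most frequent word purely because of who is speaking. Once they are stripped, only the one substantive use remains.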

Anyway, I’m glad I’m not alone in this thinking – I have come across many discussions along the same lines; in particular, the nice article by Jacob Harris. Unfortunately, it seems word clouds are here to stay, and I will just have to learn to better avert my eyes from their alluring power in the future…
