Creating Tag Cloud Using R and Flash / JavaScript (SWFObject)

[This article was first published on Keep on Fighting! » R Language, and kindly contributed to R-bloggers.]

A tag cloud is a set of words drawn in a graph with sizes proportional to their frequencies; it is widely used in blogs to visualize tags. Important words stand out quickly in a tag cloud because they appear in a large font size. Tony N. Brown asked how to “graphically represent frequency of words in a speech” the other day on the R-help list, which is essentially the tag cloud problem:

I recently saw a graph on television that displayed selected words/phrases in a speech scaled in size according to their frequency. So words/phrases that were often used appeared large and words that were rarely used appeared small. […]

Marc Schwartz mentioned that Gregor Gorjanc did some work on this years ago using R (in grid graphics). The obstacle to creating a tag cloud in R, as Gorjanc wrote, lies in deciding the placement of the words, and it would be much easier to let other applications such as browsers arrange the text. That’s true: there are already many mature programs that deal with tag clouds. One of them is the wp-cumulus plugin for WordPress, which uses a Flash object to generate the tag cloud and gives the cloud a fantastic 3D rotation effect.

1. Arranging text labels with pointLabel()

Before introducing how to port the plugin into R, I’d like to introduce the R function pointLabel() in the maptools package, which can partially solve the problem of arranging text labels in a plot (using simulated annealing or a genetic algorithm). Here is a simulated example:

Simulated Tag Cloud with R function pointLabel() in maptools

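library(maptools)  # provides pointLabel()
# random positions for the 19 words
x = runif(19)
y = runif(19)
w = c("R", "is", "free", "software", "and", "comes",
    "with", "ABSOLUTELY", "NO", "WARRANTY", "You", "are", "welcome",
    "to", "redistribute", "it", "under", "certain", "conditions")
par(ann = FALSE, xpd = NA, mar = rep(2, 4))
# set up an empty plot, then let pointLabel() place the labels
# (random sizes stand in for word frequencies)
plot(x, y, type = "n", axes = FALSE)
pointLabel(x, y, w, cex = runif(19, 1, 5))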

I was fortunate to get a very neat graph with no overlapping labels, but I don’t think this is a good solution, as it doesn’t take care of the initial locations of the words. My rough idea for deciding the initial locations is to sample on circles with radii proportional to the frequency, i.e. let $x = \textrm{freq} \cdot \sin(\theta)$ and $y = \textrm{freq} \cdot \cos(\theta)$, where $\theta \sim U(0, 2\pi)$. In this case, important words will be placed near the center of the plot. (Strictly speaking, the radius has to decrease with the frequency, e.g. $1/\textrm{freq}$ in place of $\textrm{freq}$, for the frequent words to land near the center.)
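The idea of sampling initial locations on circles can be sketched roughly as follows. This is only an illustration under stated assumptions: the words and the counts in `freq` are made up, and the radius is taken as `1 / freq` so that frequent words fall close to the origin.

```r
# Sketch: initial label positions on circles (hypothetical frequencies)
set.seed(1)
w = c("R", "is", "free", "software")
freq = c(10, 4, 2, 1)                  # made-up word counts
theta = runif(length(w), 0, 2 * pi)    # theta ~ U(0, 2 * pi)
r = 1 / freq                           # assumption: radius shrinks as frequency grows
x = r * sin(theta)
y = r * cos(theta)
# draw the words at their initial positions, sized by frequency
plot(x, y, type = "n", axes = FALSE, ann = FALSE, asp = 1)
text(x, y, w, cex = 1 + 4 * freq / max(freq))
```

These positions could then serve as the starting points passed to a label-placement routine such as pointLabel().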

2. Creating tag cloud in a Flash movie using R

The problem becomes quite easy with a Flash movie tagcloud.swf and a JavaScript program swfobject.js. Briefly, the mechanism is that JavaScript passes the tag information to the Flash object, which reads the variable tagcloud, where the sizes, colors and hyperlinks of the tags are stored. Finally the tags are displayed as a rotating cloud.

It’s not difficult to pass the tag information to JavaScript as plain text. Below is a function that, by default, creates an HTML page with a tag cloud Flash movie inside it:

Download the source code: tagCloud.r.gz (1.18Kb)
# generating tag cloud in R using Flash and SWFObject
# tagData: a data.frame containing columns 'tag', 'link', 'count' and optional
#     columns 'color' and 'hicolor'
# other parameters are self-explanatory if you are familiar with
#     the WP plugin 'wp-cumulus'
tagCloud = function(tagData, htmlOutput = "tagCloud.html",
    SWFPath, JSPath, divId = "tagCloudId", width = 600, height = 400,
    transparent = FALSE, tcolor = "333333", tcolor2 = "009900",
    hicolor = "ff0000", distr = "true", tspeed = 100, version = 9,
    bgcolor = "ffffff", useXML = FALSE, htmlTitle = "Tag Cloud",
    noFlashJS, target = NULL, scriptOnly = FALSE) {
    # no default location is filled in here: pass SWFPath and JSPath explicitly
    if (missing(SWFPath))
        SWFPath = ""
    if (missing(JSPath))
        JSPath = ""
    if (missing(noFlashJS))
        noFlashJS = "This will be shown to users with no Flash or Javascript."
    # one <a> element per tag; 'count' goes into the style attribute, and the
    # optional attributes are dropped when the column is absent or the value is NA
    tagXML = sprintf("<tags>%s</tags>", paste(sprintf("<a href='%s' style='%s'%s%s%s>%s</a>",
        tagData$link, tagData$count,
        if (is.null(target)) "" else sprintf(" target='%s'", target),
        if (is.null(tagData$color)) ""
        else ifelse(is.na(tagData$color), "",
            sprintf(" color='0x%s'", tagData$color)),
        if (is.null(tagData$hicolor)) ""
        else ifelse(is.na(tagData$hicolor), "",
            sprintf(" hicolor='0x%s'", tagData$hicolor)),
        tagData$tag), collapse = ""))
    # the file name must match the 'xmlpath' variable set below
    if (useXML)
        cat(tagXML, file = file.path(dirname(htmlOutput), "tagcloud.xml"))
    cat(ifelse(scriptOnly, "",
        sprintf("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\">
<html xmlns=\"\">
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
<title>%s</title>
</head>

<body>", htmlTitle)),
        sprintf("\t<script type=\"text/javascript\" src=\"%s\"></script>", JSPath),
        sprintf("\t<div id=\"%s\">%s</div>", divId, noFlashJS),
        sprintf("\t<script type=\"text/javascript\">
\t\tvar so = new SWFObject(\"%s\", \"tagcloud\", \"%d\", \"%d\", \"%d\", \"#%s\");
%s\t\tso.addVariable(\"mode\", \"tags\");\n\t\tso.addVariable(\"tcolor\", \"0x%s\");
\t\tso.addVariable(\"tcolor2\", \"0x%s\");\n\t\tso.addVariable(\"hicolor\", \"0x%s\");
\t\tso.addVariable(\"tspeed\", \"%d\");\n\t\tso.addVariable(\"distr\", \"%s\");
%s\n\t\tso.write(\"%s\");\n\t</script>",
            SWFPath, width, height, version, bgcolor,
            ifelse(transparent, "\t\tso.addParam(\"wmode\", \"transparent\");\n", ""),
            tcolor, tcolor2, hicolor, tspeed, distr,
            ifelse(useXML, "\t\tso.addVariable(\"xmlpath\", \"tagcloud.xml\");",
                sprintf("\t\tso.addVariable(\"tagcloud\", \"%s\");", tagXML)),
            divId),
        ifelse(scriptOnly, "", "</body>\n\n</html>"),
        # file = "" makes cat() print to the console
        file = if (scriptOnly) "" else htmlOutput, sep = "\n")
}

The main argument is tagData, a data.frame containing at least three columns (tag, link and count), which looks like:

> head(tagData)
                tag                                        link count
1 2D Kernel Density                                                 1
2         algorithm                                                 1
3         Animation                                                11
4           AniWiki                                                 2
5            Arcing                                                 1
6          arrows()                                                 1

Additional columns color and hicolor will be used if they exist (hexadecimal numbers specifying RGB), e.g.

> head(tagData)
                tag                                        link count  color hicolor
1 2D Kernel Density                                                 1 2163bb  f0763d
2         algorithm                                                 1 9f0f38  d825b1
3         Animation                                                11 800130  5b8d6a
4           AniWiki                                                 2 7ce1df  6607b0
5            Arcing                                                 1 df4e4a  f5cdf2
6          arrows()                                                 1 31f5fb  19d50d
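For readers without the original data, a data.frame of the shape described above could be put together along these lines. This is a minimal sketch: the links are placeholder URLs, the counts are illustrative, and `randHex()` is a hypothetical helper that draws random six-digit hexadecimal RGB strings.

```r
# Sketch: a small tagData with illustrative tags, links and counts
set.seed(1)
tagData = data.frame(
    tag = c("Animation", "algorithm", "arrows()"),
    link = c("http://example.com/tag/animation",
        "http://example.com/tag/algorithm",
        "http://example.com/tag/arrows"),   # placeholder URLs
    count = c(11, 1, 1),
    stringsAsFactors = FALSE
)
# hypothetical helper: n random colors as 6-digit hexadecimal strings
randHex = function(n) sprintf("%06x", sample.int(16777216L, n) - 1L)
# optional columns
tagData$color = randHex(nrow(tagData))
tagData$hicolor = randHex(nrow(tagData))
tagData
```

Setting an entry of color or hicolor to NA is one way to leave that particular tag with the default colors.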

3. Example

Here is an example of visualizing my blog tags. You may want to download the following swf and js files first if you wish the loading to be faster (otherwise your browser has to fetch these two files over the network first).

Download the tag cloud Flash file tagcloud.swf (33.7Kb) and JavaScript swfobject.js (5.94Kb) as well as the data tagData.gz (1.43Kb).
# use tagCloud(tagData, SWFPath = "tagcloud.swf", JSPath = "swfobject.js")
#    if you have downloaded these files to your working directory, i.e. getwd();
#    this will save you a few seconds when loading the Flash movie

The above code will generate an HTML page like this:

[Embedded Flash tag cloud of the blog tags]

You can adjust the parameters as you wish.

4. Other issues

There is still one more step needed to answer Tony’s original question, namely splitting the speech into single words and computing their frequencies. This can be (roughly) done by strsplit(..., split = " ") and table().
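A rough sketch of that step might look like the following. The sample sentence is made up, and lower-casing and stripping punctuation are extra assumptions beyond the plain strsplit() and table() calls.

```r
# Sketch: word frequencies from a piece of text (sample sentence is made up)
speech = "R is free software and R comes with no warranty. R is free!"
words = tolower(unlist(strsplit(speech, split = " ")))
words = gsub("[[:punct:]]", "", words)   # strip punctuation
words = words[words != ""]               # drop empty strings
freq = sort(table(words), decreasing = TRUE)
# columns named as tagCloud() expects them
wordData = data.frame(tag = names(freq), count = as.vector(freq),
    stringsAsFactors = FALSE)
head(wordData)
```

A link column would still have to be added before passing such a data.frame to tagCloud(), since the function expects tag, link and count.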

Encoding problems may exist in the above code, but URLencode(tagXML) could be of help.
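As a small illustration of what URLencode() does to such a string, consider the sketch below. The tag XML is a made-up example, and whether the Flash movie needs the encoded form is not verified here.

```r
# Sketch: percent-encoding a tag XML string (the string itself is made up)
tagXML = "<tags><a href='http://example.com/tag/r' style='11'>R</a></tags>"
# reserved = TRUE also encodes characters such as ':', '/', '=' and the quotes
encoded = URLencode(tagXML, reserved = TRUE)
encoded
# URLdecode() reverses the encoding
identical(URLdecode(encoded), tagXML)
```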

Only Latin characters are supported, but it is possible to modify the Flash source file to support other languages. See Roy Tanck’s post for more information.

Other R resources I know so far:

  • The R package R4X by Romain François: you can generate an HTML page containing the tags with dynamic classes attached to the <span> tags (install the package and read its vignette: install.packages('R4X', repos=''); vignette('r4xslides', package='R4X'))
  • The R package snippets by Simon Urbanek: there is a function cloud() to create word cloud; words are arranged from top to bottom and left to right
