Blog Archives

A word cloud where the x and y axes mean something

April 17, 2012

OK, so I have now done two iterations on a better way to visualize term frequencies using R, ggplot2 and plyr. The first was OK but ugly; the second was better but still ugly. How to read it: frequency is segmented into 20% quantiles; the frequency is on the y axis; word size is …

Read more »

Word cloud alternatives

April 16, 2012

Here is an alternative to word clouds that makes it easier to get insights, but also has some of the aesthetic appeal of the traditional word cloud. My first attempt at this looked pretty bad and this is not too much better, but hopefully someone else will help improve it.

library(languageR) # get english word …

Read more »

Stop squinting at word clouds in the hope of getting insights

April 11, 2012

Someone recently asked on Twitter about people's preferences for cloud generators in R. I replied that I thought the "null" word cloud generator was best. By this I mean that I think the word cloud is a bad visualization method. Why? Here is one article with a good perspective, but you can search for …

Read more »

Stupid R tricks: using outer to create many data.frame subsets

February 11, 2012

Selecting subsets of a data.frame is easy in R if you define the predicates manually. But if you need to define many conditions, the standard slicing and subsetting methods are cumbersome. For this illustration I want to pick some large number of numerical ranges and label all of the rows that match any of the …

Read more »

Using transparency for data count intuition

September 27, 2011

This is an illustration of representing point count in a graphic using transparency. This is easy to do in ggplot2 if you use one of the bar-chart-type geoms. However, I think there are other situations where it would be useful to apply aesthetics based on point count. Since Hadley did a lot of …

Read more »

Getting to know multivariate data

July 25, 2011

psych::pairs.panels and corrgram::corrgram using mtcars data

Core ideas: multivariate modeling is challenging; pair plots make it easy to get a quick understanding of each variable and the relationships between them.

Multivariate analysis and modeling can be really challenging. Getting the job done well requires you to know your data really well. People often use the …

Read more »

Simple plyr/ggplot example of cumulative distribution plots

June 9, 2011

I’ve been a big fan of ggplot2 for a long time, but plyr has been in my toolkit for less than a year and it is now one of my most-used R packages. It is how aggregate/*apply would have been if they were awesome. In five lines this code computes the cumulative distribution functions of …

Read more »

My favorite R packages (installed with one command)

December 21, 2010

I just started a new job (working on social search awesomeness at Bing), so I had to set up my “dev” environment with all of my usual tools (R, Python, vim, etc.). One thing that made this a bit easier is my habit of keeping an R script around that installs all of my common packages …

Read more »

Load R packages… directly from CRAN if needed

December 12, 2010

R works in many ways and on many different OSes, which is great, but it also means that if you share a piece of code, the recipient may need to install packages to make it work. One thing that I do (adapted from a trick my friend Paul Jin showed me) is use the following …

Read more »