ngramr – an R package for Google Ngrams

July 16, 2013 | Stubborn Mule

The recent post How common are common words? made use of unusually explicit language for the Stubborn Mule. As expected, a number of email subscribers reported that the post fell foul of their email filters. Here I will return to the topic of n-grams, while keeping the language cleaner, and ... [Read more...]

What is Tony talking about?

September 17, 2012 | Stubborn Mule

I first experimented with word clouds several years ago and used them to visualise the speeches of Kevin Rudd and Malcolm Turnbull. I have now learned from the Fell Stats blog (via R-Bloggers) that there is an R package for generating word clouds. The package makes use of tm, a ... [Read more...]

Benford’s Law

April 16, 2012 | Stubborn Mule

Here is a quick quiz. If you visit the Wikipedia page List of countries by GDP, you will find three lists ranking the countries of the world in terms of their Gross Domestic Product (GDP), each list corresponding to a different source of the data. If you pick the list ... [Read more...]

Hottest 100 for 2011

January 26, 2012 | Stubborn Mule

Another year, another Australia Day. Another Australia Day, another Triple J Hottest 100. And that, of course, means an excellent excuse to set R to work on the chart data. For those outside Australia, the Hottest 100 is a chart of the most popular songs of the previous year, as voted by ... [Read more...]

More colour wheels

November 5, 2011 | Stubborn Mule

In response to my post about colour wheels, I received a suggested enhancement from Drew. The idea is to first match colours based on the text provided and then add nearby colours. This can be done by ordering colours in terms of hue, saturation, and value. The result is a ... [Read more...]

Colour wheels in R

November 5, 2011 | Stubborn Mule

Regular readers will know I use the R package to produce most of the charts that appear here on the blog. Being more quantitative than artistic, I find choosing colours for the charts to be one of the trickiest tasks when designing a chart, particularly as R has so many ... [Read more...]

A gentle introduction to R

January 31, 2011 | Stubborn Mule

Whenever a post on this blog requires some data analysis and perhaps a chart or two, my tool of choice is the versatile statistical programming package R. Developed as an open-source implementation of an engine for the S programming language, R is therefore free. Since commercial mathematical packages can cost ... [Read more...]

Generate your own Risk Characterization Theatre

October 24, 2010 | Stubborn Mule

In the recent posts Visualizing Smoking Risk and Shades of grey I wrote about the use of “Risk Characterization Theatres” (RCTs) to communicate probabilities. I found the idea in the book The Illusion of Certainty, by Eric Rifkin and Edward Bouwer. Here is how they explain the RCTs: Most of ... [Read more...]

The Mule goes SURFing

July 29, 2010 | Stubborn Mule

A month ago I posted about “SURF”, the newly-established Sydney R user forum (R being an excellent open-source statistics tool). Shortly after publishing that post, I attended the inaugural forum meeting. While we waited for attendees to arrive, a few people introduced themselves, explaining why they were interested in R ... [Read more...]


June 25, 2010 | Stubborn Mule

A new R user group has launched in Sydney. It aims to bring together both experienced R users and complete beginners. The forum will meet monthly with talks on a wide range of subjects exploring all of the facets of this powerful tool. [Read more...]

Graphing using R

May 16, 2010 | Stubborn Mule

Long-time readers of the Stubborn Mule will know that charts are a regular feature here. Almost all of these charts were produced using the R statistical software package which, in my view, produces far superior results to the most commonly used graphing tool: Excel. As a community service to help ... [Read more...]

