Blog Archives

ngramr – an R package for Google Ngrams

July 16, 2013

The recent post How common are common words? made use of unusually explicit language for the Stubborn Mule. As expected, a number of email subscribers reported that the post fell foul of their email filters. Here I will return to the topic of n-grams, while keeping the language cleaner, and describe the R package I developed

Read more »

What is Tony talking about?

September 17, 2012

I first experimented with word clouds several years ago and used them to visualise the speeches of Kevin Rudd and Malcolm Turnbull. I have now learned from the Fell Stats blog (via R-Bloggers) that there is an R package for generating word clouds.  The package makes use of tm, a text mining package for R, which I have been

Read more »

Benford’s Law

April 16, 2012

Here is a quick quiz. If you visit the Wikipedia page List of countries by GDP, you will find three lists ranking the countries of the world in terms of their Gross Domestic Product (GDP), each list corresponding to a different source of the data. If you pick the list according to the CIA (let’s

Read more »

Hottest 100 for 2011

January 26, 2012

Another year, another Australia Day. Another Australia Day, another Triple J Hottest 100. And that, of course, means an excellent excuse to set R to work on the chart data. For those outside Australia, the Hottest 100 is a chart of the most popular songs of the previous year, as voted by the listeners of

Read more »

More colour wheels

November 5, 2011

In response to my post about colour wheels, I received a suggested enhancement from Drew. The idea is to first match colours based on the text provided and then add nearby colours. This can be done by ordering colours in terms of hue, saturation, and value. The result is a significant improvement and it will capture all of

Read more »

Colour wheels in R

November 5, 2011

Regular readers will know I use the R package to produce most of the charts that appear here on the blog. Being more quantitative than artistic, I find choosing colours to be one of the trickiest tasks when designing a chart, particularly as R has so many colours to choose from. In

Read more »

A gentle introduction to R

January 31, 2011

Whenever a post on this blog requires some data analysis and perhaps a chart or two, my tool of choice is the versatile statistical programming package R. Developed as an open-source implementation of an engine for the S programming language, R is therefore free. Since commercial mathematical packages can cost thousands of dollars, this alone

Read more »

Generate your own Risk Characterization Theatre

October 24, 2010

In the recent posts Visualizing Smoking Risk and Shades of grey I wrote about the use of “Risk Characterization Theatres” (RCTs) to communicate probabilities. I found the idea in the book The Illusion of Certainty, by Eric Rifkin and Edward Bouwer. Here is how they explain the RCTs: Most of us are familiar with the crowd in a

Read more »

The Mule goes SURFing

July 29, 2010

A month ago I posted about “SURF”, the newly-established Sydney R user forum (R being an excellent open-source statistics tool). Shortly after publishing that post, I attended the inaugural forum meeting. While we waited for attendees to arrive, a few people introduced themselves, explaining why they were interested in R and how much experience they

Read more »

Surf

June 25, 2010

A new R user group has launched in Sydney. It aims to bring together both experienced R users and complete beginners. The forum will meet monthly with talks on a wide range of subjects exploring all of the facets of this powerful tool.

Read more »