playitbyr 0.2-1: data through sound, now with layers, facets, and more pleasure

May 6, 2012

(This article was first published on Statisfactions: The Sounds of Data and Whimsy » R, and kindly contributed to R-bloggers)

The latest playitbyr now offers more options and combinations for data sonification (exploring data through sound), with a ggplot2-inspired syntax. See the website for examples and how to get started.

The recent Sonification Handbook has a chapter devoted to exploratory data analysis with sonification. With some help from Sam Ferguson, one of the chapter’s authors, I’ve made it easy to implement those techniques using R. The following are recreations of the chapter’s sound examples, all exploring Edgar Anderson’s iris data.

Auditory dotplot

The auditory dotplot gives a quick univariate view of the iris petal lengths by mapping each measurement onto a click in time. Earlier clicks represent shorter petals.

  sonify(data = iris, mapping = sonaes(time = Petal.Length)) +
    shape_dotplot(jitter = 0.3) +
    scale_time_continuous(soundlimits = c(0, 15))

First, we specify the basic aspects of the sonify object, which is analogous to a ggplot object: the data is iris, and we’re mapping Petal.Length onto time. We then add the layer shape_dotplot, with a bit of noise via jitter to avoid overplotting, and specify that we want the output scaled to the range of 0 to 15 seconds.
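
To make the time mapping concrete, here is a minimal base R sketch of what such a scaling might look like. It is not playitbyr's internal code: the rescale helper is hypothetical, and it assumes that jitter means uniform noise of up to 0.3 data units added before the values are stretched onto the 0 to 15 second window.

```r
# Sketch only: base R, not playitbyr's implementation.
set.seed(1)                                   # reproducible jitter
x <- iris$Petal.Length

# Assumption: jitter is uniform noise of at most +/- 0.3 added to the data values
x_jit <- x + runif(length(x), min = -0.3, max = 0.3)

# Hypothetical helper: linearly rescale values onto a target range
rescale <- function(v, to = c(0, 15)) {
  (v - min(v)) / (max(v) - min(v)) * diff(to) + to[1]
}

# Onset time in seconds for each click, earliest first
click_times <- sort(rescale(x_jit))
head(round(click_times, 2))
```

Under these assumptions the shortest petal clicks at 0 seconds and the longest at 15 seconds, with the rest spaced in proportion to their length.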

Auditory histogram

The auditory histograms are one of the most creative touches in the chapter. They give a sense of the frequency distribution by repeatedly sampling from the data (without replacement) and mapping each sampled value to a pitch. This can be played indefinitely to give a sense of the distribution’s shape: lots of notes in the middle of the range indicate a heavily central distribution, for instance.

This design can be used effectively in combination with sonfacet to compare different values of a categorical variable. For instance, we can listen to Sepal.Length, faceted by the three different iris species (setosa, versicolor, and virginica):

You can hear how each species clusters in a different area and get a sense of how spread-out they are. Here’s the code:

  sonify(iris, sonaes(pitch = Sepal.Length)) + sonfacet(Species) +
    shape_histogram(length = 3, tempo = 1800)

We again choose iris as the data set, and now map Sepal.Length to pitch. Then, we facet by Species; faceting means we simply split the data by levels(Species), create the sonification for each level, and then play the sonifications one after another. Finally, we add on shape_histogram, where each sonification plays for 3 seconds and the samples are drawn at a rate of 1800 beats per minute.
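
The faceting and sampling steps can be sketched in base R. This is only an illustration, not playitbyr's code: draw_notes is a hypothetical helper, and it assumes that "repeatedly sampling without replacement" means reshuffling the values and playing through them again whenever they run out, which is needed here because each species has only 50 observations.

```r
# Sketch only: base R, not playitbyr's implementation.
set.seed(42)
length_sec <- 3
tempo_bpm  <- 1800
n_notes    <- length_sec * tempo_bpm / 60     # 30 notes per second for 3 s = 90

# Faceting: one vector of Sepal.Length values per species
by_species <- split(iris$Sepal.Length, iris$Species)

# Hypothetical helper: shuffle the values (sampling without replacement),
# reshuffle once they run out, and keep the first n draws
draw_notes <- function(x, n) {
  passes   <- ceiling(n / length(x))
  shuffled <- unlist(lapply(seq_len(passes), function(i) sample(x)))
  shuffled[seq_len(n)]
}

# One sequence of values per facet, to be played one facet after another
facet_notes <- lapply(by_species, draw_notes, n = n_notes)
sapply(facet_notes, range)                    # value range heard in each facet
```

Each sampled value would then be mapped to a pitch; the per-facet ranges hint at why the three species sound clustered in different registers.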

Audio boxplot

shape_boxplot works on a similar principle. The same sampling occurs, only now there are three phases: first from the entire range of the data, then only from the interquartile range (the 25th to 75th percentile), and finally just the median. This gives a sense of both the center and the spread of a variable. We’ll again look at Sepal.Length and facet by Species here:

  sonify(iris, sonaes(pitch = Sepal.Length)) + sonfacet(Species) +
    shape_boxplot(length = 1, tempo = 1800)

The code is the same apart from the shape layer, and here the length parameter refers to the length of each segment of the boxplot rather than the whole facet.
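
As a rough illustration of the three phases, the following base R sketch builds the pool of values each phase would draw from. It is not playitbyr's implementation: boxplot_phases is a hypothetical helper, and the use of quantile() with its default method to define the interquartile range is an assumption.

```r
# Sketch only: base R, not playitbyr's implementation.
notes_per_phase <- 1 * 1800 / 60              # 1 s at 1800 bpm = 30 notes per phase

# Hypothetical helper: the values available in each of the three phases
boxplot_phases <- function(x) {
  q <- quantile(x, c(0.25, 0.75))             # assumed IQR definition
  list(
    full   = x,                               # phase 1: entire range
    iqr    = x[x >= q[1] & x <= q[2]],        # phase 2: middle 50% only
    median = median(x)                        # phase 3: a single value
  )
}

phases <- lapply(split(iris$Sepal.Length, iris$Species), boxplot_phases)

# Width of the value range in each phase, per species
phase_width <- sapply(phases, function(p) sapply(p, function(v) diff(range(v))))
phase_width
```

The widths shrink from the full range to the interquartile range to zero at the median, which is the narrowing a listener would hear within each facet.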

I hope to incorporate speech, easier-to-set-up audio integration, lots more sounds, and other goodies in future versions; you can view and fork the code on GitHub.
