playitbyr now offers more options and combinations for data sonification (exploring data through sound), with a
ggplot2-inspired syntax. See the website for examples and how to get started.
The recent Sonification Handbook has a chapter devoted to exploratory data analysis with sonification. With some help from Sam Ferguson, one of the chapter’s authors, I’ve made it easy to implement those techniques using
R. The following are recreations of the chapter’s sound examples, all exploring Edgar Anderson’s iris data.
The auditory dotplot gives a quick univariate view of the iris petal lengths by mapping each measurement onto a click in time. Earlier clicks represent shorter petals.
First, we specify the basic aspects of the sonify object, analogous to a ggplot object: the data is iris, and we’re mapping Petal.Length onto time. We add the layer shape_dotplot, introduce a bit of noise via jitter to avoid overplotting, and specify that we want the output scaled to the range of 0 to 15 seconds.
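To make the mapping concrete, here is a minimal base-R sketch of what the dotplot does to the data. It is not playitbyr's own code: the `rescale` helper, the `click_times` name, and the jitter amount of 0.1 are all assumptions made for illustration.

```r
# Base-R sketch of the dotplot mapping; not playitbyr's internal code.
set.seed(1)  # make the jitter reproducible

# Add a little noise so identical petal lengths do not land on the same instant.
jittered <- jitter(iris$Petal.Length, amount = 0.1)  # amount is an arbitrary choice

# Linearly rescale the jittered values onto 0..15 seconds (hypothetical helper).
rescale <- function(x, to = c(0, 15)) {
  (x - min(x)) / (max(x) - min(x)) * diff(to) + to[1]
}
click_times <- rescale(jittered)

# Shorter petals click earlier: ordering by time matches ordering by length.
head(sort(click_times))
```

Because the rescaling is linear, the gaps between clicks stay proportional to the gaps between petal lengths, so clusters in the data are heard as bursts of clicks.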
The auditory histograms are one of the most creative touches in the chapter. They give a sense of the frequency distribution by repeatedly sampling from the data (without replacement) and mapping each sampled value to a pitch. This can be played indefinitely to give a sense of the distribution’s shape: lots of notes in the middle of the range indicate a heavily central distribution, for instance.
This design can be used effectively in combination with sonfacet to compare different values of a categorical variable. For instance, we can listen to Sepal.Length, faceted by the three different iris species (setosa, versicolor, and virginica):
You can hear how each species clusters in a different area and get a sense of how spread-out they are. Here’s the code:
We again choose iris as the data set, and now map Sepal.Length to pitch. Then we facet by Species; faceting simply means that we split the data by levels(Species), create the sonification for each level, and play the sonifications one after another. Finally, we add shape_histogram, where each sonification plays for 3 seconds and the samples are drawn at a rate of 1800 beats per minute.
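The arithmetic behind those two numbers, and the split-then-sample idea, can be sketched in base R. This is only an illustration under stated assumptions, not how playitbyr implements it: `draw_notes` and `to_pitch` are hypothetical helpers, the pitch range of 48 to 84 is an arbitrary MIDI-style choice, and drawing in repeated shuffled passes is just one reading of "repeatedly sampling without replacement".

```r
# Base-R sketch of a faceted auditory histogram; not playitbyr's internal code.
set.seed(1)

length_sec <- 3      # seconds each facet plays
tempo_bpm  <- 1800   # notes per minute
n_notes <- length_sec * tempo_bpm / 60   # 3 s at 30 notes per second = 90 notes

# Map a value linearly onto a pitch range (the 48..84 range is an assumption).
to_pitch <- function(x, from = range(iris$Sepal.Length), to = c(48, 84)) {
  (x - from[1]) / diff(from) * diff(to) + to[1]
}

# Draw n values by repeatedly shuffling x, so no observation repeats within a pass.
draw_notes <- function(x, n) {
  passes <- ceiling(n / length(x))
  shuffled <- replicate(passes, x[sample.int(length(x))], simplify = FALSE)
  unlist(shuffled)[seq_len(n)]
}

# One pitch sequence per species, to be played one after another.
facets <- lapply(split(iris$Sepal.Length, iris$Species),
                 function(x) to_pitch(draw_notes(x, n_notes)))

sapply(facets, median)  # typical pitch per species
```

Using the range of the whole data set in `to_pitch`, rather than each facet's own range, keeps the pitches comparable across species.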
shape_boxplot works on a similar principle. The same sampling occurs, only now there are three phases: first from the entire range of the data, then only from the interquartile range (the 25th to 75th percentile), and finally just the median. This helps convey both the center and the spread of a variable. We’ll again look at Sepal.Length and facet by Species:
The code is identical, except that the length parameter refers to the length of each segment of the boxplot, rather than the whole facet.
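The three phases can be sketched in base R as follows. As before, this is a rough illustration rather than playitbyr's implementation: `boxplot_phases` and `draw_notes` are hypothetical helpers, and the 3 seconds and 1800 beats per minute are carried over from the histogram description.

```r
# Base-R sketch of the three auditory boxplot phases; not playitbyr's internal code.
set.seed(1)

# Draw n values by repeatedly shuffling x (one reading of the sampling step).
draw_notes <- function(x, n) {
  passes <- ceiling(n / length(x))
  shuffled <- replicate(passes, x[sample.int(length(x))], simplify = FALSE)
  unlist(shuffled)[seq_len(n)]
}

# Build the three segments for one numeric vector.
boxplot_phases <- function(x, n_per_phase) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr_values <- x[x >= q[1] & x <= q[3]]
  list(
    full   = draw_notes(x, n_per_phase),           # entire range of the data
    iqr    = draw_notes(iqr_values, n_per_phase),  # 25th to 75th percentile only
    median = rep(q[2], n_per_phase)                # a single repeated value
  )
}

length_sec <- 3      # seconds per segment, not per facet
tempo_bpm  <- 1800   # notes per minute
phases <- lapply(split(iris$Sepal.Length, iris$Species),
                 boxplot_phases, n_per_phase = length_sec * tempo_bpm / 60)

sapply(phases$setosa, range)  # min and max of each phase for one species
```

With three segments of 3 seconds each, a facet in this sketch lasts 9 seconds, which is the practical consequence of length applying per segment.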
I hope to incorporate speech, easier-to-set-up audio integration, lots more sounds, and other goodies in future versions; you can view and fork the code on GitHub.