[This article was first published on Statisfactions: The Sounds of Data and Whimsy » R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The latest version of `playitbyr` offers more options and combinations for data sonification (exploring data through sound), using a `ggplot2`-inspired syntax. See the website for examples and how to get started.

The recent Sonification Handbook has a chapter devoted to exploratory data analysis with sonification. With some help from Sam Ferguson, one of the chapter’s authors, I’ve made it easy to implement those techniques using `R`. The following are recreations of the chapter’s sound examples, all exploring Edgar Anderson’s iris data.

## Auditory dotplot

The auditory dotplot gives a quick univariate view of iris petal lengths by mapping each measurement onto a click in time: earlier clicks represent shorter petals.

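```r
library(playitbyr)

# Map petal length to time; jitter separates tied values; scale output to 0-15 s
sonify(data = iris, mapping = sonaes(time = Petal.Length)) +
  shape_dotplot(jitter = 0.3) +
  scale_time_continuous(soundlimits = c(0, 15))
```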

First, we specify the basic aspects of the `sonify` object, analogous to a `ggplot` object: the data is `iris`, and we're mapping `Petal.Length` onto time. We then add the layer `shape_dotplot`, adding a bit of noise via `jitter` to avoid overplotting, and use `scale_time_continuous` to scale the output to the range of 0 to 15 seconds.
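
To make the mapping concrete, here is a rough base-R sketch of what the dotplot computes. It is not `playitbyr`'s internal code: the helper `rescale_linear()` is hypothetical, and treating `jitter = 0.3` as uniform noise of up to ±0.3 cm added before a linear rescale is an assumption.

```r
# Hypothetical helper: linearly rescale x so that its range becomes `to`
rescale_linear <- function(x, to = c(0, 15)) {
  rng <- range(x)
  to[1] + (x - rng[1]) / diff(rng) * diff(to)
}

set.seed(1)  # reproducible jitter
# Assumption: "jitter" adds uniform noise of up to +/- 0.3 cm to each raw
# petal length. Lengths are recorded to 0.1 cm, so many petals share a value;
# without noise their clicks would fall on the same instant.
jittered <- iris$Petal.Length + runif(nrow(iris), min = -0.3, max = 0.3)

# Onset time of each click, in seconds, within the 0-15 s window
onsets <- rescale_linear(jittered, to = c(0, 15))

# The earliest clicks correspond to the shortest petals
head(round(sort(onsets), 2))
```

Jittering before rescaling keeps every click inside the 0 to 15 second window, at the cost of slightly blurring the exact petal lengths.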

## Auditory histogram

The auditory histogram is one of the most creative touches in the chapter. It conveys the frequency distribution by repeatedly sampling from the data (without replacement) and mapping each sampled value to a pitch. This can be played indefinitely to give a sense of the distribution's shape: lots of notes in the middle of the range indicate a distribution concentrated around its center, for instance.
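
The sampling idea can be sketched in base R as follows. This illustrates the technique under assumptions and is not how `playitbyr` implements it: the function `histogram_stream()` is hypothetical, each pass is a random permutation of the data (sampling without replacement), passes are chained for as long as notes are needed, and values map linearly onto MIDI notes 48 to 84 (C3 to C6) before conversion to hertz.

```r
# Hypothetical helper: a stream of n_notes pitches (in Hz) drawn from x.
# Assumptions: each pass is a random permutation of x (no value repeats
# within a pass); passes are chained until n_notes values are drawn;
# values map linearly onto the MIDI note range midi_range.
histogram_stream <- function(x, n_notes, midi_range = c(48, 84)) {
  passes <- ceiling(n_notes / length(x))
  # x[sample.int(...)] avoids sample()'s surprising behaviour for length-1 x
  draws <- unlist(lapply(seq_len(passes), function(i) x[sample.int(length(x))]))
  draws <- draws[seq_len(n_notes)]
  midi <- midi_range[1] + (draws - min(x)) / diff(range(x)) * diff(midi_range)
  440 * 2^((midi - 69) / 12)  # MIDI note number -> frequency in Hz
}

set.seed(42)
freqs <- histogram_stream(iris$Sepal.Length, n_notes = 200)
# Common values are drawn often, so the pitch distribution of the stream
# mirrors the shape of the sepal-length distribution
summary(freqs)
```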

This design can be used effectively in combination with `sonfacet` to compare different values of a categorical variable. For instance, we can listen to `Sepal.Length`, faceted by the three different iris species (setosa, versicolor, and virginica):

You can hear how each species clusters in a different pitch range and get a sense of how spread out each one is. Here's the code:

```r
library(playitbyr)

# Map sepal length to pitch, play one facet per species, 3 s each at 1800 bpm
sonify(iris, sonaes(pitch = Sepal.Length)) +
  sonfacet(Species) +
  shape_histogram(length = 3, tempo = 1800)
```

We again choose `iris` as the data set, and now map `Sepal.Length` to pitch. Then we facet by `Species`; faceting means we simply split the data by `levels(Species)`, create the sonification for each level, and then play the sonifications one after another. Finally, we add `shape_histogram`, where each facet plays for 3 seconds (`length = 3`) and samples are drawn at a rate of 1800 beats per minute (`tempo = 1800`).
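
Here is a base-R sketch of what faceting and the `length`/`tempo` settings imply: at 1800 beats per minute a note sounds every 1/30 of a second, so a 3-second facet holds 90 notes, and the three facets follow one another over 9 seconds. The data frame `schedule` and the reshuffling rule (each species has only 50 flowers, fewer than the 90 notes needed) are assumptions for illustration, not `playitbyr` internals.

```r
tempo    <- 1800                     # beats (one note each) per minute
length_s <- 3                        # seconds each facet plays
beat     <- 60 / tempo               # 1/30 s between successive notes
n_notes  <- round(length_s / beat)   # 90 notes per facet

# Faceting: one vector of Sepal.Length per level of Species
facets <- split(iris$Sepal.Length, iris$Species)

set.seed(3)
# Facets play one after another, so facet k starts at (k - 1) * length_s.
# Assumption: values are drawn without replacement and reshuffled once all
# 50 flowers of a species have been used.
schedule <- do.call(rbind, lapply(seq_along(facets), function(k) {
  x <- facets[[k]]
  passes <- ceiling(n_notes / length(x))
  draws <- unlist(lapply(seq_len(passes), function(i) x[sample.int(length(x))]))
  data.frame(
    species = names(facets)[k],
    onset   = (k - 1) * length_s + (seq_len(n_notes) - 1) * beat,  # seconds
    value   = draws[seq_len(n_notes)],
    stringsAsFactors = FALSE
  )
}))

head(schedule)
```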

## Audio boxplot

`shape_boxplot` works on a similar principle. The same sampling occurs, but now in three phases: first from the entire range of the data, then only from the interquartile range (the 25th to 75th percentile), and finally just the median. This conveys both the center and the spread of a variable. We'll again look at `Sepal.Length`, faceted by `Species`:

```r
library(playitbyr)

# Same mapping and faceting; each of the three boxplot phases lasts 1 s
sonify(iris, sonaes(pitch = Sepal.Length)) +
  sonfacet(Species) +
  shape_boxplot(length = 1, tempo = 1800)
```

The code is identical apart from the shape layer: `shape_boxplot` replaces `shape_histogram`, and its `length` parameter refers to the length of each segment of the boxplot rather than of the whole facet.
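
The three phases can be sketched in base R, again as an illustration rather than `playitbyr`'s implementation. The function `boxplot_phases()` is hypothetical; it assumes the interquartile phase samples only values between the 25th and 75th percentiles (R's default `quantile()` definition) and that the median phase simply repeats the median. With `length = 1` and `tempo = 1800`, each segment holds 1 × 1800 / 60 = 30 notes.

```r
# Hypothetical helper: the note values for each of the three boxplot phases.
# Assumptions: draws are without replacement, reshuffled if a pool runs out;
# the IQR phase uses values between the 25th and 75th percentiles.
boxplot_phases <- function(x, n_notes) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  draw <- function(pool) {
    passes <- ceiling(n_notes / length(pool))
    unlist(lapply(seq_len(passes), function(i) pool[sample.int(length(pool))]))[seq_len(n_notes)]
  }
  list(
    full   = draw(x),                          # whole range of the data
    iqr    = draw(x[x >= q[1] & x <= q[3]]),   # interquartile range only
    median = rep(q[2], n_notes)                # just the median
  )
}

n_notes <- 1 * 1800 / 60  # 1 s per segment at 1800 bpm -> 30 notes
set.seed(7)
phases <- lapply(split(iris$Sepal.Length, iris$Species), boxplot_phases,
                 n_notes = n_notes)
sapply(phases$setosa, range)  # the spread narrows from phase to phase
```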

I hope to incorporate speech, easier-to-set-up audio integration, lots more sounds, and other goodies in future versions; you can view and fork the code on GitHub.
