[This article was first published on Statisfactions: The Sounds of Data and Whimsy » R, and kindly contributed to R-bloggers].

Today I unveil my very first statistical YouTube video! I will do anything to keep you statisfied, and if that means YouTube, then so be it.

But first, some exposition: in Panama in 2000, the richest 10 percent of the population received 45 percent of the country’s income, while the poorest 10 percent received only 0.6 percent. How typical is this? And how does inequality relate to how well-off a country is overall?
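To put the Panama figures in perspective, here is a tiny Python sketch of the ratio between the richest and poorest deciles’ income shares. The function name `top_bottom_ratio` is purely illustrative; the only data used are the two figures quoted above.

```python
def top_bottom_ratio(top_share, bottom_share):
    """Ratio of the richest decile's income share to the poorest decile's.

    Both arguments are percentages of national income.
    """
    if bottom_share <= 0:
        raise ValueError("the poorest decile's share must be positive")
    return top_share / bottom_share


# Panama, 2000: the top 10 percent held 45 percent, the bottom 10 percent held 0.6 percent.
panama = top_bottom_ratio(45.0, 0.6)
print(f"The top decile received about {panama:.0f} times the bottom decile's share.")
```

Because every decile contains the same number of people, this also means the average person in Panama’s top decile had roughly 75 times the income of the average person in the bottom decile.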

I found detailed income distribution data for the year 2000 for 50 countries in the UNU-WIDER World Income Inequality Database (WIID). For each country, the data give the percentage of national income received by each 10 percent (decile) of the population.
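Data shaped like this is easy to sanity-check before sonifying it. Here is a minimal Python sketch, assuming each country arrives as a list of ten percentages ordered from the poorest to the richest decile; the function name and the 1-point rounding tolerance are assumptions, not something taken from WIID’s documentation.

```python
import math


def validate_decile_shares(shares, tolerance=1.0):
    """Check ten decile income shares (percent, poorest decile first).

    Published shares are rounded, so the total may miss 100 by up to
    `tolerance` percentage points (an assumed threshold). Returns the
    shares rescaled to sum to exactly 100.
    """
    if len(shares) != 10:
        raise ValueError(f"expected 10 deciles, got {len(shares)}")
    if any(s < 0 or not math.isfinite(s) for s in shares):
        raise ValueError("shares must be finite and non-negative")
    # Deciles are sorted by income, so each share is at least the previous one.
    if any(a > b for a, b in zip(shares, shares[1:])):
        raise ValueError("deciles should be ordered from poorest to richest")
    total = sum(shares)
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"shares sum to {total:.2f}, not roughly 100")
    return [100.0 * s / total for s in shares]
```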

So let’s hear what this sounds like! Here are two countries with very different income distributions, Austria and Bolivia, played back-to-back using Csound:

Each pulse represents 10 percentage points. The steady, lower-pitched beats represent the population, one beat per 10 percent, ordered from lowest to highest income; the higher-pitched sounds represent the income, one pulse per 10 percent. In a perfectly equal society, these two rhythms would be identical: each 10 percent of the population would have 10 percent of the income, so all the beats would line up.

Of course, the world doesn’t work like that. At the other extreme, in a society where only the top 10 percent of the population had any income, there would be no income pulses during the first 9 population beats, since those people have no income, and then a pile-up of all 10 income pulses right at the end.
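Here is one way to turn decile shares into the two pulse trains described above, as a Python sketch. It is a reconstruction from the description, not necessarily how the original R/Csound code works: population pulse k sounds when the cumulative population reaches 10k percent, and income pulse j sounds at the moment the cumulative income (the Lorenz curve, assumed linear within each decile) reaches 10j percent. The function name and the one-second `beat` are assumptions.

```python
from itertools import accumulate


def pulse_times(decile_shares, beat=1.0):
    """Return (population_times, income_times) in seconds.

    decile_shares: 10 income shares in percent, poorest decile first.
    Assumes cumulative income rises linearly inside each decile.
    """
    if len(decile_shares) != 10:
        raise ValueError("expected 10 decile shares")
    total = sum(decile_shares)
    # Rescale so that rounded published shares still add up to exactly 100.
    shares = [100.0 * s / total for s in decile_shares]
    cum = [0.0] + list(accumulate(shares))  # cum[k]: income of the poorest k deciles

    population_times = [beat * k for k in range(1, 11)]
    income_times = []
    k = 0  # index of the decile currently being crossed
    for j in range(1, 11):
        target = 10.0 * j
        # Move forward to the decile in which cumulative income reaches the target.
        while k < 9 and cum[k + 1] < target - 1e-9:
            k += 1
        frac = (target - cum[k]) / (cum[k + 1] - cum[k])
        income_times.append(beat * (k + frac))
    return population_times, income_times


# Perfect equality: the two pulse trains coincide.
print(pulse_times([10] * 10)[1])
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Only the top decile has income: every income pulse lands after the 9th population beat.
print([round(t, 2) for t in pulse_times([0] * 9 + [100])[1]])
# [9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10.0]
```

The second example is the extreme case from the paragraph above: silence from the income track for nine beats, then ten income pulses crammed into the final beat.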

Notice how the income pulses in the second half of the clip start much slower than in the first half, and end much faster. This is because Bolivia is much more unequal than Austria: the top segments of the population, which come at the end of the clip, claim more of the income beats for themselves.

How does this compare to how rich the country is overall? And what about that YouTube video? I used R’s excellent WDI package to pull income-per-capita data (GDP per capita, PPP-adjusted, to be exact) from the World Bank’s World Development Indicators database.

I plotted these and made an animation highlighting each country using Yihui Xie’s wicked-cool animation package. I ordered the countries from most equal to most unequal by their Gini coefficients, a measure of inequality where 0 means perfectly equal and 1 means perfectly unequal. As each country is highlighted, you can hear the income distribution I created in Csound and see its Gini coefficient:
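It isn’t stated here whether the Gini coefficients were taken directly from the database or computed from the deciles. If you only have the ten decile shares, a common approximation is to integrate the Lorenz curve with the trapezoid rule and use Gini = 1 − 2 × (area under the Lorenz curve). A Python sketch, with a hypothetical function name:

```python
from itertools import accumulate


def gini_from_deciles(decile_shares):
    """Approximate Gini coefficient from income shares (percent), poorest group first.

    Integrates the Lorenz curve with the trapezoid rule, which amounts to
    assuming everyone within a group has the same income. That makes it a
    lower bound on the Gini of the underlying individual incomes.
    """
    n = len(decile_shares)
    total = sum(decile_shares)
    lorenz = [0.0] + [c / total for c in accumulate(decile_shares)]
    width = 1.0 / n  # each group is 1/n of the population
    area = sum((lorenz[k - 1] + lorenz[k]) / 2.0 * width for k in range(1, n + 1))
    return 1.0 - 2.0 * area  # Gini = 1 - 2 * (area under the Lorenz curve)


print(gini_from_deciles([10] * 10))             # ~0.0, up to floating-point rounding
print(round(gini_from_deciles([0] * 9 + [100]), 3))  # 0.9
```

One consequence worth knowing: with only ten groups, this approximation tops out at 0.9 (all income in the top decile), not 1, because it treats everyone within a decile as equal.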

Again, you can hear more and more of the income being claimed by the richest as the video progresses. The most unequal countries do tend to be lower-income, but the relationship doesn’t seem to run the other way: being poorer doesn’t mean being more unequal. Belarus, one of the poorer countries in this list, is also one of the most equal, and the United States, one of the richest, sits near the middle in terms of equality. (Many countries, such as China and India, aren’t represented in this list.)

Why rhythm? After all, many sonic representations of data consist of changing pitches over time, including my previous post. It’s an intuitive analogy to visual graphs, with time as the horizontal axis and pitch as the vertical axis. But this sensible approach, which was the first thing I tried, failed me on this project. Here’s my attempt to represent Estonia’s inequality through pitch, where the pitch in the left speaker represents the cumulative percentage of the income and the pitch in the right speaker represents the cumulative percentage of the population:
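For comparison, here is a Python sketch of that pitch-based mapping. The exact mapping used for the Estonia clip isn’t given, so the exponential curve (equal percentage steps sound like equal musical intervals) and the 220 to 880 Hz range are assumptions, as are the function names.

```python
from itertools import accumulate


def percent_to_frequency(pct, f_low=220.0, f_high=880.0):
    """Map a cumulative percentage (0-100) to a frequency in Hz.

    Exponential, so equal percentage steps sound like equal musical intervals.
    The 220-880 Hz range is an illustrative assumption.
    """
    pct = min(max(pct, 0.0), 100.0)  # clamp to guard against rounding
    return f_low * (f_high / f_low) ** (pct / 100.0)


def stereo_pitches(decile_shares):
    """(left_hz, right_hz) per decile: cumulative income left, cumulative population right."""
    total = sum(decile_shares)
    cumulative_income = [100.0 * c / total for c in accumulate(decile_shares)]
    return [
        (percent_to_frequency(inc), percent_to_frequency(10.0 * k))
        for k, inc in enumerate(cumulative_income, start=1)
    ]


print(percent_to_frequency(50))  # 440.0
```

In a perfectly equal society the two channels play the same pitches; inequality shows up only as the left channel lagging below the right, which is exactly the subtle difference the next paragraph describes.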

The problem is that what matters most is not the difference between the pitches, but the difference in the rate at which they change. Can you hear that? I sure can’t. So instead, I took that idea of a “rate” and represented it with rhythms of varying speed, which is a much better fit for what our ears can pick up.
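To make the “rate” idea concrete: if one income pulse marks 10 percentage points of income and one population beat marks one decile, then a decile holding s percent of the income produces s / 10 income pulses per population beat. That is just the slope (first difference) of the cumulative-income curve that the pitch version encoded. A minimal Python sketch, with a hypothetical function name:

```python
from itertools import accumulate


def income_pulse_rates(decile_shares):
    """Income pulses per population beat for each decile.

    One income pulse = 10 percentage points of income; one population beat = one decile.
    """
    # The cumulative curve is what the pitch version played...
    cumulative = [0.0] + list(accumulate(decile_shares))
    # ...and its first differences, in units of 10 points, are what the rhythm plays.
    return [(hi - lo) / 10.0 for lo, hi in zip(cumulative, cumulative[1:])]


rates = income_pulse_rates([10] * 10)  # perfect equality: 1 income pulse per beat
```

By the same arithmetic, Panama’s richest decile (45 percent of income) would produce 4.5 income pulses per population beat, while its poorest decile (0.6 percent) would produce just 0.06, a difference in tempo that is far easier to hear than a gap between two slowly rising pitches.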

Notes

I connected R and Csound in much the same way as in my previous post, and my use of the animation package was fairly straightforward. All of the code is available on my GitHub page, with a README that should help you get started. Wrangling the data into the right shape was made much easier by Hadley Wickham’s life-saving reshape and plyr packages. The pulse sounds are just white noise put through a band-pass filter.
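For anyone who wants to experiment with that kind of pulse outside Csound, here is a rough Python sketch: a biquad band-pass filter using the coefficients from Robert Bristow-Johnson’s Audio EQ Cookbook (constant 0 dB peak gain), applied to a short, seeded white-noise burst. The exponential decay envelope, center frequencies, Q and duration are all assumptions; the only thing stated above is “white noise through a band-pass filter.” Within Csound itself, opcodes such as `rand` and `butterbp` could play the same roles.

```python
import math
import random


def bandpass(signal, fs, f0, q):
    """Biquad band-pass filter (RBJ Audio EQ Cookbook, 0 dB peak gain at f0)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def noise_pulse(fs=44100, duration=0.05, f0=1000.0, q=5.0, decay=0.01, seed=0):
    """Band-passed white-noise burst with an (assumed) exponential decay envelope."""
    rng = random.Random(seed)
    n = int(fs * duration)
    noise = [rng.uniform(-1.0, 1.0) * math.exp(-(i / fs) / decay) for i in range(n)]
    return bandpass(noise, fs, f0, q)


low = noise_pulse(f0=400.0)    # e.g. a population beat (frequency is an assumption)
high = noise_pulse(f0=1600.0)  # e.g. an income pulse (frequency is an assumption)
```

To listen, scale the samples to 16-bit integers and write them out with Python’s standard `wave` module.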

If you want to reproduce this but don’t want to do the data-wrangling yourself, note that my GitHub page has the CSV files used as inputs to the animations and the Csound code.

I finally combined the sound and animation and encoded the result with OpenShot, after trying unsuccessfully with many other Linux video editors. Phew! Thanks for working, OpenShot!

If you’re a fellow nerdy Coloradoite, be sure to check out my talk, R in Concert, on Tuesday, February 15 for the Denver R User Group!
