[This article was first published on Peter's stats stuff - R, and kindly contributed to R-bloggers.]
In my last post I had a first look (for me) at the Estimated Household Income Inequality data from the University of Texas Inequality Project. These data came to my attention when Professor James K. Galbraith used them in his keynote presentation to the 2016 New Zealand Association of Economists conference. Some of the slides associated with these data include world choropleth maps at five-year intervals. I set out to re-create versions of those images, and to upgrade them to an animated image showing a rolling five-year window, using the average of any data available for each country in that window. Here’s the result:
The Gini coefficient is a measure of inequality. On the scale used in this dataset, 100 represents perfect inequality, with one household (in this case) earning all the country’s income. 0 represents perfect equality, with all households on equal incomes. On the map above, blue indicates countries with relatively lower inequality, yellow and red relatively higher. Countries marked grey have no data in the indicated five-year period.
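To make the 0 to 100 scale concrete, here is a minimal sketch of the textbook Gini calculation from a vector of household incomes. The function name `gini100` and the toy incomes are made up for illustration, and this is not necessarily how the estimates in the dataset were produced.

```r
# Textbook Gini coefficient, rescaled to 0-100 (hypothetical helper).
gini100 <- function(income) {
  x <- sort(income)          # incomes in ascending order
  n <- length(x)
  # Rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
  g <- 2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
  100 * g
}

equal   <- gini100(rep(50, 4))          # every household on the same income
one_has <- gini100(c(0, 0, 0, 100))     # one household earns everything

print(equal)
print(one_has)
```

With only four households the second case comes out below 100, because the finite-sample maximum of this formula is 100 * (n - 1) / n; it approaches 100 as the number of households grows.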
The first thing I notice from this graphic is that for large chunks of time big parts of the world are lacking data in this particular collection, with the old USSR, twentieth century China, pre-1990 South America and central Africa standing out in particular. Second, it’s sort of hypnotising to pick a country and see if you can pick the change in inequality over time. The most obvious contrast generally comes at the break point of the repeating cycle, when it moves from 2008 back to 1963, and you’re suddenly reminded how much things subtly changed over that time. How good this map is as an analytical tool I’m unsure (we know from Cleveland’s experiments decades ago that people are bad at judging quantity from colour density), but it’s got some degree of communication content.
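The rolling five-year window behind the animation can be sketched in base R along these lines. This is only an illustration under assumptions: the toy data frame `ehii`, its column names, the helper `window_mean`, and the choice of a window that starts at the given year are all hypothetical, not taken from the original code.

```r
# Hypothetical toy data: one row per country-year with a Gini estimate.
ehii <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year    = c(1963, 1965, 1970, 1964, 1971),
  gini    = c(40, 42, 45, 30, 33)
)

# Average whatever observations exist for each country in the window
# [start, start + width - 1]; a country with no data in the window gets NA
# (which would be drawn grey on the map).
window_mean <- function(data, start, width = 5) {
  in_window <- data$year >= start & data$year <= start + width - 1
  countries <- sort(unique(data$country))
  means <- tapply(data$gini[in_window],
                  factor(data$country[in_window], levels = countries),
                  mean)
  data.frame(start = start, country = countries, gini = as.numeric(means))
}

# One data frame per animation frame, stacked together.
frames <- do.call(rbind, lapply(1963:1967, window_mean, data = ehii))
print(frames)
```

Keeping every country in every frame, with NA where the window is empty, means each frame can be joined to the same set of map polygons.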
There are a few complications here arising from what in the data modelling business are referred to as “slowly changing dimensions”, in this case national boundaries. At one point I thought I would want a different set of national boundaries for each year – or at least for the years on either side of the big changes in 1989 and 1990. But there is very little data in the UTIP dataset for pre-1990 countries, with East Germany the standout exception. In contrast, the project team have obtained data on several “countries” well before they were officially recognised as countries – like Eritrea and Papua New Guinea prior to their independence. Using a map in the 1980s that showed East Germany would remove Eritrea from the image.
In the end, I opted to use 1995 borders all the way through. This drops East Germany from the dataset but keeps Eritrea. Macao, Hong Kong and Puerto Rico are also casualties of the combination of changing sovereignty and statistical practice, but this has little visual impact due to their small geographical size.
The luxury to explore the changing boundaries was made possible by Nils Weidmann’s wonderful CShapes project, which makes available country boundaries and capitals back to the end of World War II. The formats available are standard GIS shapefiles and an R package.
Here’s the R code behind that animation: