A couple of days ago, I spotted a post by Oli Hawkins on Visualising migration between the countries of the UK which linked to a Sankey diagram demo of Internal migration flows in the UK.
In an R context, there are several libraries available that support the generation of Sankey diagrams, including
googleVis (which wraps Google Chart tools), and a couple of packages that wrap d3.js – an original rCharts Sankey diagram demo by @timelyporfolio, and a more recent HTMLWidgets demo (sankeyD3).
The original data comprised a matrix relating population flows between English regions, Wales, Scotland and Northern Ireland. The simplest rendering of the data using the
googleViz Sankey diagram generator produces an output that uses default colours to label the nodes.
Using the country code indicator at the start of each region/country identifier, we can generate a mapping from country to a country colour that can then be used to identify the country associated with each node.
One of the settings for the diagram allows the source (or target) node colour to determine the edge colour. We can also play with the values we use as node labels:
If we exclude edges relating to flow between regions of the same country, we get a diagram that is more reminiscent of Oli’s orignal (country level) demo. Note also that the charts that are generated are interactive – in this case, we see a popup that describes the flow along one particular edge.
If we associate a country with each region, we can group the data and sum the flow values to produce country level flows. Charting this produces a chart similar to the original inspiration.
As well as providing the code for generating each of the above Sankey diagrams, the Rmd file linked above also includes demonstrations for generating basic Sankey diagrams for the original dataset using the rCharts and htmlwidgets R libraries.
In order to provide a point of comparison, I also generated a python/pandas workflow using Jupyter notebooks and the
ipysankey widget. (In fact, I generated the full workflow through the different chart versions first in pandas – I find it an easier language to think in than R! – and then used that workflow as a crib for the R version…)
The original (default) diagram looks like this:
and the final version, after a bit of data wrangling, looks like this:
Once again, all the code is provided in the notebook.
One of the nice things about all these packages is that they produce outputs than can be reused/embedded elsewhere, or that can be used as a first automatically produced draft of code that can be tweaked by hand. I’ll have more to say about that in a future post…