
How to Create Sankey Diagrams From Tables (Data Frames) Using R

[This article was first published on R – Displayr, and kindly contributed to R-bloggers].

Create a Sankey Diagram in R with Displayr!

Step 1: Create a Tidy data frame

The very first step in creating visualizations is to get the data in a useful format. In the case of Sankey diagrams, the trick is to get the data into the tidy data format. This post uses a simple example to make it clear how everything fits together. Below, you can see the R code to create a small data frame. I’ve shown this as a table, followed by the resulting Sankey diagram.

 
my.data = data.frame(Married = c("Yes","Yes", "Yes", "No", "No"),
    Pet = c("Yes", "Yes", "No", "Yes", "No"),
    Happy = c("Yes", "Yes", "Yes", "Yes", "No"),
    freq = 5:1)
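
If your data starts out with one row per respondent rather than one row per combination, it first needs to be counted into this shape. The following is a minimal sketch using base R only; the raw data frame is hypothetical and invented purely for illustration, not taken from the example above.

```r
# Hypothetical raw data: one row per respondent (not from the original post)
raw <- data.frame(
  Married = c("Yes", "Yes", "No", "Yes", "No"),
  Pet     = c("Yes", "Yes", "No", "Yes", "Yes"),
  Happy   = c("Yes", "Yes", "No", "Yes", "Yes")
)

# Count every combination of the three variables; the count goes into 'freq'
counts <- as.data.frame(table(raw), responseName = "freq",
                        stringsAsFactors = FALSE)

# Drop the combinations that never occur, leaving one row per observed path
counts <- counts[counts$freq > 0, ]
```

The result has the same layout as my.data: one column per variable, plus a final freq column holding the weight of each combination.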



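A few things to note about my.data: each row is one combination of the three categorical variables (Married, Pet and Happy), and the final column, freq, holds the weight of that combination (here, 5 down to 1).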



Step 2: Install the flipPlots package

The Sankey diagrams I am using in this post come from our flipPlots package (Displayr/flipPlots). If you don’t know how to install from GitHub, please see how to install packages from GitHub.
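
As a rough sketch, installation from GitHub might look like the following. It assumes the remotes package as the installer (my choice for this illustration; the linked instructions may use a different tool), and the repository name is the one given above.

```r
# Assumption: 'remotes' is used as the GitHub installer;
# devtools::install_github() takes the same repository argument.
if (!requireNamespace("flipPlots", quietly = TRUE)) {
  install.packages("remotes")
  remotes::install_github("Displayr/flipPlots")
}
```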

Step 3: Create the Sankey diagram

We created the first of the Sankey diagrams shown in this post using the code below. Note that the data frame is passed in as the first argument, but with the fourth column (the one containing the weights) removed; the weights are passed separately via the weights argument. I’ve set link.color to "Source", so that all links emanating from the same node share a color.

library(flipPlots)
# Pass the three categorical columns (drop column 4, freq) and supply freq as the weights
SankeyDiagram(my.data[, -4],
              link.color = "Source",
              weights = my.data$freq)
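
To see what the weights contribute, it can help to total them by hand. The sketch below uses base R only and does not call flipPlots; it assumes that links are drawn between adjacent columns and that each link's width is the summed weight of the rows passing through it, which is the usual Sankey convention rather than something confirmed here from the package's internals. It also shows selecting the variable columns by name instead of by position.

```r
my.data <- data.frame(Married = c("Yes", "Yes", "Yes", "No", "No"),
                      Pet     = c("Yes", "Yes", "No", "Yes", "No"),
                      Happy   = c("Yes", "Yes", "Yes", "Yes", "No"),
                      freq    = 5:1)

# Same columns as my.data[, -4], but robust to the columns being reordered
vars <- my.data[, names(my.data) != "freq"]

# Total weight leaving each Married node
node.totals <- tapply(my.data$freq, my.data$Married, sum)

# Total weight on each Married -> Pet link (rows: Married, columns: Pet)
link.totals <- xtabs(freq ~ Married + Pet, data = my.data)
```

For example, the two rows with Married = "Yes" and Pet = "Yes" have weights 5 and 4, so under this assumption the link between those two nodes carries a total weight of 9.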

I’ve provided the code for the second Sankey diagram shown in the post below. It uses a second data frame, my.data.2, laid out in the same way (weights in the fourth column). The only difference in the call itself is that I’ve set label.show.varname = FALSE, which prevents the variable names from being shown in the Sankey diagram.

library(flipPlots)
SankeyDiagram(my.data.2[, -4],
              link.color = "Source",
              label.show.varname = FALSE,
              weights = my.data.2$freq)


More complicated Sankey diagrams

If you want to create more complicated Sankey diagrams, which do not easily fit into the structure of a table (data frame), please see Creating Custom Sankey Diagrams Using R.

Acknowledgements

The Sankey diagrams are created using a modified version of networkD3, created by Kenton Russell (timelyportfolio/networkD3@feature/responsive). networkD3 is an HTMLwidget version of Mike Bostock’s D3 Sankey diagram code, which is inspired by Tom Counsell’s Sankey library.
