How to create a Sankey plot in R?

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers].

The post How to create a Sankey plot in R? appeared first on Data Science Tutorials

To produce a Sankey diagram in ggplot2, you must install the ggsankey package and reshape your dataset with the package’s make_long function.

The reshaped data must contain four columns: x (current stage), next_x (next stage), node (current node), and next_node (next node).

Keep in mind that rows for the final stage should have NA in next_x and next_node, because no stage follows them.
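
To make the required layout concrete, here is a minimal base R sketch that builds the four columns by hand for a two-row toy data frame. The helper name to_long_sketch and the toy columns stage1 to stage3 are hypothetical and exist only for illustration; in practice make_long does this work, and its row order may differ from the one produced here.

```r
# Hypothetical helper: reshape wide stage columns into the
# x / node / next_x / next_node layout. Illustration only.
to_long_sketch <- function(data, stages) {
  pieces <- lapply(seq_along(stages), function(i) {
    is_last <- i == length(stages)
    data.frame(
      x         = stages[i],
      node      = as.character(data[[stages[i]]]),
      # The final stage has no successor, so both "next" columns are NA.
      next_x    = if (is_last) NA_character_ else stages[i + 1],
      next_node = if (is_last) NA_character_ else as.character(data[[stages[i + 1]]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Toy data: two observations passing through three stages.
toy <- data.frame(stage1 = c("a", "b"),
                  stage2 = c("c", "c"),
                  stage3 = c("d", "e"))

long <- to_long_sketch(toy, c("stage1", "stage2", "stage3"))
print(long)
```

Each observation contributes one row per stage, and only the rows belonging to stage3 carry NA in the two "next" columns.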

Let’s install the remotes package first:

install.packages("remotes")

Now we can install the ggsankey package from GitHub and load it:

remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)

Load Data

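We can use the built-in mtcars data set. The pipe operator comes from dplyr, so that package has to be loaded before reshaping. The first rows of the result are shown below.

library(dplyr)  # provides the %>% pipe

# Reshape five mtcars columns into the long format ggsankey expects
df <- mtcars %>%
  make_long(cyl, vs, am, gear, carb)
head(df)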
    x   node next_x next_node
1  cyl    6     vs         0
2   vs    0     am         1
3   am    1   gear         4
4 gear    4   carb         4
5 carb    4   <NA>        NA
6  cyl    6     vs         0
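
Each row of df is one car moving from one node to the next, so the thickness of a flow in the diagram should reflect how many rows share the same node and next_node pair. This assumes no weighting variable is supplied, which is my understanding of the package's default behaviour rather than something stated in this post. A quick base R cross-tabulation of two adjacent stages, taken straight from mtcars so that it runs without ggsankey, shows those counts:

```r
# Count how many cars move from each cyl value to each vs value.
# These counts correspond to the cyl -> vs flows in the long data.
flows <- as.data.frame(table(cyl = mtcars$cyl, vs = mtcars$vs),
                       responseName = "n")

# Drop combinations that never occur; they would draw no flow.
flows <- flows[flows$n > 0, ]
print(flows)
```

The counts sum to the number of rows in mtcars, since every car contributes exactly one cyl to vs transition.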

Sankey plot with ggsankey

To construct Sankey diagrams in ggplot2, the ggsankey package includes a geom called geom_sankey.

Keep in mind that the variable mapped to fill inside aes must be a factor. The package also provides its own theme, theme_sankey.

Let’s load ggplot2 and draw the plot:

library(ggplot2)
library(dplyr)
ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node))) +
  geom_sankey() +
  theme_sankey(base_size = 16)

How to add labels in Sankey Plot

The package’s geom_sankey_label function lets you add labels to Sankey diagrams.

Remember to map the variable you want to display to the label aesthetic inside aes.

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey() +
  geom_sankey_label() +
  theme_sankey(base_size = 16)

How to customize colors in a Sankey plot

Several arguments control how the Sankey diagram looks. The package author provides reference figures for them:

[Figure: geom_sankey aesthetics]
[Figure: geom_sankey geometries]

Color and fill of the Sankey plot

For instance, by changing the fill color palette and a few of the arguments to the geom_sankey function, we can produce something like this.

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d(option = "A", alpha = 0.95) +
  theme_sankey(base_size = 16)
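
If a fixed colour per node value is preferred over a viridis palette, one option is ggplot2's scale_fill_manual with a named vector. The sketch below builds such a vector in base R from the same five mtcars columns. The names stage_cols, node_values and node_palette, and the choice of hcl.colors to generate the colours, are my own assumptions and not part of the original example.

```r
# Collect every distinct node value across the five stage columns.
stage_cols <- c("cyl", "vs", "am", "gear", "carb")
node_values <- sort(unique(unlist(mtcars[stage_cols], use.names = FALSE)))

# One colour per node value, named by the factor level it should fill.
node_palette <- setNames(hcl.colors(length(node_values), palette = "Viridis"),
                         as.character(node_values))
print(node_palette)

# To use it, swap the viridis scale in the plot for:
#   scale_fill_manual(values = node_palette)
```

Naming the vector by node value keeps each colour tied to its node even if the order of the factor levels changes.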

Changing the title of the legend

As with other ggplot2 charts, you can change the legend’s title. Here is one way to do it, using guides.

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 16) +
  guides(fill = guide_legend(title = "Title"))
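
Two other standard ggplot2 routes to the same result are labs(fill = ...) and the name argument of the fill scale. The sketch below shows both on a plain bar chart of mtcars so that it runs without ggsankey. The bar chart is only a stand-in, and the object names are hypothetical; the same layers should be addable to the Sankey plot in place of the guides call.

```r
library(ggplot2)

# Stand-in plot with a discrete fill legend (not a Sankey diagram).
base_plot <- ggplot(mtcars, aes(x = factor(gear), fill = factor(cyl))) +
  geom_bar()

# Option 1: set the legend title through labs().
p_labs <- base_plot +
  scale_fill_viridis_d() +
  labs(fill = "Cylinders")

# Option 2: set the legend title through the scale's name argument.
p_scale <- base_plot +
  scale_fill_viridis_d(name = "Cylinders")
```

Setting the title on the scale keeps the palette and its label in one place, while labs is convenient when several titles are set at once.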

Removing the legend

Finally, to remove the legend from the Sankey plot, set legend.position to "none" in theme.

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 16) +
  theme(legend.position = "none")
