How to create a Sankey plot in R? To produce a Sankey diagram in ggplot2, you must install the ggsankey package and reshape your dataset with the package’s make_long function.

The reshaped data must contain the columns x (current stage), next_x (next stage), node (current node), and next_node (next node).

Keep in mind that for the final stage, next_x and next_node should be NA.
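To make the target layout concrete, here is a minimal base R sketch that builds the four columns by hand. The data frame `wide` and its stage names (`first`, `second`, `third`) are hypothetical and only for illustration. This is not how ggsankey implements make_long; it just shows the shape of the result, including the NA values for the final stage.

```r
# Hand-rolled illustration of the long format (not ggsankey's implementation).
# `wide` is a hypothetical two-row dataset with three stages.
wide <- data.frame(first  = c("x", "y"),
                   second = c("p", "q"),
                   third  = c("m", "m"),
                   stringsAsFactors = FALSE)
stages <- names(wide)

# Build one block of rows per observation, one row per stage.
long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  data.frame(
    x         = stages,
    node      = unlist(wide[i, ], use.names = FALSE),
    # The last stage has no successor, so its next_* entries are NA.
    next_x    = c(stages[-1], NA),
    next_node = c(unlist(wide[i, -1], use.names = FALSE), NA),
    stringsAsFactors = FALSE
  )
}))
print(long)
```

Each observation contributes one row per stage, so two observations with three stages give six rows.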

Let’s install the remotes package first:

`install.packages("remotes")`

Now we can install the ggsankey package from GitHub and load it:

```
remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)
```

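We can use the built-in mtcars dataset.

```
library(dplyr)  # provides the %>% pipe

# Reshape the selected columns into the long format ggsankey expects
df <- mtcars %>%
  make_long(cyl, vs, am, gear, carb)
df
# First rows of the printed output:
#      x node next_x next_node
# 1  cyl    6     vs         0
# 2   vs    0     am         1
# 3   am    1   gear         4
# 4 gear    4   carb         4
# 5 carb    4   <NA>        NA
# 6  cyl    6     vs         0
```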
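Each row of the reshaped data describes one car at one stage. My understanding is that, unless a separate value is supplied, the width of a flow reflects how many rows share the same node and next_node. As a rough cross-check that needs only base R, the sketch below cross-tabulates two adjacent stages of mtcars to show how many cars move from each cyl value to each vs value.

```r
# Count how many cars go from each cyl value to each vs value.
# These counts correspond to the flows between the first two stages.
flows <- with(mtcars, table(cyl, vs))
print(flows)
```

For example, all 14 eight-cylinder cars have vs equal to 0, so a single flow leaves node 8 in the cyl stage.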

### Sankey plot with ggsankey

To construct Sankey diagrams in ggplot2, the ggsankey package includes a geom called geom_sankey.

Keep in mind that you must pass a factor as the fill colour when mapping the variables inside aes. The package also provides a dedicated theme, theme_sankey.

Let’s load ggplot2 to draw the plot:
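The reason for wrapping node in factor() is that the node values taken from mtcars are numeric, and ggplot2 maps numeric variables to a continuous colour scale by default, whereas a factor gives one discrete colour per level. A small base R sketch, using the node values of the first car from the output above:

```r
# Node values of the first car across the five stages (cyl, vs, am, gear, carb)
node <- c(6, 0, 1, 4, 4)
is.numeric(node)             # numeric input would be treated as continuous

# Converting to a factor yields discrete groups, one per distinct value
fill_groups <- factor(node)
levels(fill_groups)
```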

```
library(ggplot2)
library(dplyr)

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node))) +
  geom_sankey() +
  theme_sankey(base_size = 16)
```

### How to add labels in a Sankey plot

The package’s geom_sankey_label function lets you add labels to Sankey diagrams.

Remember to give the variable you want to display as the label inside the aes.

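```
ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey() +
  geom_sankey_label() +
  # Assumed final layer: the source is cut off after the trailing "+";
  # the theme matches the other examples in this post.
  theme_sankey(base_size = 16)
```

### How to customize colors in a Sankey plot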

A variety of arguments can be changed to alter how the Sankey diagram looks in R. The package author provides the following pictures as examples.

[Figures: geom_sankey aesthetics; geom_sankey geometries; color and fill of the Sankey plot]

For instance, by adjusting the fill color palette and a few of the arguments to the geom_sankey function, we can produce something like this.

```
ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  # Semi-transparent flows, black node borders
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d(option = "A", alpha = 0.95) +
  # Assumed final layer: the source is cut off after the trailing "+"
  theme_sankey(base_size = 16)
```

### Changing the title of the legend

As with other ggplot2 charts, you can change the title of the legend, and there are several ways to do so.

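```
ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 16) +
  # Assumed completion: the source is cut off after the trailing "+".
  # labs() is one standard ggplot2 way to set the legend title;
  # the title text "Nodes" is only a placeholder.
  labs(fill = "Nodes")
```

### Removing the legend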

Finally, you can set the legend’s position to “none” if you want to remove it from the Sankey plot.

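```
ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 16) +
  # The source is cut off after the trailing "+"; setting the legend
  # position to "none" is what the text above describes.
  theme(legend.position = "none")
```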
