Introducing {ggflowchart}

[This article was first published on R on Nicola Rennie, and kindly contributed to R-bloggers.]

Back in April 2022, I participated in the #30DayChartChallenge and for the Storytelling prompt on day 29 in the Uncertainty category, I created the Goldilocks Decision Tree. I also gave a talk to R-Ladies Nairobi on the challenge and used the flowchart as a live-coding example. A summary of the talk ended up as a blog post.

The reactions were pretty positive, and the suggestions from Twitter that it should become its own R package have been floating around in my mind since then. So here it is! {ggflowchart} – the package for creating simple flowcharts using {ggplot2}.

What does {ggflowchart} do?

Flowcharts can be a useful way to visualise complex processes. However, I couldn’t find an easy way to create a flowchart in R. There are packages for drawing the basic components of flowcharts (like {grid}), and packages that are great for visualising complex network data where order doesn’t really matter (like {ggnetwork} and {igraph}), but none of them gave me the control over customisation I was used to with {ggplot2}.

{ggflowchart} tries to fill that gap. The aim of {ggflowchart} is to help R users make simple, good-looking flowcharts, with as little code as possible. It computes a layout, then uses existing {ggplot2} functions to stitch together rectangles, text, and arrows.
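To make the "stitching" idea concrete, here is a minimal sketch of drawing boxes, labels, and arrows with plain {ggplot2}. It is not the package's source code: the node coordinates, the box sizes (half_w, half_h), and the object names are all assumptions made up for illustration, and a real layout would be computed rather than typed in by hand.

```r
library(ggplot2)

# Hypothetical node positions (box centres); normally a layout step computes these
nodes <- data.frame(
  name = c("A", "B", "C"),
  x = c(0, -1, 1),
  y = c(1, 0, 0)
)
edges <- data.frame(from = c("A", "A"), to = c("B", "C"))

# Half-width and half-height of each box (arbitrary illustrative values)
half_w <- 0.35
half_h <- 0.25

# Each arrow runs from the bottom of the "from" box to the top of the "to" box
edge_coords <- data.frame(
  x    = nodes$x[match(edges$from, nodes$name)],
  y    = nodes$y[match(edges$from, nodes$name)] - half_h,
  xend = nodes$x[match(edges$to, nodes$name)],
  yend = nodes$y[match(edges$to, nodes$name)] + half_h
)

p <- ggplot() +
  # Rectangles for the boxes
  geom_rect(
    data = nodes,
    aes(xmin = x - half_w, xmax = x + half_w,
        ymin = y - half_h, ymax = y + half_h),
    fill = "white", colour = "black"
  ) +
  # Text labels centred in each box
  geom_text(data = nodes, aes(x = x, y = y, label = name)) +
  # Arrows between boxes
  geom_segment(
    data = edge_coords,
    aes(x = x, y = y, xend = xend, yend = yend),
    arrow = arrow(length = unit(0.3, "cm"))
  ) +
  theme_void()
p
```

Because the result is an ordinary ggplot object, the usual {ggplot2} tools for themes, scales, and labels remain available.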

Installing {ggflowchart}

As of 11 May 2023, {ggflowchart} is officially available on CRAN. You can install {ggflowchart} using install.packages("ggflowchart").

You can also install the development version from GitHub:

remotes::install_github("nrennie/ggflowchart")

At the moment, {ggflowchart} has reasonably few dependencies (most of them common R packages you’re already likely to have installed if you’ve been working with {ggplot2}).

A few examples

To show you how {ggflowchart} actually works, let’s go through a couple of small examples. The examples explained here are also included in the vignettes for future reference.

library(ggflowchart)

The simplest flowchart, which may well be all you need, takes only a tibble (or data frame) containing two columns: one detailing where each edge in the flowchart begins, and a second detailing where it ends. Ideally, you’ll name these columns "from" and "to", but if not, {ggflowchart} assumes the first two columns of your tibble relate to the start and end points of edges.

data <- tibble::tibble(from = c("A", "A", "A", "B", "C", "F"),
                       to = c("B", "C", "D", "E", "F", "G"))

To construct the flowchart, you simply pass in the data frame of edges to the ggflowchart() function.

ggflowchart(data)
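The column-name fallback described earlier (use "from" and "to" if present, otherwise treat the first two columns as the start and end of each edge) can be pictured with a small base R helper. This is only a sketch of the idea: standardise_edges() is a hypothetical function written for this post, not something exported by {ggflowchart}, and the package's internal handling may differ.

```r
# Hypothetical helper, for illustration only (not part of {ggflowchart})
standardise_edges <- function(edges) {
  stopifnot(is.data.frame(edges), ncol(edges) >= 2)
  # If the expected names are missing, fall back to the first two columns
  if (!all(c("from", "to") %in% names(edges))) {
    names(edges)[1:2] <- c("from", "to")
  }
  edges
}

# Edges with non-standard column names
edges <- data.frame(
  start = c("A", "A"),
  end = c("B", "C"),
  stringsAsFactors = FALSE
)
std <- standardise_edges(edges)
names(std)

# The set of nodes is every distinct value appearing in either column
node_names <- unique(c(std$from, std$to))
node_names
```

In practice, naming the columns "from" and "to" yourself avoids relying on column order at all.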

The ggflowchart() function has a few additional arguments that you can use to change the appearance of the flowchart; their default values are listed on the function's help page, ?ggflowchart.

The arguments color, text_color, and arrow_color are also accepted as alternate spellings of colour, text_colour, and arrow_colour.

For example, we may wish to switch to a red flowchart, with a serif font, and slightly more square boxes.

ggflowchart(data,
            colour = "red",
            text_colour = "red",
            arrow_colour = "red",
            family = "serif",
            x_nudge = 0.25)

The fill and text_colour arguments can also be set from a column in the node_data data frame by passing the column name. For example:

node_data <- tibble::tibble(
  name = c("A", "B", "C", "D", "E", "F", "G"),
  type = c("Type 1", "Type 1", "Type 1", "Type 1", 
           "Type 2", "Type 2", "Type 2")
  )
ggflowchart(data, node_data, fill = type)

The column names to colour by can be either quoted or unquoted, e.g. ggflowchart(data, node_data, fill = "type") will produce the same result. Column names take priority over colour names, so if you have a column in node_data called "blue", the values in that column will be used rather than colouring all nodes blue.
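To see the priority rule in action, here is a small sketch based on the behaviour described in this post. The node attributes and the column deliberately named "blue" are made up for illustration, and the object names (edges, node_info) are my own.

```r
library(ggflowchart)

edges <- tibble::tibble(
  from = c("A", "A", "A", "B", "C", "F"),
  to   = c("B", "C", "D", "E", "F", "G")
)

# Made-up node attributes; the column is deliberately called "blue"
node_info <- tibble::tibble(
  name = c("A", "B", "C", "D", "E", "F", "G"),
  blue = c("Start", "Step", "Step", "Step", "End", "Step", "End")
)

# A column called "blue" exists, so fill is mapped to its values
p_mapped <- ggflowchart(edges, node_info, fill = "blue")

# "lightblue" matches no column, so it is treated as a colour name
p_constant <- ggflowchart(edges, node_info, fill = "lightblue")
```

If you really do want every box to be blue, the simplest fix is to rename the clashing column in node_data.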

A more complex example showing how to change the layout using scale_x_reverse(), add titles, change the background colour, and edit the labels is included in another vignette.
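As a taste of that kind of customisation, the sketch below adds ordinary {ggplot2} layers to a flowchart. It assumes that ggflowchart() returns a regular ggplot object that accepts additional scales, labels, and theme elements; the title text and background colour are arbitrary choices for illustration.

```r
library(ggflowchart)
library(ggplot2)

edges <- tibble::tibble(
  from = c("A", "A", "A", "B", "C", "F"),
  to   = c("B", "C", "D", "E", "F", "G")
)

p <- ggflowchart(edges) +
  # Mirror the layout left to right
  scale_x_reverse() +
  # Add a title
  labs(title = "A mirrored flowchart") +
  # Change the background colour
  theme(plot.background = element_rect(fill = "lightyellow", colour = NA))
p
```

The vignette remains the place to look for the full worked example, including how to edit the labels.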

What’s coming next?

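{ggflowchart} is currently a work in progress, and there are already a few things on my list to add into the next release! Upcoming features are expected to include more control over arrow colours and node outline colours, user-defined node positions, and more reliable arrows between boxes on the same level.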

Some of these features are easier than others, and some will require a bit of thought about design choices. Changing the arrow colours and node outline colours is a little bit tricky because we may end up with three separate colour scales. Layouts are currently based on the tree output from {igraph}, but users may wish to move some of the nodes around. Hopefully, users will soon be able to define their own x and y coordinates to position the boxes. Arrows between boxes on the same level, e.g. between E and F in the minimal example, are currently a bit hit or miss:

data <- tibble::tibble(from = c("A", "A", "A", "B", "C", "F", "E"),
                       to = c("B", "C", "D", "E", "F", "G", "F"))
ggflowchart(data)
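To see why the E to F arrow is awkward, it helps to look at the coordinates a tree layout produces. The sketch below uses igraph::layout_as_tree() with "A" as the root; the exact {igraph} call the package makes is not shown in this post, so treat the function choice and arguments as assumptions.

```r
library(igraph)

edges <- data.frame(
  from = c("A", "A", "A", "B", "C", "F", "E"),
  to   = c("B", "C", "D", "E", "F", "G", "F"),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(edges, directed = TRUE)

# Use "A" as the root of the tree
root_id <- which(V(g)$name == "A")
coords <- layout_as_tree(g, root = root_id)

layout_df <- data.frame(
  name = V(g)$name,
  x = coords[, 1],
  y = coords[, 2]
)
layout_df

# Nodes that share a y value sit on the same level of the tree
layout_df[layout_df$name %in% c("E", "F"), ]
```

When two connected nodes share a level, an arrow drawn from the bottom of one box to the top of the other no longer makes sense, which is why this case needs special handling.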

These features are currently listed as issues on GitHub, and I will slowly work my way through them. If you have other suggestions for new features, or if you find a bug, please create an issue in the GitHub repository.

The most important part…

Of course, the most important part of any R package is the hex sticker! As a nod to the Goldilocks Decision Tree flowchart that inspired the package in the first place, the hex sticker for {ggflowchart} features three bears!
