The ggforce Awakens (again)

March 6, 2019
By

(This article was first published on Data Imaginist, and kindly contributed to R-bloggers)

After what seems like a lifetime (at least to me), a new feature release of
ggforce is available on CRAN. ggforce is my general purpose extension package
for ggplot2, my first early success, what got me on twitter in the first place,
and ultimately instrumental in my career move towards full-time software/R
development. Despite this pedigree ggforce haven’t really received much love in
the form of a feature release since, well, since it was released. One of the
reasons for this is that after the first release I began pushing changes to
ggplot2 that allowed for different stuff I wanted to do in ggforce, so the
release of the next ggforce version became tied to the release of ggplot2. This
doesn’t happen every day, and when it eventually transpired, I was deep in
patchwork and gganimate development, and couldn’t take time off to run the last
mile with ggforce. In the future I’ll probably be more conservative with my
ggplot2 version dependency, or at least keep it out of the main branch until a
ggplot2 release is in sight.

Enough excuses though, a new version is finally here and it’s a glorious one.
Let’s celebrate! This version both brings a slew of refinements to existing
functionality as well as a wast expanse of new features, so there’s enough to
dig into.

New features

This is why we’re all here, right? The new and shiny! Let’s get going; the list
is pretty long.

The Shape of Geoms

Many of the new and current geoms and stats in ggforce are really there to allow
you to draw different types of shapes easily. This means that the workhorse of
these has been geom_polygon(), while ggforce provided the means to describe
the shapes in meaningful ways (e.g. wedges, circles, thick arcs). With the new
release all of these geoms (as well as the new ones) will use the new
geom_shape() under the hood. The shape geom is an extension of the polygon one
that allows a bit more flourish in how the final shape is presented. It does
this by providing two additional parameters: expand and radius, which will
allow fixed unit expansion (and contraction) of the polygons as well as rounding
of the corners based on a fixed unit radius. What do I mean with fixed unit?
In the same way as the points in geom_point stay the same size during resizing
of the plot, so does the corner radius and expansion of the polygon.

Let us modify the goem_polygon() example to use geom_shape() to see what it
is all about:

library(ggforce)

ids <- factor(c("1.1", "2.1", "1.2", "2.2", "1.3", "2.3"))
values <- data.frame(
  id = ids,
  value = c(3, 3.1, 3.1, 3.2, 3.15, 3.5)
)
positions <- data.frame(
  id = rep(ids, each = 4),
  x = c(2, 1, 1.1, 2.2, 1, 0, 0.3, 1.1, 2.2, 1.1, 1.2, 2.5, 1.1, 0.3,
  0.5, 1.2, 2.5, 1.2, 1.3, 2.7, 1.2, 0.5, 0.6, 1.3),
  y = c(-0.5, 0, 1, 0.5, 0, 0.5, 1.5, 1, 0.5, 1, 2.1, 1.7, 1, 1.5,
  2.2, 2.1, 1.7, 2.1, 3.2, 2.8, 2.1, 2.2, 3.3, 3.2)
)
datapoly <- merge(values, positions, by = c("id"))

# Standard look
ggplot(datapoly, aes(x = x, y = y)) +
  geom_polygon(aes(fill = value, group = id))

# Contracted and rounded
ggplot(datapoly, aes(x = x, y = y)) +
  geom_shape(aes(fill = value, group = id), 
             expand = unit(-2, 'mm'), radius = unit(5, 'mm'))

If you’ve never needed this, it may be the kind of thing you go
why even bother, but if you’ve needed to venture into Adobe Illustrator to add
this kind of flourish it is definitely something where you appreciate the lack of
this round-trip. And remember: you can stick this at anything that expects a
geom_polygon — not just the ones from ggforce.

More shape primitives

While geom_shape() is the underlying engine for drawing, ggforce adds a bunch
of new shape parameterisations, which we will quickly introduce:

geom_ellipse makes, you guessed it, ellipses. Apart from standard ellipses it
also offers the possibility of making super-ellipses so if you’ve been dying to
draw those with ggplot2, now is your time to shine.

# Not an ordinary ellipse — a super-ellipse
ggplot() +
  geom_ellipse(aes(x0 = 0, y0 = 0, a = 6, b = 3, angle = -pi / 3, m1 = 3)) +
  coord_fixed()

geom_bspline_closed allows you to draw closed b-splines. It takes the same
type of input as geom_polygon but calculates a closed b-spline from the corner
points instead of just connecting them.

# Create 6 random control points
controls <- data.frame(
  x = runif(6),
  y = runif(6)
)

ggplot(controls, aes(x, y)) +
  geom_polygon(fill = NA, colour = 'grey') +
  geom_point(colour = 'red') +
  geom_bspline_closed(alpha = 0.5)

geom_regon draws regular polygons of a set radius and number of sides.

ggplot() +
  geom_regon(aes(x0 = runif(8), y0 = runif(8), sides = sample(3:10, 8),
                 angle = 0, r = runif(8) / 10)) +
  coord_fixed()

geom_diagonal_wide draws thick diagonals (quadratic bezier paths with the two
control points pointing towards each other but perpendicular to the same axis)

data <- data.frame(
  x = c(1, 2, 2, 1, 2, 3, 3, 2),
  y = c(1, 2, 3, 2, 3, 1, 2, 5),
  group = c(1, 1, 1, 1, 2, 2, 2, 2)
)

ggplot(data) +
  geom_diagonal_wide(aes(x, y, group = group))

Is it a Sankey? Is it an Alluvial? No, It’s a Parallel Set

Speaking of diagonals, one of the prime uses of this is for creating parallel
sets visualizations. There’s a fair bit of nomenclature confusion with this,
so you may know this as Sankey diagrams, or perhaps alluvial plots. I’ll insist
that Sankey diagrams are specifically for following flows (and often employs a
more loose positioning of the axes) and alluvial plots are for following
temporal changes, but we can all be friends no matter what you call it. ggforce
allows you to create parallel sets plots with a standard layered geom approach
(for another approach to this problem, see
the ggalluvial package). The main
problem is that data for parallel sets plots are usually not represented very
well in the tidy format expected by ggplot2, so ggforce further provides a
reshaping function to get the data in line for plotting:

titanic <- reshape2::melt(Titanic)
# This is how we usually envision data for parallel sets
head(titanic)
##   Class    Sex   Age Survived value
## 1   1st   Male Child       No     0
## 2   2nd   Male Child       No     0
## 3   3rd   Male Child       No    35
## 4  Crew   Male Child       No     0
## 5   1st Female Child       No     0
## 6   2nd Female Child       No     0
# Reshape for putting the first 4 columns as axes in the plot
titanic <- gather_set_data(titanic, 1:4)
head(titanic)
##   Class    Sex   Age Survived value id     x    y
## 1   1st   Male Child       No     0  1 Class  1st
## 2   2nd   Male Child       No     0  2 Class  2nd
## 3   3rd   Male Child       No    35  3 Class  3rd
## 4  Crew   Male Child       No     0  4 Class Crew
## 5   1st Female Child       No     0  5 Class  1st
## 6   2nd Female Child       No     0  6 Class  2nd
# Do the plotting
ggplot(titanic, aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = Sex), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = 'white')

As can be seen, the parallel sets plot consist of several layers, which is
something required for many, more involved, composite plot types. Separating
them into multiple layers gives you more freedom without over-poluting the
argument and aesthetic list.

The markings of a great geom

If there is one thing of general utility lacking in ggplot2 it is probably the
ability to annotate data cleanly. Sure, there’s geom_text()/geom_label() but
using them requires a fair bit of fiddling to get the best placement and
further, they are mainly relevant for labeling and not longer text. ggrepel
has improved immensely on the fiddling part, but the lack of support for longer
text annotation as well as annotating whole areas is still an issue.

In order to at least partly address this, ggforce includes a family of geoms
under the geom_mark_*() moniker. They all behaves equivalently except for how
they encircle the given area(s). The 4 different geoms are:

  • geom_mark_rect() encloses the data in the smallest enclosing rectangle
  • geom_mark_circle() encloses the data in the smallest enclosing circle
  • geom_mark_ellipse() encloses the data in the smallest enclosing ellipse
  • geom_mark_hull() encloses the data with a concave or convex hull

All the enclosures are calculated at draw time so respond to resizing (most are
susceptible to changing aspect ratios), and further uses geom_shape() with a
default expansion and radius set, so that the enclosure is always slightly
larger than the data it needs to enclose.

Just to give a quick sense of it, here’s an example of geom_mark_ellipse()

ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_mark_ellipse(aes(fill = Species)) +
  geom_point()

If you simply want to show the area where different classes appear, we’re pretty
much done now, as the shapes along with the legend tells the story. But I
promised you some more: textual annotation. So how does this fit into it all?

In addition to the standard aesthetics for shapes, the mark geoms also take a
label and description aesthetic. When used, things get interesting:

ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_mark_ellipse(aes(fill = Species, label = Species)) +
  geom_point()

The text is placed automatically so that it does not overlap with any data used
in the layer, and it responds once again to resizing, always trying to find the
most optimal placement of the text. If it is not possible to place the desired
text it elects to not show it at all.

Anyway, in the plot above we have an overabundance of annotation. Both the
legend and the labels. Further, we often want to add annotations to specific
data in the plot, not all of it. We can put focus on setosa by ignoring the
other groups:

desc <- 'This iris species has a markedly smaller petal than the others.'
ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_mark_ellipse(aes(filter = Species == 'setosa', label = 'Setosa', 
                        description = desc)) +
  geom_point()

We are using another one of the mark geom family’s tricks here, which is the
filter aesthetic. It makes it quick to specify the data you want to annotate,
but in addition the remaining data is remembered so that any annotation doesn’t
overlap with it even if it is not getting annotated (you wouldn’t get this if
you pre-filtered the data for the layer). Another thing that happens behind the
lines is that the description text automatically gets word wrapping, based on
a desired width of the text-box (defaults to 5 cm).

The mark geoms offer a wide range of possibilities for styling the annotation,
too many to go into detail with here, but rest assured that you have full
control over text appearance, background, line, distance between data and
text-box etc.

Lost in Tessellation

The last of the big additions in this release is a range of geoms for creating
and plotting Delaunay triangulation and Voronoi tessellation. How often do you
need that, you ask? Maybe never… Does it look wicked cool? Why, yes!

Delaunay triangulation is a way to connect points to their nearest neighbors
without any connections overlapping. By nature, this results in triangles being
created. This data can either be thought of as a set of triangles, or a set of
line segments, and ggforce provides both through the geom_delaunay_tile() and
geom_delaunay_segment() geoms. Further, a geom_delaunay_segment2() version
exists that mimics geom_link2 in allowing aesthetic interpolation between
endpoints.

As we are already quite acquainted with the Iris dataset, let’s take it for a
whirl again:

ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_delaunay_tile(alpha = 0.3) + 
  geom_delaunay_segment2(aes(colour = Species, group = -1), size = 2,
                         lineend = 'round')

The triangulation is not calculated at draw time and is thus susceptible to
range differences on the x and y axes. To combat this it is possible to
normalize the position data before calculating the triangulation.

Voronoi tessellation is sort of an inverse of Delaunay triangulation. it draws
perpendicular segments in the middle of all the triangulation segments and
connects the neighboring ones. The end result is a tile around each point
marking the area where the point is the closest one. In parallel to the
triangulation, Voronoi also comes with both a tile and a segment version.

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_voronoi_tile(aes(fill = Species, group = -1L)) + 
  geom_voronoi_segment() +
  geom_point()

We need to set the group aesthetic to a scalar in order to force all points to
be part of the same tessellation. Otherwise each group would get its own:

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_voronoi_tile(aes(fill = Species), colour = 'black')

Let’s quickly move on from that…

As a Voronoi tessellation can in theory expand forever, we need to define a
bounding box. The default is to expand an enclosing rectangle 10% to each side,
but you can supply your own rectangle, or even an arbitrary polygon. Further,
it is possible to set a radius bound for each point instead:

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_voronoi_tile(aes(fill = Species, group = -1L), max.radius = 0.2,
                    colour = 'black')

This functionality is only available for the tile geom, not the segment, but
this will hopefully change with a later release.

A last point, just to beat a dead horse, is that the tile geoms of course
inherits from geom_shape() so if you like them rounded corners you can have it
your way:

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_voronoi_tile(aes(fill = Species, group = -1L), max.radius = 1,
                    colour = 'black', expand = unit(-0.5, 'mm'), 
                    radius = unit(0.5, 'mm'), show.legend = FALSE)

Zoom

Not a completely new feature as the ones above, but facet_zoom() has gained
enough new power to warrant a mention. The gist of the facet is that it allows
you to zoom in on an area of the plot while keeping the original view as a
separate panel. The old version only allowed specifying the zoom region by
providing a logical expression that indicated what data should be part of the
zoom, but it now has a dedicated xlim and ylim arguments to set them
directly.

ggplot(diamonds) + 
  geom_histogram(aes(x = price), bins = 50) + 
  facet_zoom(xlim = c(3000, 5000), ylim = c(0, 2500), horizontal = FALSE)

The example above shows a shortcoming in simply zooming in on a plot. Sometimes
the resolution (here, bins) aren’t really meaningful for zooming. Because of
this, facet_zoom() has gotten a zoom.data argument to indicate what data to
put on the zoom panel and what to put on the overview panel (and what to put in
both places). It takes a logical expression to evaluate on the data and if it
returns TRUE the data is put in the zoom panel, if it returns FALSE it is
put on the overview panel, and if it returns NA it is put in both. To improve
the visualization above, well add two layers with different number of bins and
use zoom.data to put them in the right place:

ggplot() + 
  geom_histogram(aes(x = price), dplyr::mutate(diamonds, z = FALSE), bins = 50) + 
  geom_histogram(aes(x = price), dplyr::mutate(diamonds, z = TRUE), bins = 500) + 
  facet_zoom(xlim = c(3000, 5000), ylim = c(0, 300), zoom.data = z,
             horizontal = FALSE) + 
  theme(zoom.y = element_blank(), validate = FALSE)

The last flourish we did above was to remove the zoom indicator for the y axis
zoom by using the zoom.y theme element. We currently need to turn off
validation for this to work as ggplot2 by default doesn’t allow unknown theme
elements.

All the rest

The above is just the most worthwhile, but the release also includes a slew of
other features and improvements. Notable mentions are

  • geom_sina() rewrite to allow dodging and follow the shape of geom_violin()
  • position_jitternormal() that jitters points based on a normal distribution
    instead of a uniform one
  • facet_stereo() to allow for faux 3D plots

See the NEWS.md file for
the full list.

Further, ggforce now has a website at https://ggforce.data-imaginist.com, with
full documentation overview etc. This is something I plan to roll out to all my
major packages during the next release cycle. I’ve found that it gives a great
impediment to improving the examples in the documentation!

I do hope that it won’t take another two years before ggforce sees the next big
update. It is certainly a burden of my shoulder to get this out of the door and
I hope I can adhere to smaller, more frequent, releases in the future.

Now go get plotting!

To leave a comment for the author, please follow the link and comment on their blog: Data Imaginist.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Recent popular posts

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)