# Roll Your Own Stats and Geoms in ggplot2 (Part 1: Splines!)

September 8, 2015
(This article was first published on rud.is » R, and kindly contributed to R-bloggers)

A huge change is coming to `ggplot2`, and you can get a preview of it over at Hadley’s GitHub repo. I’ve been keenly interested in this, as I will be fixing, finishing & porting `coord_proj` to it once it’s done.

Hadley & Winston have rebuilt ggplot2 with an entirely new object-oriented system called `ggproto`. With `ggproto` it’s now possible to easily extend ggplot2 from within your own packages (since `source()` is so last century), often with very little effort.

Before attempting to port `coord_proj`, I wanted to work through adding a `Geom` and a `Stat`, since I thought it would be cool to have interpolated line charts (and it helps answer some recurring StackOverflow “spline”/ggplot2 questions). I also prefer `KernSmooth::bkde` over the built-in `density` function (which `geom_density` and `stat_density` both use).

To that end, I’ve made a new GitHub-installable package called ggalt (h/t to @jayjacobs for a better package name than the one I originally came up with) where I’ll be adding new `Geom`s, `Stat`s, `Coord`s (et al) as I craft them. For now, let me introduce both `geom_xspline()` and `geom_bkde()` to show how easy it is to incorporate new functionality into ggplot2.

While not a requirement, I think it’s going to be a good idea to make both a paired `Geom` and `Stat` when adding those types of functionality to ggplot2. I found it easier to work with custom parameters this way, and it also makes it feel a bit more like the way ggplot2 itself works. For the interpolated line geom/stat I used R’s `graphics::xspline` function. Here’s all it took to give ggplot2 lines some curves (you can find the commented version on GitHub):

```
# User-callable geom: defaults to the "xspline" stat
geom_xspline <- function(mapping = NULL, data = NULL, stat = "xspline",
                         position = "identity", show.legend = NA,
                         inherit.aes = TRUE, na.rm = TRUE,
                         spline_shape=-0.25, open=TRUE, rep_ends=TRUE, ...) {
  layer(
    geom = GeomXspline,
    mapping = mapping,
    data = data,
    stat = stat,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(spline_shape=spline_shape,
                  open=open,
                  rep_ends=rep_ends,
                  ...)
  )
}

# Geom object: a thin shim over GeomLine
GeomXspline <- ggproto("GeomXspline", GeomLine,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA)
)

# User-callable stat: draws with the "line" geom by default
stat_xspline <- function(mapping = NULL, data = NULL, geom = "line",
                         position = "identity", show.legend = NA,
                         inherit.aes = TRUE,
                         spline_shape=-0.25, open=TRUE, rep_ends=TRUE, ...) {
  layer(
    stat = StatXspline,
    data = data,
    mapping = mapping,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(spline_shape=spline_shape,
                  open=open,
                  rep_ends=rep_ends,
                  ...
    )
  )
}

# Stat object: computes the interpolated points for each group
StatXspline <- ggproto("StatXspline", Stat,

  required_aes = c("x", "y"),

  compute_group = function(self, data, scales, params,
                           spline_shape=-0.25, open=TRUE, rep_ends=TRUE) {
    # xspline() needs an open plot, so draw to a throwaway PNG device
    tf <- tempfile(fileext=".png")
    png(tf)
    plot.new()
    tmp <- xspline(data$x, data$y, spline_shape, open, rep_ends, draw=FALSE, NA, NA)
    invisible(dev.off())
    unlink(tf)

    data.frame(x=tmp$x, y=tmp$y)
  }
)
```

If that seems like a lot of code, it really isn’t. What we have there are:

• two functions that handle the `Geom` aspects &
• two functions that handle the `Stat` aspects.

Let’s look at the `Stat` functions first, though you can also just read the handy vignette.

### Adding `Stat`s

In this particular case, we have it easy. We get to use `geom_line`/`GeomLine` as the base `geom_` for the layer since all we’re doing is generating more points for it to draw line segments between. We create the interface to our new `Stat` with `stat_xspline` and add three new parameters with default values:

• `spline_shape`
• `open`
• `rep_ends`

“Added three new parameters to what?” you ask? `GeomLine`/`geom_line` default to `StatIdentity`/`stat_identity`, and if you look at the source code, that `Stat` just returns the data in the form it came in. We’re going to take these three new parameters, pass them to `xspline`, and then return entirely new values for `ggplot2`/`grid` to draw for us, so we tell the layer to call our new computation engine by passing it `StatXspline`. By using `GeomLine`/`geom_line` as the `geom` parameter, all we have to do is ensure we pass back the proper values. We do that in `compute_group` since `ggplot2` will segment the incoming data into groups (via the `group` aesthetic) for us. We take each group and run it through `xspline` with the parameters the user specified. If I didn’t have to use the hack to work around what seems to be errant plot device issues in `xspline`, the call would be one line.
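To make the per-group computation concrete, here is a minimal base-R sketch of the same idea outside of ggplot2. The helper name `interpolate_xspline` is hypothetical (it is not part of ggalt), and it swaps the temporary PNG file for a null PDF device (`pdf(NULL)`) on the assumption that any open device with a fresh plot is enough for `xspline` to compute its points.

```r
# Hypothetical helper (not part of ggalt): roughly what compute_group does
# for a single group of points.
interpolate_xspline <- function(x, y, spline_shape = -0.25,
                                open = TRUE, rep_ends = TRUE) {
  # Assumption: a null PDF device stands in for the temporary PNG file.
  # xspline() still needs plot.new() to have been called on some device.
  pdf(NULL)
  on.exit(invisible(dev.off()), add = TRUE)
  plot.new()

  # draw = FALSE returns the computed curve points instead of drawing them
  pts <- graphics::xspline(x, y, shape = spline_shape, open = open,
                           repEnds = rep_ends, draw = FALSE)

  data.frame(x = pts$x, y = pts$y)
}

set.seed(1492)
grp <- data.frame(x = 1:10, y = sample(15:30, 10))
smoothed <- interpolate_xspline(grp$x, grp$y)

# The stat hands back more rows than it received; GeomLine just connects them
nrow(grp)
nrow(smoothed)
```

Per the `xspline` documentation, negative `shape` values make the curve pass through the control points, positive values make it approximate them, and zero gives sharp corners, which is why a small negative default suits an interpolated line chart.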

### Adding `Geom`s

We pair up the `Stat` with a very basic `Geom` “shim” so we can use them interchangeably. It’s the same idiom, an “object” function and the user-callable function. In this case, it’s super-lightweight since we’re really having `geom_line` do all the work for us. In a [very] future post, I’ll cover more complex `Geom`s that require use of the underlying `grid` graphics system, but I suspect most of your own additions may be able to use the lightweight idiom here (and that’s covered in the vignette).
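As a stripped-down illustration of that two-piece idiom, the sketch below defines a geom that inherits everything from `GeomLine` and overrides nothing. The names `GeomPlainLine` and `geom_plain_line` are hypothetical, and the code assumes a ggplot2 release that exports `ggproto`, `layer` and `layer_data`.

```r
library(ggplot2)

# "Object" half: a ggproto object that inherits all behaviour from GeomLine
# (hypothetical name, for illustration only)
GeomPlainLine <- ggproto("GeomPlainLine", GeomLine)

# User-callable half: builds a layer that uses the object above
geom_plain_line <- function(mapping = NULL, data = NULL, stat = "identity",
                            position = "identity", na.rm = FALSE,
                            show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    geom = GeomPlainLine,
    mapping = mapping,
    data = data,
    stat = stat,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

dat <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))
p <- ggplot(dat, aes(x, y)) + geom_plain_line()

# Inspect the data the layer would hand to the drawing code
built <- layer_data(p, 1)
```

Because the stat here is `"identity"`, the built layer data carries the same `x`/`y` values that went in; swapping in a different `stat` is what changes the points the geom draws.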

### Putting Our New Functions To Work

With our new additions to `ggplot2`, we can compare the output of `geom_smooth` to `geom_xspline` with some test data:

```
library(ggplot2)  # geom_xspline() is the function defined earlier in this post

set.seed(1492)
dat <- data.frame(x=c(1:10, 1:10, 1:10),
                  y=c(sample(15:30, 10), 2*sample(15:30, 10), 3*sample(15:30, 10)),
                  group=factor(c(rep(1, 10), rep(2, 10), rep(3, 10)))
)

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(size=0.5)
```

The GitHub page has more examples for the function, but you don’t have to be envious of the smooth D3 curves any more.

I realize this particular addition is not extremely helpful/beneficial, but the next one is. We’ll look at adding a new/more accurate density `Stat`/`Geom` in the next installment and then discuss the “on-steroids” roxygen2 comments you’ll end up using for your creations in part 3.
