# Roll Your Own Stats and Geoms in ggplot2 (Part 1: Splines!)

*[This article was first published on **rud.is » R**, and kindly contributed to R-bloggers.]*


A *huge* change is coming to `ggplot2` and you can get a preview of it over at Hadley’s github repo. I’ve been keenly interested in this as I will be fixing, finishing & porting `coord_proj` to it once it’s done.

Hadley & Winston have rebuilt ggplot2 on an entirely new object-oriented system called `ggproto`. With `ggproto` it’s now possible to easily extend ggplot2 from *within your own packages* (since `source()` is *so* last century), often with very little effort.
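If you haven’t met `ggproto` yet, here is a toy sketch (modeled on the example in the `?ggproto` help page; `Adder` and `Doubler` are hypothetical names, not part of ggplot2) showing the two things the rest of this post leans on: a `ggproto` object carries fields and methods, and a child object can inherit from a parent and override or delegate to its methods.

```r
library(ggplot2)

# A ggproto object is an environment holding fields and methods.
Adder <- ggproto("Adder", NULL,
  x = 0,
  add = function(self, n) {
    self$x <- self$x + n  # modifies the object in place
    self$x
  }
)

# Doubler inherits from Adder and overrides add(), delegating to the parent method
Doubler <- ggproto("Doubler", Adder,
  x = 0,
  add = function(self, n) {
    ggproto_parent(Adder, self)$add(n * 2)
  }
)

Adder$add(10)    # 10
Doubler$add(10)  # 20: Adder's add() runs with self bound to Doubler
```

`GeomXspline` and `StatXspline` below use exactly this inheritance, just with `GeomLine` and `Stat` as the parents.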

Before attempting to port `coord_proj` I wanted to work through adding a `Geom` and a `Stat`. I thought it would be cool to have interpolated line charts (which also helps answer some recurring StackOverflow “spline”/ggplot2 questions), and I also prefer `KernSmooth::bkde` over the built-in `density` function (which `geom_density` and `stat_density` both use).
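For the curious, a quick sketch of how the two estimators line up outside ggplot2. The simulated data and the shared `bw.nrd0()` bandwidth are my own assumptions, chosen so the two estimates are comparable; this is not how `geom_bkde()` picks its bandwidth.

```r
library(KernSmooth)  # ships with R as a recommended package

set.seed(1492)
x  <- rnorm(200)
bw <- bw.nrd0(x)  # one bandwidth for both estimators (assumption, for comparability)

d_base <- density(x, bw = bw)                          # 512-point grid by default
d_bkde <- bkde(x, kernel = "normal", bandwidth = bw)   # binned estimate, 401-point grid by default

# Riemann-sum check that each estimate integrates to roughly 1
area <- function(est) sum(est$y) * diff(est$x[1:2])
c(density = area(d_base), bkde = area(d_bkde))
```

Both return plain `x`/`y` grids, which is what makes `bkde` easy to drop into a `Stat`.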

To that end, I’ve made a new github-installable package called ggalt (h/t to @jayjacobs for a better package name than the one I originally came up with) where I’ll be adding new `Geom`s, `Stat`s, `Coord`s (et al) as I craft them. For now, let me introduce both `geom_xspline()` and `geom_bkde()` to show how easy it is to incorporate new functionality into ggplot2.

While not a requirement, I think it’s going to be a good idea to make a paired `Geom` and `Stat` when adding those types of functionality to ggplot2. I found it easier to work with custom parameters this way, and it also makes it *feel* a bit more like the way ggplot2 itself works. For the interpolated line geom/stat I used R’s `graphics::xspline` function. Here’s all it took to give ggplot2 lines some curves (you can find the commented version on github):

```r
library(ggplot2)

# User-facing geom: draws a line through the points computed by StatXspline
geom_xspline <- function(mapping = NULL, data = NULL, stat = "xspline",
                         position = "identity", show.legend = NA,
                         inherit.aes = TRUE, na.rm = TRUE,
                         spline_shape = -0.25, open = TRUE, rep_ends = TRUE, ...) {
  layer(
    geom = GeomXspline, mapping = mapping, data = data, stat = stat,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(spline_shape = spline_shape, open = open,
                  rep_ends = rep_ends, na.rm = na.rm, ...)
  )
}

# Lightweight Geom "shim": all drawing is inherited from GeomLine
GeomXspline <- ggproto("GeomXspline", GeomLine,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA)
)

# User-facing stat: computes spline points, drawn with geom_line by default
stat_xspline <- function(mapping = NULL, data = NULL, geom = "line",
                         position = "identity", show.legend = NA,
                         inherit.aes = TRUE,
                         spline_shape = -0.25, open = TRUE, rep_ends = TRUE, ...) {
  layer(
    stat = StatXspline, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(spline_shape = spline_shape, open = open,
                  rep_ends = rep_ends, ...)
  )
}

StatXspline <- ggproto("StatXspline", Stat,
  required_aes = c("x", "y"),
  # Called once per group; returns the interpolated points for that group
  compute_group = function(self, data, scales,
                           spline_shape = -0.25, open = TRUE, rep_ends = TRUE) {
    # xspline() needs an open graphics device even with draw = FALSE,
    # so open a throwaway PNG device and clean it up afterwards
    tf <- tempfile(fileext = ".png")
    png(tf)
    plot.new()
    tmp <- xspline(data$x, data$y, spline_shape, open, rep_ends,
                   draw = FALSE, NA, NA)
    invisible(dev.off())
    unlink(tf)
    data.frame(x = tmp$x, y = tmp$y)
  }
)
```

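If that seems like a lot of code, it really isn’t. What we have there are:

- a user-callable function (`geom_xspline`) and a `ggproto` object (`GeomXspline`) that handle the `Geom` aspects, and
- a user-callable function (`stat_xspline`) and a `ggproto` object (`StatXspline`) that handle the `Stat` aspects.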

Let’s look at the `Stat` pieces first (you can also just read the handy vignette).

### Adding `Stat`s

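In this particular case, we have it easy. We get to use `geom_line`/`GeomLine` as the base `geom_` for the layer, since all we’re doing is generating more points for it to draw line segments between. We create the user-facing interface to our new `Stat` with `stat_xspline` and add three new parameters with default values:

- `spline_shape` (default `-0.25`), passed to `xspline`’s `shape` argument
- `open` (default `TRUE`), passed to `xspline`’s `open` argument
- `rep_ends` (default `TRUE`), passed to `xspline`’s `repEnds` argument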
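To get a feel for what those parameters control, here is a small sketch that calls `graphics::xspline` directly, outside ggplot2. Opening a `pdf(NULL)` device instead of the temporary PNG used above is my own substitution; either way some device has to be open, because `xspline` computes its curve in device coordinates. The control points are made up for illustration. Negative shapes make the curve pass through the control points; positive shapes only approximate them.

```r
# Throwaway device: pdf(NULL) writes no file, but xspline() can still query it
pdf(NULL)
plot.new()

x <- 1:5
y <- c(2, 8, 3, 9, 4)

# shape / open / repEnds are what stat_xspline()'s spline_shape / open / rep_ends feed
interp_pts <- xspline(x, y, shape = -0.25, open = TRUE, repEnds = TRUE, draw = FALSE)
approx_pts <- xspline(x, y, shape = 0.5,   open = TRUE, repEnds = TRUE, draw = FALSE)

invisible(dev.off())

# Each result is a list of x/y coordinates with many more points than the input
lengths(interp_pts)
lengths(approx_pts)
```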

*“Added three new parameters to what?”* you ask? `GeomLine`/`geom_line` default to `StatIdentity`/`stat_identity`, and if you look at the source code, that `Stat` just returns the data in the same form it came in. We’re going to pass these three new parameters to `xspline` and return entirely new values for `ggplot2`/`grid` to draw, so we tell the layer to use our new computation engine by handing it `StatXspline` as its `stat`.

By using `GeomLine`/`geom_line` as the `geom` parameter, all we have to do is ensure we pass back the proper values. We do that in `compute_group`, since `ggplot2` will split the incoming data into groups (via the `group` aesthetic) for us. We take each group and run it through `xspline` with the parameters the user specified. If I didn’t have to use a hack to work around what seems to be errant plot-device behavior in `xspline` (it needs an open graphics device even with `draw = FALSE`), the call would be one line.
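To make the per-group idea concrete, here is a simplified sketch of what happens around `compute_group`: split the data by `group`, spline each piece, and stack the results. `spline_by_group()` is a hypothetical helper written for illustration only; ggplot2’s real dispatch does more bookkeeping (panels, restoring constant columns such as `colour`), and `pdf(NULL)` again stands in for the temporary PNG.

```r
# Hypothetical helper: a simplified stand-in for ggplot2 calling
# compute_group() once per group and row-binding the results
spline_by_group <- function(df, shape = -0.25, open = TRUE, rep_ends = TRUE) {
  pdf(NULL)           # throwaway device; xspline() needs one even with draw = FALSE
  on.exit(dev.off())  # always close it, even if xspline() errors
  plot.new()
  pieces <- lapply(split(df, df$group), function(g) {
    s <- xspline(g$x, g$y, shape = shape, open = open, repEnds = rep_ends,
                 draw = FALSE)
    data.frame(x = s$x, y = s$y, group = g$group[1])
  })
  do.call(rbind, pieces)
}

set.seed(1492)
dat <- data.frame(x = rep(1:10, 3),
                  y = c(sample(15:30, 10), 2 * sample(15:30, 10), 3 * sample(15:30, 10)),
                  group = factor(rep(1:3, each = 10)))

out <- spline_by_group(dat)
table(out$group)  # each group now has far more than its original 10 points
```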

### Adding `Geom`s

We pair up the `Stat` with a very basic `Geom` “shim” so we can use them interchangeably. It’s the same idiom: a `ggproto` “object” plus the user-callable function. In this case, it’s super-lightweight since we’re really having `geom_line` do all the work for us. In a [very] future post, I’ll cover more complex `Geom`s that require use of the underlying `grid` graphics system, but I suspect most of your own additions will be able to use the lightweight idiom here (which is also covered in the vignette).
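One way to convince yourself the shim really is just `geom_line` underneath is to inspect it. This sketch re-declares the same `GeomXspline` so it runs on its own, then checks its class chain and confirms that it defines no drawing method of its own; `draw_panel` is found on a parent (`GeomPath`) through `ggproto` inheritance.

```r
library(ggplot2)

# Same lightweight shim as above: only aesthetics are declared, no drawing code
GeomXspline <- ggproto("GeomXspline", GeomLine,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA)
)

class(GeomXspline)                 # class chain shows what it inherits from
"draw_panel" %in% ls(GeomXspline)  # FALSE: drawing is looked up on the parent
```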

### Putting Our New Functions To Work

With our new additions to `ggplot2`, we can compare the output of `geom_smooth` to `geom_xspline` with some test data:

```r
library(ggplot2)
# assumes geom_xspline() from above (or from ggalt) has been loaded

set.seed(1492)
dat <- data.frame(x = c(1:10, 1:10, 1:10),
                  y = c(sample(15:30, 10), 2 * sample(15:30, 10), 3 * sample(15:30, 10)),
                  group = factor(c(rep(1, 10), rep(2, 10), rep(3, 10))))

ggplot(dat, aes(x, y, group = group, color = factor(group))) +
  geom_point(color = "black") +
  geom_smooth(se = FALSE, linetype = "dashed", size = 0.5) +  # loess fit, for comparison
  geom_xspline(size = 0.5)                                    # interpolating x-spline
```

The github page has more examples for the function, and you no longer have to be envious of smooth D3 curves.

I realize this particular addition is not extremely helpful, but the next one is. We’ll look at adding a new, more accurate density `Stat`/`Geom` in the next installment, and then discuss the “on-steroids” roxygen2 comments you’ll end up using for your creations in part 3.

