# R Tip: Break up Function Nesting for Legibility

March 21, 2018

There are a number of easy ways to avoid illegible code nesting problems in `R`.

In this R tip we will expand upon the above statement with a simple example.

At some point it becomes illegible and undesirable to compose operations by nesting them, such as in the following code.

```r
head(mtcars[with(mtcars, cyl == 8), c("mpg", "cyl", "wt")])

#                     mpg cyl   wt
# Hornet Sportabout  18.7   8 3.44
# Duster 360         14.3   8 3.57
# Merc 450SE         16.4   8 4.07
# Merc 450SL         17.3   8 3.73
# Merc 450SLC        15.2   8 3.78
# Cadillac Fleetwood 10.4   8 5.25
```

One popular way to break up nesting is to use `magrittr`'s `%>%` pipe in combination with `dplyr` transform verbs, as we show below.

```r
library("dplyr")

# Each stage feeds its result into the next; head() ends the pipeline.
mtcars                 %>%
  filter(cyl == 8)     %>%
  select(mpg, cyl, wt) %>%
  head()

#    mpg cyl   wt
# 1 18.7   8 3.44
# 2 14.3   8 3.57
# 3 16.4   8 4.07
# 4 17.3   8 3.73
# 5 15.2   8 3.78
# 6 10.4   8 5.25
```

Note: as the output shows, the above code silently dropped the row names that are part of `mtcars`. We also pass over the details of how pipe notation works. It is enough to say that each stage is treated approximately as a function call with a new first argument inserted, set to the value of the pipeline up to that point.
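To make the "inserted first argument" convention concrete, here is a minimal sketch comparing a piped call with the explicit call it roughly stands for. It uses base `R`'s `subset()` so that only `magrittr` is needed; the variable names `piped` and `direct` are illustrative and not part of the original example.

```r
library("magrittr")  # provides %>%

# Piped form: the value on the left is inserted as the first argument of subset().
piped <- mtcars %>% subset(cyl == 8)

# The explicit call the piped form approximately stands for.
direct <- subset(mtcars, cyl == 8)

# Compare the two results.
same_result <- identical(piped, direct)
print(same_result)
```

If the two forms agree, `identical()` reports `TRUE`, which is the sense in which a pipeline stage is "approximately" an ordinary function call.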

Many `R` users already routinely avoid nested notation problems through a convention I call “name re-use.” Such code looks like the following.

```r
result <- mtcars
result <- filter(result, cyl == 8)
result <- select(result, mpg, cyl, wt)
```

The above convention is enough to get around all problems of nesting. It also has the great advantage of being step-debuggable: each statement can be run, and its result inspected, on its own.
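As a rough illustration of what step-debuggable buys us, the sketch below places checks between the steps. It swaps in base `R` indexing for the `dplyr` verbs so that it runs without any packages; the checks and the `n_after_filter` name are assumptions added for illustration.

```r
result <- mtcars

# Step 1: keep only the 8-cylinder cars (base R stand-in for filter()).
result <- result[result$cyl == 8, , drop = FALSE]

# Because the step is its own statement, we can stop here and inspect it.
stopifnot(all(result$cyl == 8))
n_after_filter <- nrow(result)

# Step 2: keep three columns (base R stand-in for select()).
result <- result[, c("mpg", "cyl", "wt"), drop = FALSE]

# Column selection should not change the number of rows.
stopifnot(ncol(result) == 3, nrow(result) == n_after_filter)
```

The same pauses are awkward in a deeply nested expression, where the intermediate values never get a name.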

I like a variation I call "dot intermediates", which looks like the code below (notice we are switching back from `dplyr` verbs to base `R` operators).

```r
. <- mtcars
. <- subset(., cyl == 8)
. <- .[, c("mpg", "cyl", "wt")]
result <- .

# Show the first rows; the row names are retained.
head(result)

#                     mpg cyl   wt
# Hornet Sportabout  18.7   8 3.44
# Duster 360         14.3   8 3.57
# Merc 450SE         16.4   8 4.07
# Merc 450SL         17.3   8 3.73
# Merc 450SLC        15.2   8 3.78
# Cadillac Fleetwood 10.4   8 5.25
```
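As a quick sanity check, the dot-intermediate steps can be compared against the nested expression from the start of this tip. This is only a sketch; the names `nested`, `stepped`, and `same` are introduced here for illustration.

```r
# The nested form from the opening example, without head().
nested <- mtcars[with(mtcars, cyl == 8), c("mpg", "cyl", "wt")]

# The same computation written with dot intermediates.
. <- mtcars
. <- subset(., cyl == 8)
. <- .[, c("mpg", "cyl", "wt")]
stepped <- .

# Compare values, column names, and row names.
same <- isTRUE(all.equal(nested, stepped))
print(same)
```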

The dot intermediate convention is very succinct, and we can use it with base `R` transforms to get a correct (and performant) result. Like all conventions, it is just a matter of teaching, learning, and repetition to make this seem natural, familiar, and legible. The benchmark below compares it against the `dplyr` pipeline.

```r
library("dplyr")
library("microbenchmark")
library("ggplot2")

# Time the base R dot-intermediate version against the dplyr pipeline.
timings <- microbenchmark(
  base = {
    . <- mtcars
    . <- subset(., cyl == 8)
    . <- .[, c("mpg", "cyl", "wt")]
    nrow(.)
  },
  dplyr = {
    mtcars                 %>%
      filter(cyl == 8)     %>%
      select(mpg, cyl, wt) %>%
      nrow
  })

print(timings)

## Unit: microseconds
##   expr      min       lq      mean   median       uq       max neval
##   base  122.948  136.948  167.2253  159.688  179.924   349.328   100
##  dplyr 1570.188 1654.700 2537.2912 1699.744 1785.611 50759.770   100

autoplot(timings)
```

Figure: durations for related tasks (smaller is better).

Contrary to what many repeat, base `R` is often faster than the `dplyr` alternative. In this case the base `R` version is about 15 times faster by mean run time, and about 11 times faster by median (possibly due to `magrittr` overhead and the small size of this example). We also see that, with some care, base `R` can be quite legible. `dplyr` is a useful tool and convention, but it is not the only allowed tool or the only allowed convention.
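The speed-up factor can be recomputed from the timing table printed above. The sketch below simply copies the mean and median columns (in microseconds) from that output; the variable names are illustrative.

```r
# Mean and median run times in microseconds, copied from the printed timings.
base_mean    <- 167.2253
dplyr_mean   <- 2537.2912
base_median  <- 159.688
dplyr_median <- 1699.744

# How many times slower the dplyr pipeline was in this run.
mean_ratio   <- dplyr_mean / base_mean
median_ratio <- dplyr_median / base_median

print(round(c(mean = mean_ratio, median = median_ratio), 1))
```

The mean is pulled up by the single slow `dplyr` run visible in the `max` column, so the median ratio is arguably the fairer summary for this example.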

