In scientific programming, speed matters. Functions written for general public use carry many control-flow checks that are unnecessary once you are confident in your code and your inputs. To speed up execution, I suggest stripping run-of-the-mill functions down to their bare bones: keep only the lines that do the actual labor, and you can save serious wall-clock time. Below is a walk-through example of what I mean.

I use the `quantile` function as the example. There are many ways to compute the estimate of a quantile, and all of those ways are coded into the single `quantile` function; the default argument `type = 7` selects the particular estimator used. Since R is open source, you can easily read the code of any function and “fish out” only the lines you actually need. While the code for the `quantile` function runs to around 90 lines (given below), for the `type = 7` default the real labor is carried out mainly by lines 49 to 58, the main workhorse.
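
For instance, you can pull up that code yourself by printing the S3 default method from the console:

```
# Print the source of the method dispatched for plain numeric vectors
getS3method("quantile", "default")
# or equivalently: stats:::quantile.default
```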

Now, let’s write our own version of the `quantile` function; call it `lean_quantile`. Then we make sure `lean_quantile` does what it’s meant to do, and compare the execution times.

```
lean_quantile <- function(x, probs = seq(0, 1, 0.25)) {
  n <- length(x)
  index <- 1 + (n - 1) * probs                # fractional positions in the sorted sample
  lo <- floor(index)
  hi <- ceiling(index)
  x <- sort(x, partial = unique(c(lo, hi)))   # partial sort: order only what we need
  qs <- x[lo]
  i <- which(index > lo)                      # positions that need interpolation
  h <- (index - lo)[i]
  qs[i] <- (1 - h) * qs[i] + h * x[hi[i]]     # linear interpolation between order stats
  qs
}
```

Check that our `lean_quantile` does what it’s meant to do:

```
tmpp <- rnorm(10)
all(quantile(tmpp) == lean_quantile(tmpp))
# [1] TRUE
```
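
To be more thorough, we can sweep over many random samples and arbitrary probability grids. This quick sanity check is not part of the original post; it assumes `lean_quantile` is defined as above:

```
# Compare lean_quantile against quantile (default type = 7)
# over random samples of random sizes and random probability grids.
set.seed(42)
for (k in 1:100) {
  x <- rnorm(sample(2:500, 1))
  p <- sort(runif(5))
  stopifnot(isTRUE(all.equal(unname(quantile(x, probs = p)),
                             lean_quantile(x, probs = p))))
}
```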

Now we can compare the execution times, using the `microbenchmark` package:

```
library(microbenchmark)
# citation("microbenchmark")
bench <- microbenchmark(quantile(rnorm(10)),
                        lean_quantile(rnorm(10)),
                        times = 10^4)
bench
# Unit: microseconds
#                     expr  min   lq mean median   uq  max neval
#      quantile(rnorm(10)) 79.1 84.3 96.3   86.1 89.0 3907 10000
# lean_quantile(rnorm(10)) 27.9 31.6 36.2   33.4 34.8 4741 10000
```

Median execution time is reduced by over 60%, and we did not have to work very hard for it. We could go further, for example by diving into the `sort` call that `lean_quantile` relies on, but you get the idea.

Is it a free lunch? Of course not.

It takes time to master efficient programming, and the functions you find in the public domain are usually well scrutinized, both before and after they are published. When you meddle with the internals you risk making a mistake: erasing an important check, or creating unintended consequences that change the original behavior. So meticulous testing of your stripped-down version is essential.

Some functions are written so efficiently that you will find little value in pulling out just the workhorse, but with most functions written for the general public you should be able to squeeze out some speed. As you can see, this “get the gist” tip has excellent potential to save a lot of waiting time.

```
#  Part of the R package, https://www.R-project.org
#
#  Copyright (C) 1995-2014 The R Core Team
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  A copy of the GNU General Public License is available at
#  https://www.R-project.org/Licenses/

quantile <- function(x, ...) UseMethod("quantile")

quantile.POSIXt <- function(x, ...)
    .POSIXct(quantile(unclass(as.POSIXct(x)), ...), attr(x, "tzone"))

quantile.default <-
    function(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE,
             type = 7, ...)
{
    if (is.factor(x)) {
        if (!is.ordered(x) || ! type %in% c(1L, 3L))
            stop("factors are not allowed")
        lx <- levels(x)
    } else lx <- NULL
    if (na.rm)
        x <- x[!is.na(x)]
    else if (anyNA(x))
        stop("missing values and NaN's not allowed if 'na.rm' is FALSE")
    eps <- 100 * .Machine$double.eps
    if (any((p.ok <- !is.na(probs)) & (probs < -eps | probs > 1 + eps)))
        stop("'probs' outside [0,1]")
    n <- length(x)
    if (na.p <- any(!p.ok)) { # set aside NA & NaN
        o.pr <- probs
        probs <- probs[p.ok]
        probs <- pmax(0, pmin(1, probs)) # allow for slight overshoot
    }
    np <- length(probs)
    if (n > 0 && np > 0) {
        if (type == 7) { # be completely back-compatible
            index <- 1 + (n - 1) * probs
            lo <- floor(index)
            hi <- ceiling(index)
            x <- sort(x, partial = unique(c(lo, hi)))
            qs <- x[lo]
            i <- which(index > lo)
            h <- (index - lo)[i] # > 0 by construction
            ##  qs[i] <- qs[i] + .minus(x[hi[i]], x[lo[i]]) * (index[i] - lo[i])
            ##  qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
            qs[i] <- (1 - h) * qs[i] + h * x[hi[i]]
        } else {
            if (type <= 3) {
                ## Types 1, 2 and 3 are discontinuous sample qs.
                nppm <- if (type == 3) n * probs - .5 # n * probs + m; m = -0.5
                        else n * probs # m = 0
                j <- floor(nppm)
                h <- switch(type,
                            (nppm > j),         # type 1
                            ((nppm > j) + 1)/2, # type 2
                            (nppm != j) | ((j %% 2L) == 1L)) # type 3
            } else {
                ## Types 4 through 9 are continuous sample qs.
                switch(type - 3,
                       {a <- 0; b <- 1},    # type 4
                       a <- b <- 0.5,       # type 5
                       a <- b <- 0,         # type 6
                       a <- b <- 1,         # type 7 (unused here)
                       a <- b <- 1 / 3,     # type 8
                       a <- b <- 3 / 8)     # type 9
                ## need to watch for rounding errors here
                fuzz <- 4 * .Machine$double.eps
                nppm <- a + probs * (n + 1 - a - b) # n*probs + m
                j <- floor(nppm + fuzz) # m = a + probs*(1 - a - b)
                h <- nppm - j
                if (any(sml <- abs(h) < fuzz)) h[sml] <- 0
            }
            x <- sort(x, partial =
                      unique(c(1, j[j > 0L & j <= n], (j + 1)[j > 0L & j < n], n)))
            x <- c(x[1L], x[1L], x, x[n], x[n])
            ## h can be zero or an NA (type 2), as prob can be 0.
            qs <- x[j + 2L]
            qs[h == 1] <- x[j + 3L][h == 1]
            other <- (0 < h) & (h < 1)
            if (any(other)) qs[other] <- ((1 - h) * x[j + 2L] + h * x[j + 3L])[other]
        }
    } else {
        qs <- rep(NA_real_, np)
    }
    if (is.character(lx))
        qs <- factor(qs, levels = seq_along(lx), labels = lx, ordered = TRUE)
    if (names && np > 0L) {
        names(qs) <- format_perc(probs)
    }
    if (na.p) { # do this more elegantly (?!)
        o.pr[p.ok] <- qs
        names(o.pr) <- rep("", length(o.pr)) # suppress <NA> names
        names(o.pr)[p.ok] <- names(qs)
        o.pr
    } else qs
}
```

### Footnotes

As a side note, it would be nice to do the same in Python, but the source code for the numpy quantile function is heavily “decorated”. Comment if you know how to create the Python counterpart.
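
For what it’s worth, here is a rough pure-Python sketch of the same type-7 logic. This is my own stand-in, not extracted from numpy’s source: the name `lean_quantile_py` is made up, and it does a full sort rather than R’s partial sort.

```
import math

def lean_quantile_py(x, probs=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Type-7 quantiles, mirroring the stripped R code (0-based indices)."""
    xs = sorted(x)              # full sort; R uses sort(x, partial = ...) here
    n = len(xs)
    out = []
    for p in probs:
        index = (n - 1) * p     # 0-based analogue of R's 1 + (n - 1) * p
        lo = math.floor(index)
        hi = math.ceil(index)
        h = index - lo          # interpolation weight
        out.append((1 - h) * xs[lo] + h * xs[hi])
    return out

print(lean_quantile_py([1, 2, 3, 4]))  # [1.0, 1.75, 2.5, 3.25, 4.0]
```

Where numpy is available, this should agree with `numpy.quantile`, whose default interpolation is the same linear (type-7) rule.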
