Higher Order Functions in R

September 23, 2010
By

(This article was first published on John Myles White » Statistics, and kindly contributed to R-bloggers)

Introduction

Because R is, in part, a functional programming language, the ‘base’ package contains several higher order functions. By higher order functions, I mean functions that take another function as an argument and then do something with that function. If you want to know more about the usefulness of writing higher order functions in general, I’d recommend the classic Structure and Interpretation of Computer Programs and the more recent Higher Order Perl, both of which are freely available online.

The six primary higher order functions built into R are

  1. Reduce
  2. Filter
  3. Map
  4. Find
  5. Position
  6. Negate

I’ll go through these in order, giving some examples using basic number theoretic calculations in something approximating the style of SICP. Hopefully the use of mathematical examples isn’t as unnerving for stats-savvy readers as it can be for many readers of SICP.

Reduce

Reduce loops over pairs of elements of a vector, performing the same binary operation on all of the pairs along the way. That’s so abstract that it’s probably unclear. For that reason, I think it’s best to work up from a very specific example to the general notion of Reduce.

Let’s start by assuming that you’ve been writing some horrible piece of code like the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
my.sum <- function(x)
{
	if(length(x) == 1)
	{
		return(x[1])
	}
	if (length(x) == 2)
	{
		return(x[1] + x[2])
	}
	if (length(x) == 3)
	{
		return(x[1] + x[2] + x[3])
	}
}

This approach is clearly so specific that it’s virtually unusable. The most obvious fault is that our code is unable to deal with arbitrarily long vectors. That can be easily fixed, though, with a simple loop:

1
2
3
4
5
6
7
8
9
10
11
my.sum <- function(x)
{
	result <- 0
 
	for (i in 1:length(x))
	{
		result <- result + x[i]
	}
 
	return(result)
}

Reduce takes another abstraction step beyond the one we just made. Instead of making specific functions that repeatedly perform some set of pre-specified binary operations, Reduce provides one higher order generalization that takes an arbitrary binary operation as an argument. This lets us rewrite my.sum in one line:

1
my.sum <- function (x) {Reduce(`+`, x)}

If you’re confused by that line, there are two potential sources of magic. First, the name of the addition operation in R is `+` with backquotes. Second, Reduce and its relatives assume that the first argument is a function that they will use on the elements of x.

Once you realize that Reduce makes this sort of code trivial to write, it’s easy to continue exploiting this approach:

1
2
3
my.prod <- function (x) {Reduce(`*`, x)}
 
my.factorial <- function (n) {Reduce(`*`, 1:n)}

Better yet, if you know that Reduce has other options like accumulate, which allows you to keep track of the interim results of the Reduce operations, then you can also make replacements for cumsum and cumprod easily:

1
2
3
my.cumsum <- function (x) {Reduce(`+`, x, accumulate = TRUE)}
 
my.cumprod <- function (x) {Reduce(`*`, x, accumulate = TRUE)}

Filter

Filter is even easier to understand than Reduce: it simply selects the elements of a vector that meet a predicate you pass in as a function:

1
2
small.even.numbers <- Filter(function (x) {x %% 2 == 0}, 1:10)
small.odd.numbers <- Filter(function (x) {x %% 2 == 1}, 1:10)

We can use Filter along with some more interesting predicates to write much more complex functions quickly:

1
2
3
is.divisor <- function(a, x) {x %% a == 0}
 
is.prime <- function (x) {length(Filter(function (a) {is.divisor(a, x)}, 1:x)) == 2}

And we can even combine Reduce and Filter to perform otherwise tedious tasks like enumerating the perfect numbers:

1
2
3
4
5
proper.divisors <- function (x) {Filter(function (a) {is.divisor(a, x)}, 1:(x - 1))}
 
is.perfect <- function(x) {x == Reduce(`+`, proper.divisors(x))}
 
small.perfect.numbers <- Filter(is.perfect, 1:1000)

Map

Map is essentially the same abstraction as the apply family of functions provides. Try the following to see why I say that:

1
Map(proper.divisors, 1:5)

Find

Find is really a truncated form of Filter: it locates the first item in a vector that satisfies a predicate. For example, we can find the first prime greater than 1000 like so:

1
Find(is.prime, 1000:2000)

Position

Position is a variant of Find that provides the index of the element that would be returned by Find instead of its value:

1
Position(is.prime, 1000:2000)

Negate

Negate simply flips the Boolean sense of an existing predicate. So we can do the following:

1
is.composite <- Negate(is.prime)

Using Reduce for Continued Fraction Computations

The following example uses higher order functions to compute the value of a finite continued fraction. The code is lifted straight from the help docs for Reduce. If you’re not familiar with continued fractions, I’d recommend starting with this Wikipedia article and then reading The Higher Arithmetic (Incidentally, The Higher Arithmetic is the most enjoyable book on mathematics I have ever read.)

1
2
cfrac <- function(x) Reduce(function(u, v) u + 1 / v, x, right = TRUE)
cfrac(c(1, rep(2, 99))) == sqrt(2)

This example also shows that you can have Reduce work right to left through a vector when you might need to. In this example, you do need to work in that specific order, because the binary operation is not associative, so moving left to right would produce different results.

How Fast are These Higher Order Functions?

Given their flexibility, one might wonder if these higher order functions lose something in efficiency. Some basic profiling suggests they do, though I think their elegance makes up for small efficiency losses in many contexts.

For example, filtering and selecting elements of a vector are essentially identical, so we can compare these two approaches:

1
2
3
4
5
natural.numbers <- 1:10000
iterations <- 10
 
mean(replicate(iterations, system.time(natural.numbers[natural.numbers %% 2 == 0])))
mean(replicate(iterations, system.time(Filter(function (n) {n %% 2 == 0}, natural.numbers))))

Similarly, we can compare Map and lapply:

1
2
mean(replicate(iterations, system.time(lapply(1:10000, sqrt))))
mean(replicate(iterations, system.time(Map(sqrt, 1:10000))))

And finally we can compare our Reduce implementation of sum with the hardcoded one:

1
2
mean(replicate(iterations, system.time(sum(1:10000))))
mean(replicate(iterations, system.time(Reduce(`+`, 1:10000))))

To leave a comment for the author, please follow the link and comment on his blog: John Myles White » Statistics.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: ,

Comments are closed.