# Four Filters for Functional (Programming) Friends

**rstats on Irregularly Scheduled Programming**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’m part of a local Functional Programming Meetup group which hosts talks, but also coordinates social meetings where we discuss all sorts of FP-related topics including Haskell and other languages. We’ve started running challenges where we all solve a given problem in a language of our choosing then discuss over drinks how they compare.

This month we went with an “easy” problem with a wrinkle – we would solve the ‘strain’ exercise from Exercism (Haskell, Python – your access to these is likely conditional on you being enrolled in that language track) with an extension:

The problem is trivial; the challenge is to solve it in 4 different ways using your language of choice.

The problem itself is given as

Implement the

`keep`

and`discard`

operation on collections. Given a collection and a predicate on the collection’s elements,`keep`

returns a new collection containing those elements where the predicate is true, while`discard`

returns a new collection containing those elements where the predicate is false.For example, given the collection of numbers:

`1, 2, 3, 4, 5`

And the predicate:

“is the number even?”

Then your keep operation should produce:

`2, 4`

While your discard operation should produce:

`1, 3, 5`

but with a restriction:

Keep your hands off that filter/reject/whatchamacallit functionality provided by your standard library! Solve this one yourself using other basic tools instead.

I figured it’s a good opportunity to write as I solve it, so here’s my R solutions.

I’ll define a test case so I can try out things as I go

test_vec <- c(1, 2, 3, 4, 5) test_vec ## [1] 1 2 3 4 5

and the predicates related to ‘even’ and ‘odd’ as functions which return `TRUE`

or `FALSE`

is_even <- function(x) { x %% 2 == 0 } is_even(7) ## [1] FALSE is_even(8) ## [1] TRUE is_odd <- function(x) { !is_even(x) } is_odd(7) ## [1] TRUE is_odd(8) ## [1] FALSE

Firstly, the restriction doesn’t seem to worry me because when I think of “filter” in R
I immediately think of `dplyr::filter()`

which works on `data.frame`

(or `tibble`

)
objects, and (given the examples) we’re aiming to work with vectors (the problem is stated
the same in several languages, so “collection” is a generalisation).

What about `base::Filter()`

? The help states

`Filter`

extracts the elements of a vector for which a predicate (logical) function gives true.

Filter(is_even, test_vec) ## [1] 2 4 Filter(is_odd, test_vec) ## [1] 1 3 5

Yep, that works exactly as I hoped, but is also a built-in “filter” so I can’t use it.

When I think of `keep`

and `discard`

I do think of the `purrr`

functions, and while these, too
do exactly what I want

purrr::keep(test_vec, is_even) ## [1] 2 4 purrr::discard(test_vec, is_even) ## [1] 1 3 5

They’re in a library, so I’m going to say they don’t count.

One of the things I like about the way R does subsetting (via the square-bracket `[`

which *is by itself* a function, but requires a matching `]`

to satisfy the parser) is that you
can use a logical vector to subset another vector,

c(3, 5, 8, 12)[c(TRUE, FALSE, FALSE, TRUE)] ## [1] 3 12

which means that if I can *produce* such a logical vector, say, by applying a predicate function, I can do subsetting that way

test_vec[is_even(test_vec)] ## [1] 2 4 test_vec[is_odd(test_vec)] ## [1] 1 3 5

Instead of using `is_odd()`

I can just *negate* the logical vector to get the same effect

test_vec[!is_even(test_vec)] ## [1] 1 3 5

I can make those into functions that take a predicate and a vector

keep_1 <- function(f, x) { x[f(x)] } discard_1 <- function(f, x) { x[!f(x)] }

Testing these

keep_1(is_even, test_vec) ## [1] 2 4 discard_1(is_even, test_vec) ## [1] 1 3 5

One down!

One thing to note with this approach is that R is *vectorised* - I’ve discussed this
a few times on this blog (most recently) - which
means that these predicate functions will gladly take a vector, not just a single value. This
works for the `is_even()`

function because inside that, the modulo operator `%%`

is itself
vectorised, so

is_even(c(2, 4, 6, 9, 11, 13)) ## [1] TRUE TRUE TRUE FALSE FALSE FALSE

As I wrote in my previous post, thinking like this just becomes so natural in R that I have to force myself to remember that not every language does that.

It’s also worth mentioning that I’m passing a reference to the function `is_even`

to our
`keep`

and `discard`

functions - that is serving as our predicate because I need a way to
state “is the number even?” which references the number, so I need a function. That *doesn’t*
have to be a named function, though - it could be an “anonymous” function (a “lambda”) if
I wanted

keep_1(function(z) z %% 2 == 0, test_vec) ## [1] 2 4 discard_1(function(z) z %% 2 == 0, test_vec) ## [1] 1 3 5

I can subset a vector with a logical vector of the same length, specifying whether or not
to include that element, but I can *also* subset by position (keeping in mind that R is a
1-based language which means the first element is indexed by a `1`

- why would any language
do anything different? 😜)

c(3, 5, 8, 12)[c(1, 4)] ## [1] 3 12

The function `which()`

takes a logical vector and returns which indices are `TRUE`

which(c(TRUE, FALSE, FALSE, TRUE)) ## [1] 1 4

so I can use this with our predicate to keep elements

keep_2 <- function(f, x) { x[which(f(x))] } keep_2(is_even, test_vec) ## [1] 2 4

However, discarding elements by index doesn’t use a logical negation, it uses a
negative sign (`-`

)

discard_2 <- function(f, x) { x[-which(f(x))] } discard_2(is_even, test_vec) ## [1] 1 3 5

If you look at the source of `Filter()`

, you’ll see that I wasn’t far off of exactly
that

Filter ## function (f, x) ## { ## ind <- as.logical(unlist(lapply(x, f))) ## x[which(ind)] ## } ## <bytecode: 0x55da1b321ad8> ## <environment: namespace:base>

but it still counts.

Another option would be to unpack the elements themselves and do some stepwise
comparisons in a loop. For each element `el`

in the vector `x`

, test if `f(el)`

is `TRUE`

,
and if it is, concatenate `el`

to the end of the accumulating `result`

vector

keep_3 <- function(f, x) { result <- c() for (el in x) { if (f(el)) { result <- c(result, el) } } result } keep_3(is_even, test_vec) ## [1] 2 4 discard_3 <- function(f, x) { result <- c() for (el in x) { if (!f(el)) { result <- c(result, el) } } result } discard_3(is_even, test_vec) ## [1] 1 3 5

Of course, this approach is a Bad Idea™ in general but I’m not optimising anything here. This
approach *does* have the advantage that it isn’t relying on R’s vectorised capabilities, since
each element is passed to the predicate function individually, so if I did have a non-vectorized
predicate function, this would still work.

I really want a “weird” way to do this. R has plenty of weird to go around, but since I’ve been learning some Haskell, and the challenge originally referenced the Haskell solution, what if I code a Haskell-esque solution?

Haskell makes good use of recursive functions. Any loop can be written as a recursion
(and vice-versa) so the previous solution is a good starting point. First, I define
a base case; if I run out of numbers to process, return `NULL`

. A convenient feature
of R vectors is that `NULL`

s are dropped

c(1, 2, NULL, 3, 4, NULL, 5) ## [1] 1 2 3 4 5

Otherwise, I can take the first value in the vector and test it with the predicate. If it
returns `TRUE`

I can append it to what I’ve calculated so far, and recursively
call the function again with the rest of the vector. That could look like

keep_4 <- function(f, x) { if (!length(x)) return(NULL) if (f(x[1])) { return(c(x[1], Recall(f, x[-1]))) } else { return(Recall(f, x[-1])) } } keep_4(is_even, test_vec) ## [1] 2 4

Some interesting points about this: the `Recall()`

function is nice for defining a recursive
function. I could have used `keep_4`

there, but the advantage of this implementation is
that I can rename the function and it still works as expected

keep_4_also <- keep_4 rm("keep_4") keep_4_also(is_even, test_vec) ## [1] 2 4

If I had explicitly referenced `keep_4`

inside itself, that recursion would fail with this
renaming.

The negative subsetting works as described above; `x[-1]`

means “not including the first
element”. Lastly, testing `if (!length(x))`

works because `0`

can be coerced to `FALSE`

and
any other value to `TRUE`

, so if the `length`

of `x`

is *not* `0`

, this condition is met.

The discarding variant is similar, just with the two `returns()`

around the other way, or

discard_4 <- function(f, x) { if (!length(x)) return(NULL) if (!f(x[1])) { return(c(x[1], Recall(f, x[-1]))) } else { return(Recall(f, x[-1])) } } discard_4(is_even, test_vec) ## [1] 1 3 5

There we go; 4 hand-coded implementations of `keep`

and `discard`

in R.

Can you think of another that doesn’t use `Filter()`

or an external library? Let me
know in the comments below or on Mastodon. I’m
looking forward to seeing how people solved this in other languages.

##
`devtools::session_info()`

## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.1.2 (2021-11-01) ## os Pop!_OS 22.04 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_AU.UTF-8 ## ctype en_AU.UTF-8 ## tz Australia/Adelaide ## date 2023-08-30 ## pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date (UTC) lib source ## blogdown 1.17 2023-05-16 [1] CRAN (R 4.1.2) ## bookdown 0.29 2022-09-12 [1] CRAN (R 4.1.2) ## bslib 0.5.0 2023-06-09 [3] CRAN (R 4.3.1) ## cachem 1.0.8 2023-05-01 [3] CRAN (R 4.3.0) ## callr 3.7.3 2022-11-02 [3] CRAN (R 4.2.2) ## cli 3.6.1 2023-03-23 [3] CRAN (R 4.2.3) ## crayon 1.5.2 2022-09-29 [3] CRAN (R 4.2.1) ## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.1.2) ## digest 0.6.33 2023-07-07 [3] CRAN (R 4.3.1) ## ellipsis 0.3.2 2021-04-29 [3] CRAN (R 4.1.1) ## evaluate 0.21 2023-05-05 [3] CRAN (R 4.3.0) ## fastmap 1.1.1 2023-02-24 [3] CRAN (R 4.2.2) ## fs 1.6.3 2023-07-20 [3] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [3] CRAN (R 4.2.0) ## htmltools 0.5.6 2023-08-10 [3] CRAN (R 4.3.1) ## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.2) ## httpuv 1.6.6 2022-09-08 [1] CRAN (R 4.1.2) ## jquerylib 0.1.4 2021-04-26 [3] CRAN (R 4.1.2) ## jsonlite 1.8.7 2023-06-29 [3] CRAN (R 4.3.1) ## knitr 1.43 2023-05-25 [3] CRAN (R 4.3.0) ## later 1.3.0 2021-08-18 [1] CRAN (R 4.1.2) ## lifecycle 1.0.3 2022-10-07 [3] CRAN (R 4.2.1) ## magrittr 2.0.3 2022-03-30 [3] CRAN (R 4.2.0) ## memoise 2.0.1 2021-11-26 [3] CRAN (R 4.2.0) ## mime 0.12 2021-09-28 [3] CRAN (R 4.2.0) ## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.2) ## pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.1.2) ## pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.1.2) ## prettyunits 1.1.1 2020-01-24 [3] CRAN (R 4.0.1) ## processx 3.8.2 2023-06-30 [3] CRAN (R 4.3.1) ## profvis 0.3.7 2020-11-02 [1] CRAN (R 4.1.2) ## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2) ## ps 1.7.5 2023-04-18 [3] CRAN (R 4.3.0) ## purrr 1.0.1 2023-01-10 [1] CRAN (R 4.1.2) ## R6 2.5.1 2021-08-19 [3] CRAN (R 4.2.0) ## Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.1.2) ## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.2) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.1.2) ## rmarkdown 2.23 2023-07-01 [3] CRAN (R 4.3.1) ## rstudioapi 0.15.0 2023-07-07 [3] CRAN (R 4.3.1) ## sass 0.4.7 2023-07-15 [3] CRAN (R 4.3.1) ## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2) ## shiny 1.7.2 2022-07-19 [1] CRAN (R 4.1.2) ## stringi 1.7.12 2023-01-11 [3] CRAN (R 4.2.2) ## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.1.2) ## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.1.2) ## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.1.2) ## vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.1.2) ## xfun 0.40 2023-08-09 [3] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.2) ## yaml 2.3.7 2023-01-23 [3] CRAN (R 4.2.2) ## ## [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1 ## [2] /usr/local/lib/R/site-library ## [3] /usr/lib/R/site-library ## [4] /usr/lib/R/library ## ## ──────────────────────────────────────────────────────────────────────────────

**leave a comment**for the author, please follow the link and comment on their blog:

**rstats on Irregularly Scheduled Programming**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.