Maybe monad in R

[This article was first published on R on Biofunctor, and kindly contributed to R-bloggers.]

A monad is a mysterious entity from the ivory towers of category theory, an idea that turned out to be quite useful in programming. Part of the myth surrounding monads is that as soon as you understand them, you lose the ability to explain the concept. Since I’m not a mathematician, nor even a trained programmer, I won’t try to explain anything. Instead, I’ll just implement a simple monad.

R, being a functional programming language, should be able to benefit from this concept. The goal is not to create a library, just to demonstrate monads and gain a practical understanding of their usefulness.

So, I decided to write a single-use maybe monad for a theoretical use case. I will use the magrittr pipe operator %>% as a close analogy for function composition, and will similarly create a new bind operator for my monad.

The problem

You are developing a game that can be played as different characters. Depending on which character you choose, you get a different number of lives. All the necessary information is stored in a database; however, your team of developers is not very organised. The database can be updated at any time, column names may change, and character names may go missing.

For simplicity, your “database” is a data frame saved in a CSV file.

# Data --------------------------------------------------------------------
library(tidyverse)

livesleft <- tibble(
    names= c("John", "Ed", "Ned", "Sam", "Benjen", "Beric"),
    nLives=c( 2,      1,    0,     1,     0.5,      4)
)

livesleft
## # A tibble: 6 x 2
##   names  nLives
##   <chr>   <dbl>
## 1 John      2  
## 2 Ed        1  
## 3 Ned       0  
## 4 Sam       1  
## 5 Benjen    0.5
## 6 Beric     4
write_csv(livesleft, "livesleft.csv")

You decide to write a robust pipeline, so you don’t have to deal with your colleagues’ mess. If there is a problem at any stage, the pipeline should return 1 (the default number of lives). You also want some information on what caused the error. Here’s the procedure:

  1. Read database (may be missing)
  2. Filter by name (name can be missing)
  3. Get number of lives from the lives column (the column can be missing)

If nothing goes wrong, you can just use a pipe operator.

read_csv("livesleft.csv") %>% 
    filter(names=="Ed") %>%
    pull(nLives)
## [1] 1

But let’s see what happens if some of those functions fail!

read_csv("wrongFile.csv") %>% # missing data
    filter(names=="Beric") %>%
    pull(nLives)
## Error: 'wrongFile.csv' does not exist in current working directory ('/builds/Kupac/biofunctor/content/post').
read_csv("livesleft.csv") %>%
    filter(names=="Bran") %>% # missing name
    pull(nLives)
## numeric(0)
read_csv("livesleft.csv") %>%
    filter(names=="Beric") %>% 
    pull(nKilled)             # wrong column name
## Error in eval_tidy(enquo(var), var_env): object 'nKilled' not found

Disaster! No number returned, the game crashes, you have to go back, and fix the database. There must be a way…

Maybe

So you can encounter errors or missing data at any stage of the pipeline. In R, you could deal with these using tryCatch or something similar. But that means you’d have to rewrite each and every function making sure that:

  • The inputs are correct
  • The errors are caught and reported

You can’t escape the second part, but maybe you can avoid checking inputs every time, and produce simpler code.
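For comparison, here is a minimal sketch of the tryCatch route. It uses base R's read.csv so that it runs without extra packages, and the function name lives_or_default and its arguments are made up for this illustration. Note how the input checks and the error handling end up tangled together in one function.

```r
# Hypothetical helper: every input check and the error handler live in one place.
lives_or_default <- function(file, name, column, default = 1) {
  tryCatch({
    db <- read.csv(file, stringsAsFactors = FALSE)   # errors if the file is missing
    if (!all(c("names", column) %in% colnames(db))) {
      stop("expected column is missing")
    }
    rows <- db[db$names == name, , drop = FALSE]
    if (nrow(rows) != 1) {
      stop("name '", name, "' matches ", nrow(rows), " rows")
    }
    rows[[column]]
  }, error = function(e) {
    message("Returning default value, because:\n", conditionMessage(e))
    default
  })
}

# Self-contained demo on a temporary file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(names = c("Ed", "Ned"), nLives = c(1, 0)),
          tmp, row.names = FALSE)

lives_or_default(tmp, "Ed", "nLives")      # value from the file
lives_or_default(tmp, "Bran", "nLives")    # default: name not found
lives_or_default(tmp, "Ed", "nKilled")     # default: column not found
lives_or_default(file.path(tempdir(), "wrongFile.csv"), "Ed", "nLives")  # default: no file
```

This works, but each new step in the pipeline would add another check inside the same tryCatch block.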

Actually, these functions return either nothing or something. This can be represented by a maybe value. To implement this, you’ll need a helper function that wraps any value in a maybe container.

just <- function(x) {
  res <- list(
        type = "Just",
        content = x
    )  
  class(res) <- append(class(res), "maybe")
  return(res)
}

In this simple implementation, a Maybe is a list of length 2. The first slot is the string "Just", and the second is the object to be wrapped.

Errors are represented similarly, by a Nothing accompanied by an error string. It’s also a list of length 2, but the first slot contains the string "Nothing", and the second the error message.

nothing <- function(errorString) {
  res <- list(
    type = "Nothing",
    content = errorString
  )
  class(res) <- "maybe"
  return(res)
}

For this article, I also create a print method for the maybe class, so it’s not printed as a list.

print.maybe <- function(x, ...) {
  if(x[["type"]] == "Just") {
    cat("Just:\n")
    print(x[["content"]], ...)
  } else {
    cat("Nothing:",
        x[["content"]],
        sep="\n")
  }
}

Here are some examples:

just("a")
## Just:
## [1] "a"
just(matrix(1:16,ncol=4))
## Just:
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
nothing("This is empty.")
## Nothing:
## This is empty.

Safe functions

With these helpers, you can rewrite the three functions in a safe form. The safe_read_csv function takes a string (the file name) and returns a Maybe tibble.

\[safe\_read\_csv :: character \rightarrow Maybe_{tibble}\]

safe_read_csv <- function(file, ...) {
    if (file.access(file, 4) == -1) {
        return(nothing(paste0("safe_read_csv: Couldn't open '", file, "'.")))
    } else {
        return(just(read_csv(file, ...)))
    }
}
safe_read_csv("livesleft.csv")
## Just:
## # A tibble: 6 x 2
##   names  nLives
##   <chr>   <dbl>
## 1 John      2  
## 2 Ed        1  
## 3 Ned       0  
## 4 Sam       1  
## 5 Benjen    0.5
## 6 Beric     4
safe_read_csv("livesleft_notExist.csv")
## Nothing:
## safe_read_csv: Couldn't open 'livesleft_notExist.csv'.

The safe_filter function takes a tibble and a name, and also returns a Maybe tibble with a single row.

\[safe\_filter :: tibble, name \rightarrow Maybe_{tibble[1,]}\]

safe_filter <- function(.data, name) {
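  # Count exact matches, consistent with the filter() call below
  # (a regex match would count "Ned" when looking for "Ed")
  n <- sum(.data$names == name)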
  # If there's only one line with that name,
  # return a maybe tibble
  if(n==1) {
    return(just(filter(.data, names==name)))
  } else {
  # Otherwise return nothing, and explain  
    err <- paste0("safe_filter: The name '", name,
                    "' identifies ", n, " persons.")
    return(nothing(err))
  }
}

The safe_pull function takes a tibble and a column name and returns the Maybe value(s) from the column. If it’s applied after safe_filter, it will be a single value.

\[safe\_pull :: tibble, colname \rightarrow Maybe_{A}\]

safe_pull <- function(.data, varName) {
    varName <- varName[1]
    if(exists(varName, where=.data)) {
        # Return the variable
        return(just(pull(.data, var=varName)))
    } else {
        return(nothing("safe_pull: Requested column is missing from data"))
    }
}
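To see what the two safe functions return on their own, here is a small self-contained sketch. It assumes base-R stand-ins (subsetting with `[` and `[[` instead of dplyr's filter() and pull(), and an exact `==` match when counting names) so that it runs without the tidyverse. The compact just() and nothing() constructors are repeated only to make the snippet runnable in a fresh session.

```r
# Compact constructors, equivalent in spirit to the ones defined earlier
just    <- function(x)   structure(list(type = "Just",    content = x),   class = "maybe")
nothing <- function(msg) structure(list(type = "Nothing", content = msg), class = "maybe")

# Base-R stand-in for safe_filter (assumption: exact name match)
safe_filter <- function(.data, name) {
  n <- sum(.data$names == name)
  if (n == 1) {
    just(.data[.data$names == name, , drop = FALSE])
  } else {
    nothing(paste0("safe_filter: The name '", name, "' identifies ", n, " persons."))
  }
}

# Base-R stand-in for safe_pull
safe_pull <- function(.data, varName) {
  if (varName %in% names(.data)) {
    just(.data[[varName]])
  } else {
    nothing("safe_pull: Requested column is missing from data")
  }
}

livesleft <- data.frame(names = c("John", "Ed", "Ned"), nLives = c(2, 1, 0))

ed   <- safe_filter(livesleft, "Ed")     # a Just wrapping a one-row data frame
bran <- safe_filter(livesleft, "Bran")   # a Nothing with an explanation
ed$type
bran$content

safe_pull(ed$content, "nLives")$content   # the wrapped value
safe_pull(ed$content, "nKilled")$type     # "Nothing"
```

Note that safe_pull is called here on ed$content, the unwrapped data frame, not on the maybe value itself.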

Great! But when you try to combine these functions using the magrittr pipe operator, it fails.

safe_read_csv("livesleft.csv") %>%
    safe_filter("Ed") %>%
    safe_pull("nLives")
## Nothing:
## safe_pull: Requested column is missing from data

Of course; the outputs and inputs don’t match! The outputs are Maybe values, while the functions can’t work on those. We could re-write each function to unwrap Maybe-s, and process the content, or we can create a new pipe operator that does it for them!

Bind

What should this infix operator look like? Well, first let’s look at how the pipe operator (%>%) works. It takes a value of classA and “passes it on” to a function that converts classA to classB. At least that’s what it looks like on the surface.

The pipe operator is actually a higher-order function. It takes two arguments: a value on the left-hand side (LHS) and a function on the right-hand side (RHS). It then simply applies the function to the value and returns the result. Its type can be written as:

\[ \textrm{\%>\%}::LHS=class_A,~RHS=(class_A \rightarrow class_B) \rightarrow class_B\]
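As a rough illustration of this view, a stripped-down pipe can be written in one line. The operator name %then% is hypothetical, and this sketch only accepts a bare function on the right. The real magrittr pipe also accepts calls such as filter(names == "Ed") and inserts the LHS as their first argument, which is not reproduced here.

```r
# Hypothetical, simplified pipe: apply the RHS function to the LHS value
`%then%` <- function(lhs, f) f(lhs)

# Custom infix operators are left-associative, so this is sum(sqrt(c(1, 4, 9)))
c(1, 4, 9) %then% sqrt %then% sum   # 6
```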

So our new operator should be very similar, except that it should take a Maybe value, unwrap it, and then apply the function. Also, the RHS function should be a safe function to keep the computation in the realm of Maybe-s. This is particularly important, so that we can chain multiple functions together with the new bind operator.

\[ \textrm{\%>=\%}::LHS=Maybe_A,~RHS=(class_A \rightarrow Maybe_B) \rightarrow Maybe_B\]

It should take a Maybe classA (the input) and a safe function that converts a classA to a Maybe classB, and its output should be a Maybe classB. The implementation is quite simple if we cheat and use the existing magrittr pipe operator.

`%>=%` <- function(ma, f) {
  if(!is(ma, "maybe")) stop("Provide a maybe value left of '%>=%' !")
  if(ma[[1]]=="Nothing") {
    return(ma) # If Nothing, just pass on the Nothing
  } else {
    # If something, then apply the function on the something (ma[[2]])
    func <- deparse(substitute(f)) # String from function
    cmd <- paste0("ma[[2]] %>% ", func) # Create command string
    res <- eval(parse(text=cmd)) # evaluate command string
    # Check if the function returns a maybe value
    if(is(res, "maybe")) {
      return(res)
    } else {
      stop("RHS function must return a Maybe.")
    }
  }
}

  • If the input is a Nothing value, it is simply returned, and the safe function doesn’t run.
  • If the input is a Just, the safe function is applied to the value in the 2nd slot.

Now the pipe works:

safe_read_csv("livesleft.csv") %>=%
    safe_filter("Beric") %>=%
    safe_pull("nLives")
## Just:
## [1] 4
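The deparse/parse string-building inside %>=% is convenient but fragile. As an alternative sketch (not the approach used in this article), bind can be written as an ordinary higher-order function that expects a real function on the right-hand side. The names bind, safe_sqrt and safe_log are hypothetical and invented for this illustration.

```r
just    <- function(x)   structure(list(type = "Just",    content = x),   class = "maybe")
nothing <- function(msg) structure(list(type = "Nothing", content = msg), class = "maybe")

# bind :: Maybe A, (A -> Maybe B) -> Maybe B
bind <- function(ma, f) {
  stopifnot(inherits(ma, "maybe"), is.function(f))
  if (ma$type == "Nothing") return(ma)   # short-circuit: f never runs
  res <- f(ma$content)
  if (!inherits(res, "maybe")) stop("f must return a maybe value.")
  res
}

# Two toy safe functions (hypothetical)
safe_sqrt <- function(x) {
  if (x < 0) nothing("safe_sqrt: negative input") else just(sqrt(x))
}
safe_log <- function(x) {
  if (x <= 0) nothing("safe_log: non-positive input") else just(log(x))
}

ok  <- bind(bind(just(16), safe_sqrt), safe_log)   # Just wrapping log(4)
bad <- bind(bind(just(-1), safe_sqrt), safe_log)   # Nothing produced by safe_sqrt
ok$content
bad$content
```

With this design, extra arguments are passed through an anonymous function, for example bind(ma, function(d) safe_filter(d, "Beric")). It is more verbose than the operator form, but nothing is evaluated from a string.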

We need one more thing to complete the pipeline: a function to unwrap the maybe-s. Since you always want a value returned in the end, you need a helper function to extract a value from the maybe, and use a default one if it’s a nothing.

\[ fromMaybe:: Maybe_A, class_A\rightarrow class_A\]

fromMaybe <- function(ma, defaultValue) {
    if(ma[[1]]=="Just") return(ma[[2]]) else {
      message("Returning default value, because:\n", ma[[2]])
      return(defaultValue)
    }
}

Now your safe functions can’t fail, and the pipe will always return a result.

safe_read_csv("livesleft.csv",
              col_types=cols(
                names=col_character(),
                nLives=col_double()
              )) %>=%
  safe_filter("Benjen") %>=%
  safe_pull("nLives") %>%   # Regular pipe, fromMaybe
    fromMaybe(defaultValue=1) # expects a maybe value!
## [1] 0.5
safe_read_csv("livesleft.csv",
              col_types=cols(
                names=col_character(),
                nLives=col_double()
              )) %>=%
    safe_filter("Bronn") %>=% # Wrong name
    safe_pull("nLives") %>%   
    fromMaybe(defaultValue=1) 
## Returning default value, because:
## safe_filter: The name 'Bronn' identifies 0 persons.
## [1] 1

Summary

Of course, it’s overkill to do all this for just three functions, but the concept is very powerful. Not only can you chain together an unlimited number of functions this way, but it can be extended to different kinds of logic. Instead of catching errors, you can pass down a state, write log messages, create collections, etc.

All you need to do is to implement a few basic functions:

  • Wrapper function(s): wrap any basic data type into a monadic value (here: just(), nothing())
  • Functions that return the monadic values (here: “safe functions”)
  • Bind (%>=%): To facilitate the composition of such functions
  • Optionally, a function to unwrap the monadic value
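As an illustration of the log-message variant, here is a minimal sketch of the same four ingredients for a logging container. All names (logged, %>>=%, log_double, log_inc, unwrap) are hypothetical and invented for this example. Unlike %>=% above, the operator expects a plain function on the right-hand side.

```r
# Wrapper: a value plus the messages collected so far
logged <- function(x, log = character(0)) {
  structure(list(content = x, log = log), class = "logged")
}

# Bind: apply f to the content and append its messages to the existing log
`%>>=%` <- function(ml, f) {
  stopifnot(inherits(ml, "logged"), is.function(f))
  res <- f(ml$content)
  stopifnot(inherits(res, "logged"))
  logged(res$content, c(ml$log, res$log))
}

# Functions that return the monadic value
log_double <- function(x) logged(x * 2, paste("doubled", x))
log_inc    <- function(x) logged(x + 1, paste("incremented", x))

# Unwrap: discard the log and keep the value
unwrap <- function(ml) ml$content

res <- logged(5) %>>=% log_double %>>=% log_inc
unwrap(res)   # 11
res$log       # "doubled 5" "incremented 10"
```

Only the bind operator knows how to combine logs; the individual functions just report their own message.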
