What’s R vector, Victor?

[This article was first published on R on Biofunctor, and kindly contributed to R-bloggers].

Cook with flatmap

In this week’s episode of the “Hidden Monads in R” series, I’ll explore the vector aspect of R data structures, and see how the flatmap operation can be quite useful.

Flatmap? Aren’t all maps flat?

The Nobel Prize organisation provides an API with information about the prizes and laureates. We can retrieve a JSON file, which is what I did. I read the file and examine one of the entries below.

# Source: http://api.nobelprize.org/v1/prize.json
prizes <- jsonlite::fromJSON("./prize.json", simplifyDataFrame = FALSE)[["prizes"]]

str(prizes[[11]])

## List of 3
##  $ year     : chr "2023"
##  $ category : chr "physics"
##  $ laureates:List of 3
##   ..$ :List of 5
##   .. ..$ id        : chr "1026"
##   .. ..$ firstname : chr "Pierre"
##   .. ..$ surname   : chr "Agostini"
##   .. ..$ motivation: chr "\"for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter\""
##   .. ..$ share     : chr "3"
##   ..$ :List of 5
##   .. ..$ id        : chr "1027"
##   .. ..$ firstname : chr "Ferenc"
##   .. ..$ surname   : chr "Krausz"
##   .. ..$ motivation: chr "\"for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter\""
##   .. ..$ share     : chr "3"
##   ..$ :List of 5
##   .. ..$ id        : chr "1028"
##   .. ..$ firstname : chr "Anne"
##   .. ..$ surname   : chr "L’Huillier"
##   .. ..$ motivation: chr "\"for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter\""
##   .. ..$ share     : chr "3"

Let’s say I want a character vector containing the full names of Nobel laureates in medicine since 2020. First, I can concoct a function that gets such a vector from a single entry (I know, this one is physics).

who_got_it <- function(prize) {
    laureates <- vapply(
        X = prize[["laureates"]],
        FUN = \(l) c(l[["surname"]] %||% "", l[["firstname"]] %||% ""),
        FUN.VALUE = c("Doe", "John")
    )
    trimws(paste(laureates[2,], laureates[1,]))
}

who_got_it(prizes[[11]])

## [1] "Pierre Agostini" "Ferenc Krausz"   "Anne L’Huillier"

To achieve my goal, I just have to filter the list accordingly, and lapply the function on the matching entries.

(medicine_since_2020 <- Filter(
    f = \(p) p[["category"]] == "medicine" && as.numeric(p[["year"]]) >= 2020,
    x = prizes
    ) |>
    lapply(who_got_it)
)

## [[1]]
## [1] "Victor Ambros" "Gary Ruvkun"  
## 
## [[2]]
## [1] "Katalin Karikó" "Drew Weissman" 
## 
## [[3]]
## [1] "Svante Pääbo"
## 
## [[4]]
## [1] "David Julius"      "Ardem Patapoutian"
## 
## [[5]]
## [1] "Harvey Alter"     "Michael Houghton" "Charles Rice"

Neat! But I want them in a single vector, so I need an unlist step at the end.

unlist(medicine_since_2020)

##  [1] "Victor Ambros"     "Gary Ruvkun"       "Katalin Karikó"   
##  [4] "Drew Weissman"     "Svante Pääbo"      "David Julius"     
##  [7] "Ardem Patapoutian" "Harvey Alter"      "Michael Houghton" 
## [10] "Charles Rice"

Yes, it’s that simple. This is a flatmap process for vectors: a composition of a map step and a flatten step (lapply and unlist in this case). It almost looks silly to write a flatmap function, since it’s not that difficult to call lapply and unlist one after the other. But the pattern comes up often, so a dedicated function saves time and reduces mistakes. Strictly speaking, I should have used unlist(recursive = FALSE) here: the default, recursive = TRUE, also flattens any nested lists all the way down, and that would be wrong.
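To make the recursive = FALSE point concrete, here is a minimal sketch. The `mapped` list is a made-up stand-in for the output of a map step whose function returns lists rather than atomic vectors; it is not taken from the Nobel data.

```r
# Hypothetical result of a map step: each element is a list of records
mapped <- list(
    list(list(name = "a"), list(name = "b")),
    list(list(name = "c"))
)

# Default (recursive = TRUE): flattens all the way down to atomic values
deep <- unlist(mapped)

# recursive = FALSE: removes only the outermost level, records stay intact
shallow <- unlist(mapped, recursive = FALSE)

is.character(deep)   # TRUE: the record structure is gone
is.list(shallow)     # TRUE: still a list, now of three records
```

For a list of plain character vectors, such as medicine_since_2020, both calls give the same result, which is why the shortcut went unnoticed above.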

Laboratory experiments are often performed in 96-well plastic plates, with 8 rows (labeled A-H) and 12 columns (labeled 1-12). Each microwell is a separate micro-experiment (labeled A1-H12). Let’s generate well labels for such a dataset!

rows <- LETTERS[1:8]
columns <- 1:12 |> sprintf(fmt = "%02i")

So all we have to do is combine one vector of values with another, using the handy paste0() function, right? Wrong.

paste0(rows, columns) |> noquote()

##  [1] A01 B02 C03 D04 E05 F06 G07 H08 A09 B10 C11 D12

We’ve only got 12 values instead of 96, because paste0() pairs the elements position by position and recycles the shorter vector (the row letters) as needed. That is often what you want, so it’s done this way for a good reason. But in this case, we’d prefer an each-with-each combination.

Some readers may already have started daydreaming of nested for loops (please don’t). More experienced R programmers would probably go for expand.grid() or rep(rows, each = length(columns)) to match up the vectors, and then paste() them together. But R is a versatile language, and there are many paths to the same destination. A functional R programmer could just take flatmap off the shelf, and here is how.

For purely didactic reasons, let’s define a non-vectorized paste function, called paste01 (see note 1 at the end). It takes a single value and a character vector, and returns a character vector – the combination of the value with each member of the vector.

$$paste01 :: Str \rightarrow [Str] \rightarrow [Str]$$

paste01 <- \(x, y) { stopifnot(length(x) == 1L); paste0(x, y)}
paste01(rows[1], columns)

##  [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11" "A12"

When we map this function on our rows vector, we almost get what we need.

$$lapply(paste01) :: [Str] \rightarrow [Str] \rightarrow [[Str]]$$

lapply(rows, paste01, columns) |> head(3L)

## [[1]]
##  [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11" "A12"
## 
## [[2]]
##  [1] "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10" "B11" "B12"
## 
## [[3]]
##  [1] "C01" "C02" "C03" "C04" "C05" "C06" "C07" "C08" "C09" "C10" "C11" "C12"

It’s a list of vectors, so we have to flatten it. Yup, it’s a flatmap.

$$unlist(lapply(paste01)) :: [Str] \rightarrow [Str] \rightarrow [Str]$$

unlist(lapply(rows, paste01, columns)) |>
    noquote()

##  [1] A01 A02 A03 A04 A05 A06 A07 A08 A09 A10 A11 A12 B01 B02 B03 B04 B05 B06 B07
## [20] B08 B09 B10 B11 B12 C01 C02 C03 C04 C05 C06 C07 C08 C09 C10 C11 C12 D01 D02
## [39] D03 D04 D05 D06 D07 D08 D09 D10 D11 D12 E01 E02 E03 E04 E05 E06 E07 E08 E09
## [58] E10 E11 E12 F01 F02 F03 F04 F05 F06 F07 F08 F09 F10 F11 F12 G01 G02 G03 G04
## [77] G05 G06 G07 G08 G09 G10 G11 G12 H01 H02 H03 H04 H05 H06 H07 H08 H09 H10 H11
## [96] H12

Tadaa!
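As a sanity check, the conventional routes mentioned earlier (rep() and expand.grid()) should give the same 96 labels. The sketch below redefines rows and columns so that it runs on its own; the names wells_rep, wells_grid and wells_flat are only illustrative.

```r
rows <- LETTERS[1:8]
columns <- sprintf("%02i", 1:12)

# Route 1: repeat each row letter 12 times; paste0() then recycles `columns`
wells_rep <- paste0(rep(rows, each = length(columns)), columns)

# Route 2: expand.grid() varies its first factor fastest,
# so the columns go first to keep the A01, A02, ... order
grid <- expand.grid(column = columns, row = rows, stringsAsFactors = FALSE)
wells_grid <- paste0(grid$row, grid$column)

# Route 3: map over the rows, then flatten
wells_flat <- unlist(lapply(rows, paste0, columns))

identical(wells_rep, wells_grid) && identical(wells_rep, wells_flat)
```

All three produce the same vector; which one reads best is largely a matter of taste.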

So, a flatmap function for vectors can be defined. It takes:

  1. a vector of values
  2. a function that turns one of those into a (potentially different kind of) vector

The output is a single vector of the kind that the function returns (the second kind above).

$$ flatmap :: [a] \rightarrow (a \rightarrow [b]) \rightarrow [b] $$

flatmap <- function(X, FUN, ..., USE.NAMES = TRUE) {
    unlist(lapply(X, FUN, ...), recursive = FALSE, use.names = USE.NAMES)
}
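A short usage sketch, under the assumption that unlist() is called with its lowercase use.names argument (unlist() has no USE.NAMES argument). The definition is repeated so that the snippet runs on its own; the example inputs are made up for illustration.

```r
flatmap <- function(X, FUN, ..., USE.NAMES = TRUE) {
    # map with lapply(), then remove exactly one level of nesting
    unlist(lapply(X, FUN, ...), recursive = FALSE, use.names = USE.NAMES)
}

rows <- LETTERS[1:8]
columns <- sprintf("%02i", 1:12)

# [Str] -> (Str -> [Str]) -> [Str]: the 96 well labels
wells <- flatmap(rows, \(r) paste0(r, columns))

# [Num] -> (Num -> [Int]) -> [Int]: the output kind may differ from the input
counts <- flatmap(c(2, 3), seq_len)

# Each input may yield a different number of outputs
words <- flatmap(c("a b", "c"), \(s) strsplit(s, " ")[[1]])
```

Here counts is 1 2 1 2 3 and words is "a" "b" "c": the function decides how many values each input contributes, and flatmap only glues them together.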

Debrief

Such a function could also be defined as an infix operator, and could take the form of %>>=%, for example. If that looks familiar, it’s not a coincidence: flatmap is the bind operation for the vector monad. Previously, I assumed that yet another infix operator is not what R needs the most, and I created a function wrapper instead.
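For illustration only, such an operator might be sketched like this. The name %>>=% comes from the text; the body is an assumption that simply reuses the lapply-then-unlist recipe.

```r
# Infix flatmap: left-hand side is the vector, right-hand side the function
`%>>=%` <- function(x, f) unlist(lapply(x, f), recursive = FALSE)

rows <- LETTERS[1:8]
columns <- sprintf("%02i", 1:12)

# Parentheses around the lambda keep the parsing unambiguous
wells <- rows %>>=% (\(r) paste0(r, columns))

# %op% operators associate to the left, so binds can be chained
chained <- 1:2 %>>=% (\(n) seq_len(n)) %>>=% (\(k) c(k, k))
```

In the chained example, 1:2 first becomes 1 1 2, and then every value is doubled up, giving 1 1 1 1 2 2.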

The same could be done here! R already has a very similar wrapper, base::Vectorize(), which only needs a tiny tweak, unlist()-ing the results. It’s so trivial that I won’t even write it out here.
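For readers who would like to see it anyway, one possible reading of that tweak is sketched below. It assumes that only the first argument should be vectorized, and it switches off the simplification and naming that Vectorize() (via mapply()) applies by default; paste_rows is a hypothetical name.

```r
paste01 <- \(x, y) { stopifnot(length(x) == 1L); paste0(x, y) }

# Vectorize over `x` only; keep the raw list so that we can flatten it ourselves
paste_rows <- Vectorize(paste01, vectorize.args = "x",
                        SIMPLIFY = FALSE, USE.NAMES = FALSE)

rows <- LETTERS[1:8]
columns <- sprintf("%02i", 1:12)

# The "tiny tweak": unlist the list that the vectorized function returns
wells <- unlist(paste_rows(rows, columns), recursive = FALSE)
```

With the default SIMPLIFY = TRUE, the same call would return a 12 x 8 character matrix instead of a flat vector.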

What excites me much more is the possibility of combining the two ideas: handling NA-s and flatmapping in a single bind wrapper function, which would truly allow focusing on the logic, and let the “expert” wrapper deal with the rest. As customary, some more exploration is needed.


  1. Actually, this works equally well with the original paste0, because lapply will map on the first argument anyway, which guarantees that we’ll deal with a single value in each iteration.
