**RStudio Blog**, and kindly contributed to R-bloggers)

Purrr is a new package that fills in the missing pieces in R’s functional programming tools: it’s designed to make your pure functions purrr. Like many of my recent packages, it works with magrittr to allow you to express complex operations bying combining simple pieces in a standard way.

Install it with:

`install.packages("purrr")`

Purrr wouldn’t be possible without Lionel Henry. He wrote a lot of the package and his insightful comments helped me rapidly iterate towards a stable, useful, and understandable package.

## Map functions

The core of purrr is a set of functions for manipulating vectors (atomic vectors, lists, and data frames). The goal is similar to dplyr: help you tackle the most common 90% of data manipulation challenges. But where dplyr focusses on data frames, purrr focusses on vectors. For example, the following code splits the built-in mtcars dataset up by number of cylinders (using the base `split()`

function), fits a linear model to each piece, summarises each model, then extracts the the (R^2):

```
mtcars %>%
split(.$cyl) %>%
map(~lm(mpg ~ wt, data = .)) %>%
map(summary) %>%
map_dbl("r.squared")
#> 4 6 8
#> 0.509 0.465 0.423
```

The first argument to all map functions is the vector to operate on. The second argument, `.f`

specifies what to do with each piece. It can be:

- A function, like
`summary()`

. - A formula, which is converted to an anonymous function, so that
`~ lm(mpg ~ wt, data = .)`

is shorthand for`function(x) lm(mpg ~ wt, data = x)`

. - A string or number, which is used to extract components, i.e.
`"r.squared"`

is shorthand for`function(x) x[[r.squared]]`

and`1`

is shorthand for`function(x) x[[1]]`

.

Map functions come in a few different variations based on their inputs and output:

`map()`

takes a vector (list or atomic vector) and returns a list.`map_lgl()`

,`map_int()`

,`map_dbl()`

, and`map_chr()`

take a vector and return an atomic vector.`flatmap()`

works similarly, but allows the function to return arbitrary length vectors.`map_if()`

only applies`.f`

to those elements of the list where`.p`

is true. For example, the following snippet converts factors into characters:`iris %>% map_if(is.factor, as.character) %>% str() #> 'data.frame': 150 obs. of 5 variables: #> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... #> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... #> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ... #> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ... #> $ Species : chr "setosa" "setosa" "setosa" "setosa" ...`

`map_at()`

works similarly but instead of working with a logical vector or predicate function, it works with a integer vector of element positions.`map2()`

takes a pair of lists and iterates through them in parallel:`map2(1:3, 2:4, c) #> [[1]] #> [1] 1 2 #> #> [[2]] #> [1] 2 3 #> #> [[3]] #> [1] 3 4 map2(1:3, 2:4, ~ .x * (.y - 1)) #> [[1]] #> [1] 1 #> #> [[2]] #> [1] 4 #> #> [[3]] #> [1] 9`

`map3()`

does the same thing for three lists, and`map_n()`

does it in general.`invoke()`

,`invoke_lgl()`

,`invoke_int()`

,`invoke_dbl()`

, and`invoke_chr()`

take a list of functions, and call each one with the supplied arguments:`list(m1 = mean, m2 = median) %>% invoke_dbl(rcauchy(100)) #> m1 m2 #> 9.765 0.117`

`walk()`

takes a vector, calls a function on piece, and returns its original input. It’s useful for functions called for their side-effects; it returns the input so you can use it in a pipe.

### Purrr and dplyr

I’m becoming increasingly enamoured with the list-columns in data frames. The following example combines purrr and dplyr to generate 100 random test-training splits in order to compute an unbiased estimate of prediction quality. These tools are still experimental (and currently need quite a bit of extra scaffolding), but I think the basic approach is really appealing.

```
library(dplyr)
random_group <- function(n, probs) {
probs <- probs / sum(probs)
g <- findInterval(seq(0, 1, length = n), c(0, cumsum(probs)),
rightmost.closed = TRUE)
names(probs)[sample(g)]
}
partition <- function(df, n, probs) {
n %>%
replicate(split(df, random_group(nrow(df), probs)), FALSE) %>%
zip_n() %>%
as_data_frame()
}
msd <- function(x, y) sqrt(mean((x - y) ^ 2))
# Genearte 100 random test-training splits,
cv <- mtcars %>%
partition(100, c(training = 0.8, test = 0.2)) %>%
mutate(
# Fit the model
model = map(training, ~ lm(mpg ~ wt, data = .)),
# Make predictions on test data
pred = map2(model, test, predict),
# Calculate mean squared difference
diff = map2(pred, test %>% map("mpg"), msd) %>% flatten()
)
cv
#> Source: local data frame [100 x 5]
#>
#> test training model pred diff
#> (list) (list) (list) (list) (dbl)
#> 1
``` 3.70
#> 2 2.03
#> 3 2.29
#> 4 4.88
#> 5 3.20
#> 6 4.68
#> 7 3.39
#> 8 3.82
#> 9 2.56
#> 10 3.40
#> .. ... ... ... ... ...
mean(cv$diff)
#> [1] 3.22

## Other functions

There are too many other pieces of purrr to describe in detail here. A few of the most useful functions are noted below:

`zip_n()`

allows you to turn a list of lists “inside-out”:`x <- list(list(a = 1, b = 2), list(a = 2, b = 1)) x %>% str() #> List of 2 #> $ :List of 2 #> ..$ a: num 1 #> ..$ b: num 2 #> $ :List of 2 #> ..$ a: num 2 #> ..$ b: num 1 x %>% zip_n() %>% str() #> List of 2 #> $ a:List of 2 #> ..$ : num 1 #> ..$ : num 2 #> $ b:List of 2 #> ..$ : num 2 #> ..$ : num 1 x %>% zip_n(.simplify = TRUE) %>% str() #> List of 2 #> $ a: num [1:2] 1 2 #> $ b: num [1:2] 2 1`

`keep()`

and`discard()`

allow you to filter a vector based on a predicate function.`compact()`

is a helpful wrapper that throws away empty elements of a list.`1:10 %>% keep(~. %% 2 == 0) #> [1] 2 4 6 8 10 1:10 %>% discard(~. %% 2 == 0) #> [1] 1 3 5 7 9 list(list(x = TRUE, y = 10), list(x = FALSE, y = 20)) %>% keep("x") %>% str() #> List of 1 #> $ :List of 2 #> ..$ x: logi TRUE #> ..$ y: num 10 list(NULL, 1:3, NULL, 7) %>% compact() %>% str() #> List of 2 #> $ : int [1:3] 1 2 3 #> $ : num 7`

`lift()`

(and friends) allow you to convert a function that takes multiple arguments into a function that takes a list. It helps you compose functions by lifting their domain from a kind of input to another kind. The domain can be changed to and from a list (l), a vector (v) and dots (d).`cross2()`

,`cross3()`

and`cross_n()`

allow you to create the Cartesian product of the inputs (with optional filtering).- A number of functions let you manipulate functions:
`negate()`

,`compose()`

,`partial()`

. - A complete set of predicate functions provides predictable versions of the
`is.*`

functions:`is_logical()`

,`is_list()`

,`is_bare_double()`

,`is_scalar_character()`

, etc. - Other equivalents functions wrap existing base R functions into to the consistent design of purrr:
`replicate()`

->`rerun()`

,`Reduce()`

->`reduce()`

,`Find()`

->`detect()`

,`Position()`

->`detect_index()`

.

## Design philosophy

The goal of purrr is not try and turn R into Haskell in R: it does not implement currying, or destructuring binds, or pattern matching. The goal is to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R.

- Anonymous functions are verbose in R, so we provide two convenient shorthands. For predicate functions,
`~ .x + 1`

is equivalent to`function(.x) .x + 1`

. For chains of transformations functions,`. %>% f() %>% g()`

is equivalent to`function(.) . %>% f() %>% g()`

. - R is weakly typed, so we can implement general
`zip_n()`

, rather than having to specialise on the number of arguments. That said, we still provide`map2()`

and`map3()`

since it’s useful to clearly separate which arguments are vectorised over. Functions are designed to be output type-stable (respecting Postel’s law) so you can rely on the output being as you expect. - R has named arguments, so instead of providing different functions for minor variations (e.g.
`detect()`

and`detectLast()`

) we use a named arguments. - Instead of currying, we use
`...`

to pass in extra arguments. Arguments of purrr functions always start with`.`

to avoid matching to the arguments of`.f`

passed in via`...`

. - Instead of point free style, use the pipe,
`%>%`

, to write code that can be read from left to right.

**leave a comment**for the author, please follow the link and comment on their blog:

**RStudio Blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...