(This article was first published on **Rstats on goonR blog**, and kindly contributed to R-bloggers)

Recently, I was trying to calculate the percentiles of a set of variables within a data set grouped by another variable. However, I quickly realized that this is not very straightforward when using `dplyr`’s `summarize`. Before I demonstrate, let’s load the libraries that we will need.

```
library(dplyr)
library(purrr)
```

If you don’t believe me when I say that it is not straightforward, go ahead and try to run the following block of code.

```
mtcars %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarize(quants = quantile(mpg, probs = c(0.2, 0.5, 0.8)))
```

If you ran the code, you will see that it throws the following error:

```
Error in summarise_impl(.data, dots) :
Column `quants` must be length 1 (a summary value), not 3
```

This error is telling us that the expression returns an object of length 3 (our three quantiles) when `summarize` expects exactly one value per group. A quick Google search turns up numerous Stack Overflow questions and answers about this. Most of these solutions revolve around using the `do` function to calculate the quantiles on each of the groups. However, according to Hadley, `do` will eventually be “going away”. While there is no definite time frame on this, I try to use it as little as possible. The new recommended practice is a combination of `tidyr::nest`, `dplyr::mutate` and `purrr::map` for most cases of grouping. I love this approach for most things (and it is even the accepted answer for one of the SO questions mentioned above), but I worked up a new solution that I think is useful for calculating percentiles on multiple groups for any desired number of percentiles.
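For comparison, here is a minimal sketch of what the `nest`/`mutate`/`map` approach mentioned above could look like. It assumes `tidyr` 1.0.0 or later for `unnest_wider`, and the name `nested_quants` is only illustrative; it is not taken from any of the Stack Overflow answers.

```r
library(dplyr)
library(tidyr)
library(purrr)

p <- c(0.2, 0.5, 0.8)

nested_quants <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%      # one row per cyl, with that group's rows in a list column `data`
  ungroup() %>%
  # quantile() returns a named vector of length 3, which a list column can hold
  mutate(quants = map(data, ~ quantile(.x$mpg, probs = p))) %>%
  select(-data) %>%
  # spread the named vector into one column per percentile
  unnest_wider(quants)

nested_quants
```

The list column is what sidesteps the length-1 restriction: each group stores its whole quantile vector as a single list element, and the names that `quantile` attaches become the column names.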

This method uses `purrr::map` and a function operator, `purrr::partial`, to create a list of functions that can then be applied to a data set using `dplyr::summarize_at` and a little magic from `rlang`.

Let’s start by creating a vector of the desired percentiles to calculate. In this example, we will calculate the 20th, 50th, and 80th percentiles.

`p <- c(0.2, 0.5, 0.8)`

Now we can create a list of functions, with one for each quantile, using `purrr::map` and `purrr::partial`. We can also assign names to each function (useful for the output of `summarize`) using `purrr::set_names`:
```
p_names <- map_chr(p, ~paste0(.x*100, "%"))
p_funs <- map(p, ~partial(quantile, probs = .x, na.rm = TRUE)) %>%
  set_names(nm = p_names)
p_funs
```

```
## $`20%`
## function (...)
## quantile(probs = .x, na.rm = TRUE, ...)
##
##
## $`50%`
## function (...)
## quantile(probs = .x, na.rm = TRUE, ...)
##
##
## $`80%`
## function (...)
## quantile(probs = .x, na.rm = TRUE, ...)
##
```
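To make the role of `partial` more concrete, here is a small sketch of a single pre-filled function used on its own. The name `q50` and the input vector are made up for illustration.

```r
library(purrr)

# partial() returns a new function with probs and na.rm already filled in;
# only the data argument is left to supply
q50 <- partial(quantile, probs = 0.5, na.rm = TRUE)

x <- c(1, 2, 3, NA)
q50(x)

# the call above is equivalent to spelling out every argument
quantile(x, probs = 0.5, na.rm = TRUE)
```

Each element of `p_funs` is a function of this kind, differing only in the value of `probs` that was filled in.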

Looking at `p_funs`, we can see that we have a named list in which each element is a function wrapping `quantile` with one value of `probs` already filled in. The beauty of this is that you can use this list in the same way you would define multiple functions in any other `summarize_at` or `summarize_all` call (e.g. `funs(mean, sd)`). The only difference is that we will now have to use the “bang-bang-bang” operator (`!!!`) from `rlang` (it is also exported from `dplyr`). The final product looks like this.

```
mtcars %>%
  group_by(cyl) %>%
  summarize_at(vars(mpg), funs(!!!p_funs))
```

```
## # A tibble: 3 x 4
##     cyl `20%` `50%` `80%`
## 1     4  22.8  26    30.4
## 2     6  18.3  19.7  21
## 3     8  13.9  15.2  16.8
```
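A note for readers on newer versions of `dplyr`: `funs()` was deprecated in dplyr 0.8.0, and `summarize_at` has since been superseded by `across()`. The following is a sketch of how the same list of functions might be used with `across()`, assuming dplyr 1.0.0 or later. The name `quants_across` is illustrative, and this is a possible adaptation rather than the method from the original post.

```r
library(dplyr)
library(purrr)

p <- c(0.2, 0.5, 0.8)
p_names <- map_chr(p, ~ paste0(.x * 100, "%"))
p_funs <- map(p, ~ partial(quantile, probs = .x, na.rm = TRUE)) %>%
  set_names(nm = p_names)

quants_across <- mtcars %>%
  group_by(cyl) %>%
  # across() accepts a named list of functions directly, so !!! is not needed;
  # .names = "{.fn}" names each output column after the list element
  summarize(across(mpg, p_funs, .names = "{.fn}"))

quants_across
```

Because each function in `p_funs` returns a single value, `summarize` gets exactly one value per group and per output column.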

I think that this provides a pretty neat way to get the desired output in a format that does not require a large amount of post-calculation manipulation. In addition, it is, in my opinion, more straightforward than a lot of the `do` methods. This method also allows for quantiles to be calculated for more than one variable, although post-processing would be necessary in that case. Here is an example.

```
mtcars %>%
  group_by(cyl) %>%
  summarize_at(vars(mpg, hp), funs(!!!p_funs)) %>%
  select(cyl, contains("mpg"), contains("hp"))
```

```
## # A tibble: 3 x 7
##     cyl `mpg_20%` `mpg_50%` `mpg_80%` `hp_20%` `hp_50%` `hp_80%`
## 1     4      22.8      26        30.4       65       91       97
## 2     6      18.3      19.7      21        110      110      123
## 3     8      13.9      15.2      16.8      175      192.     245
```
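As one possible form of the post-processing mentioned above, the wide result can be reshaped so that the variable and the percentile become columns of their own. This sketch assumes dplyr 1.0.0 or later (for `across()`) and tidyr 1.0.0 or later (for `pivot_longer()`). The names `quants_wide` and `quants_long` are illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

p <- c(0.2, 0.5, 0.8)
p_funs <- map(p, ~ partial(quantile, probs = .x, na.rm = TRUE)) %>%
  set_names(nm = map_chr(p, ~ paste0(.x * 100, "%")))

# with several columns, across() names outputs "{.col}_{.fn}" by default,
# e.g. `mpg_20%`
quants_wide <- mtcars %>%
  group_by(cyl) %>%
  summarize(across(c(mpg, hp), p_funs))

# split each column name at the underscore into variable and percentile
quants_long <- quants_wide %>%
  pivot_longer(
    -cyl,
    names_to = c("variable", "percentile"),
    names_sep = "_",
    values_to = "value"
  )

quants_long
```

The long format has one row per combination of group, variable, and percentile, which tends to be easier to filter or plot than the seven-column wide table.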

`partial` is *yet another* tool from the `purrr` package that can greatly enhance your R coding abilities. While this is surely a basic application of its functionality, one can easily see how powerful this function can be.
