**Pareto's Playground**, and kindly contributed to R-bloggers)

# Introduction of Parameterized dplyr expression

The usefullness of any small function you write will eventually be judged upon its ability to be generically applied across any arbitrary data. As I explored a blog post from Dec 2016, I became a lot more interested in writing dynamic code with dplyr functions that form part of the data wrangling silo in my analytical flow. This ability came with the new `replyr`

package – No longer will I have the need to break up my data processing when columns have to be changed as my code depends on certain column names in my dataset that is currently in use.

‘replyr allows you to encapsulate complex dplyr expressions without the use of the lazyeval package, which is the currently recommended way to manage dplyr‘s use of non-standard evaluation’

The example `replyr`

provides works out summary statistics of an arbitrary column, with the ability to group by another column. Imagine you had a quick and easy function which could pump out summary statistics without too much fuss. Lets take a look at the `replyr`

construct of such a function provided by the package maintainers Win-Vector:

For my analysis I will be using the `snail`

dataset.

This dataset contains data on the probability of a snail surviving given certain stimuli such as:

- exposure in weeks
- relative humidity (4 levels)
- temperature, in degrees Celsius (3 levels)
- deaths

Lets now see how the function outputs with a simple example using the `snail`

dataset.

As you can see, the summary statistics for the death column was worked out. For those who noticed the negtive `sdlower`

– no, snails did not wake up, its purely for example purposes.

Expanding this idea of by includeing a grouping variable, we can get the summary statistics per specie:

So, we can see the function works well in the sense that you can now write dynamic loops which could apply `dplyr`

functions to an arbitrary column and grouping variable of your choice based on string inputs…

## But that was not the end

In one part of the Win-Vector blog they have the following challenge:

To write such a function in dplyr can get quite hairy, quite quickly. Try it yourself, and see

So, with this in mind I have, with some fighting I might add, developed a similar function that has some added benefits beyond that of the `replyr`

function using the `dplyr`

package.

By combining the `one_of`

and the standard standard evaluation of the `dplyr`

functions we have recreated the results of the `replyr`

function:

The added benefit of using the dplyr notation is the ability to include *mutliple* columns in the function that you want to summarise:

The `replyr`

approach will not be able to handle this and will you give an error such as:

And with a little bit of tidying of the data, you can have a much richer dataset

I do find that applying parameterized dplyr functions an amazing advantage when working with more advanced ETL workflows. This is by no means to say that the `replyr`

package is redundant, but more a declaration to say that `dplyr`

has the ability to dynamically conduct analysis – albeit some fighting to get it to work.

**leave a comment**for the author, please follow the link and comment on their blog:

**Pareto's Playground**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...