(This article was first published on **R Blog**, and kindly contributed to R-bloggers)

Lately I have been rather productive in my programming and frustrated at the same time. Trying to create a demographics summary table proved to be a lesson in frustration with R. Since I love R, this was disheartening. I did eventually find the `reporttools` package, which does make a great LaTeX table, but only in LaTeX. The `tables` package also looks great, but it was not entirely what I was looking for either. So I did the first logical thing for an R user faced with this sort of thing: I created a package to fill in the missing functionality.

## The `dostats` package/function

The new package is `dostats`. The package serves two purposes:

- Create summaries of vectors through the `dostats` function.
- Manipulate functions.

The package started out with the `dostats` function for creating more informative summary tables. It works much like `tabular` from the `tables` package, but it is designed to work with `plyr` functions. The idea is to pass in a vector as the first argument; the remaining arguments are functions that compute statistics on the vector. For example:

```
library(dostats)
set.seed(20120220)
dostats(rnorm(100), mean, sd, N = length)
```

```
## mean sd N
## 1 0.0775 0.8975 100
```
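The idea can be sketched in base R. The helper below, `summarise_vec`, is a hypothetical stand-in written for illustration and is not the package's implementation; it assumes that every statistic returns a single value.

```r
# Hypothetical stand-in for the idea behind dostats(); not the package's code.
# Assumes every statistic function returns a single value.
summarise_vec <- function(x, ...) {
  funs <- list(...)
  # Recover what the caller typed for each argument (e.g. "mean", "sd").
  typed <- vapply(as.list(substitute(list(...)))[-1],
                  function(e) deparse(e)[1], character(1))
  # Prefer names supplied by the caller (e.g. N = length).
  given <- names(funs)
  if (is.null(given)) given <- rep("", length(funs))
  names(funs) <- ifelse(given == "", typed, given)
  # Apply each function to x and collect the results as one row.
  as.data.frame(lapply(funs, function(f) f(x)))
}

res <- summarise_vec(c(1, 2, 3, 4), mean, sd, N = length)
print(res)
```

The result is a one-row data frame with columns `mean`, `sd` and `N`, which is the shape that makes the function convenient to combine with `plyr`.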

A renaming construct is also built in, so that named arguments such as `N = length` set the names of the resulting columns. This design is convenient because `dostats` can be passed directly as an argument to `ldply`, such as

```
library(plyr)
ldply(mtcars, dostats, mean, sd, IQR)
```

```
## .id mean sd IQR
## 1 mpg 20.0906 6.0269 7.375
## 2 cyl 6.1875 1.7859 4.000
## 3 disp 230.7219 123.9387 205.175
## 4 hp 146.6875 68.5629 83.500
## 5 drat 3.5966 0.5347 0.840
## 6 wt 3.2172 0.9785 1.029
## 7 qsec 17.8487 1.7869 2.008
## 8 vs 0.4375 0.5040 1.000
## 9 am 0.4062 0.4990 1.000
## 10 gear 3.6875 0.7378 1.000
## 11 carb 2.8125 1.6152 2.000
```
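For comparison, roughly the same table can be built with base R alone. This is only a sketch of the equivalent computation, not what `ldply` or `dostats` do internally; the names `stats` and `summary_df` are chosen here for illustration.

```r
# Base-R sketch of the same column-wise summary, for comparison only.
# sapply() returns a matrix with one column per variable and one row
# per statistic, so it is transposed before building the data frame.
stats <- sapply(mtcars, function(x) c(mean = mean(x), sd = sd(x), IQR = IQR(x)))
summary_df <- data.frame(.id = colnames(stats), t(stats), row.names = NULL)
print(head(summary_df, 3))
```

The base version needs an anonymous function and a transpose, which is the kind of boilerplate the `ldply(mtcars, dostats, ...)` call avoids.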

This makes for a more logical summary `data.frame` object that has usable columns, each with the same data type. Unfortunately, this does not work for every data set. The above example has only numerical data. In a data frame with categorical data, the categorical columns would be passed to the same functions as the numeric ones. Another limitation is that the results of each function must have the same dimension for each variable. For this reason I introduced functions that filter by the variable class.

- `class.stats` creates a `dostats` function for a given class, tested by `inherits`.
- `integer.stats` is a predefined class stats function for integer variables. It is defined as `class.stats('integer')`.
- `numeric.stats` is for numeric variables, which also include integer variables.
- `factor.stats` is for factors.

When a `class.stats` function is passed to `ldply`, variables not matching that class are silently removed.

`ldply(iris, numeric.stats, mean, sd)`

```
## .id mean sd
## 1 Sepal.Length 5.843 0.8281
## 2 Sepal.Width 3.057 0.4359
## 3 Petal.Length 3.758 1.7653
## 4 Petal.Width 1.199 0.7622
```

`ldply(iris, factor.stats, N = length)`

```
## .id N
## 1 Species 150
```
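The filtering behaviour can be sketched in base R as follows. `make_class_stats` is a hypothetical name, and returning `NULL` for non-matching input is an assumption about how such a filter could work, not a description of the package's internals.

```r
# Hypothetical sketch of a class filter; not the package's implementation.
# make_class_stats() returns a function that computes statistics only when
# the input inherits from the requested class, and returns NULL otherwise.
make_class_stats <- function(cls) {
  function(x, ...) {
    if (!inherits(x, cls)) return(NULL)
    as.data.frame(lapply(list(...), function(f) f(x)))
  }
}

factor_only <- make_class_stats("factor")

# Apply to every column; non-matching columns yield NULL and are dropped.
pieces <- lapply(iris, factor_only, N = length)
pieces <- Filter(Negate(is.null), pieces)
print(pieces)
```

Only `Species` survives the filter here. Note that a plain `inherits` test would not treat integer columns as numeric, so a numeric filter that also accepts integers, as described above, would need an extra check.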

You can also chain together arguments to compute on subsets using `ddply` and `ldply`.

```
ddply(iris, .(Species), ldply, numeric.stats,
mean, median, sd)
```

```
## Species .id mean median sd
## 1 setosa Sepal.Length 5.006 5.00 0.3525
## 2 setosa Sepal.Width 3.428 3.40 0.3791
## 3 setosa Petal.Length 1.462 1.50 0.1737
## 4 setosa Petal.Width 0.246 0.20 0.1054
## 5 versicolor Sepal.Length 5.936 5.90 0.5162
## 6 versicolor Sepal.Width 2.770 2.80 0.3138
## 7 versicolor Petal.Length 4.260 4.35 0.4699
## 8 versicolor Petal.Width 1.326 1.30 0.1978
## 9 virginica Sepal.Length 6.588 6.50 0.6359
## 10 virginica Sepal.Width 2.974 3.00 0.3225
## 11 virginica Petal.Length 5.552 5.55 0.5519
## 12 virginica Petal.Width 2.026 2.00 0.2747
```

## Function manipulations

Passing all these functions around also requires some extra function manipulation functions. Now that is a mouthful, but something we do with R.

### Composition

R lacks a function composition function, so I created one. `function(x)any(is.na(x))` is just too long to type, and I find myself doing things like this far too often. The word “function” takes up a lot of space. It is much easier to write `any%.%is.na` or `compose(any, is.na)`, either of which creates a new function that tests whether there are any missing values. The two forms are

- `compose(...)`
- `fun1%.%fun2`

`compose` takes any number of arguments and nests them, with the rightmost being the innermost and the leftmost being the outermost. The easy way to remember this is that the functions are written in the same order as in the nested call, so `compose(any, is.na)` reads like `any(is.na(x))`.
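The nesting order can be illustrated with a small base-R sketch. The names `compose2` and `%o%` are hypothetical and were chosen to avoid clashing with the package's own `compose` and `%.%`; this is not the package's implementation.

```r
# Hypothetical base-R sketch of composition; not the package's implementation.
# The rightmost function is applied first, the leftmost last.
compose2 <- function(...) {
  funs <- rev(list(...))  # innermost (rightmost) function comes first
  function(x) Reduce(function(value, f) f(value), funs, x)
}

# Infix form for two functions.
`%o%` <- function(f, g) compose2(f, g)

any_missing <- compose2(any, is.na)
any_missing(c(1, NA, 3))   # TRUE
any_missing(c(1, 2, 3))    # FALSE
(any %o% is.na)(c(NA, 1))  # TRUE
```

Like the composition described above, the sketch passes only a single argument through the chain.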

### Argument Manipulations

Composition and `dostats` operate only on the first argument, which necessitates functions for manipulating arguments.

- `wargs`: creates a new function with changed defaults. For example, `wargs(mean, na.rm=T)` creates a new function that automatically removes missing values.
- `onarg`: specifies which argument of the function the first argument is passed to. For example, `onarg(rep,'times')` makes the first argument the number of times to repeat.
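Both ideas can be approximated in base R. The helpers `with_args` and `on_arg` below are hypothetical names used for illustration only. As a simplification, `with_args` fixes the given arguments outright instead of changing defaults, so they cannot be overridden later.

```r
# Hypothetical base-R sketches; not the package's implementation.

# with_args(): fix some arguments of f in advance.
with_args <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(list(...), fixed))
}

mean_no_na <- with_args(mean, na.rm = TRUE)
mean_no_na(c(1, NA, 3))   # 2

# on_arg(): route the first positional argument to a named parameter.
on_arg <- function(f, arg) {
  function(x, ...) do.call(f, c(stats::setNames(list(x), arg), list(...)))
}

rep_times <- on_arg(rep, "times")
rep_times(3, x = "a")     # "a" "a" "a"
```

Functions built this way take their varying input as the first argument, which is the form that composition and `dostats` expect.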

One example of this that is included in `dostats` is the pair `contains` and `%contains%`, which take their arguments in the reverse order of `%in%`.
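The reversed argument order is easy to see in a one-line sketch. The operator name `%has%` is hypothetical, used here so that it does not shadow the package's `%contains%`.

```r
# Hypothetical re-creation for illustration; the package ships its own version.
# x %has% y asks whether the elements of y occur in x, i.e. y %in% x.
`%has%` <- function(x, y) y %in% x

letters_abc <- c("a", "b", "c")
letters_abc %has% "b"          # TRUE
letters_abc %has% c("a", "z")  # TRUE FALSE
```

With the collection on the left, the collection becomes the first argument, so it can be the value that flows through a composed function.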

## Conclusion

There will likely be more functions as I come across the need for them. If you have an idea that should be included, submit it to the issue tracker.
