(This article was first published on **A Statistics Blog - R**, and kindly contributed to R-bloggers)

## Introduction

This is the first in a projected five-part series of posts aimed at colleagues who teach elementary statistics.

I can’t remember exactly how I first made acquaintance with R, but it’s been an important part of my teaching/consulting life since at least 2007, when I used it for the first time in an upper-level undergraduate statistics class. As of last Fall all of our statistics classes, even the elementary ones, are based on R. We may never return to a ground-up GUI platform. If you teach with R but hesitate to spring such a powerful and complex tool on unsuspecting introductory students, many of whom will have had no prior experience with the command line, much less with coding, then I hope these posts will give you some encouragement.

## Reason #1: Package `mosaic`

This package is a product of the NSF-funded Project Mosaic, led by Nick Horton, Daniel Kaplan and Randall Pruim. It’s on CRAN, but you might want to keep up with the very latest releases on Github:

```
require(devtools)
install_github(repo="pruim/mosaic")
```

`mosaic` aims to flatten the learning curve for elementary students by gentling down the coding aspects of the R experience. With `mosaic`:

- students need to know relatively few R-functions in order to thrive in the introductory course;
- many of these functions are “wrappers” for standard R-functions, and provide a fairly uniform interface for user input;
- the package provides tools that mostly obviate the need for students to deal with R as a programming language:
  - the students don’t have to write their own functions;
  - they don’t need to think much about the different types of R objects;
  - they don’t even need to learn about flow-control structures.

### Keeping Simple Things Simple

R can make the easy stuff surprisingly tricky for beginners. Suppose, for example, that you want numerical summaries of a particular numerical variable, broken down by the values of some factor variable. Standard procedure in R would be to write your own anonymous function as an argument for `aggregate()`, thus:

```
aggregate(mpg~cyl,data=mtcars,
FUN=function(x) c(mean=mean(x),sd=sd(x)))
## cyl mpg.mean mpg.sd
## 1 4 26.664 4.510
## 2 6 19.743 1.454
## 3 8 15.100 2.560
```

`mosaic` skirts the problem by providing wrapper functions for aggregation:

```
require(mosaic)
mean(mpg~cyl,data=mtcars)
## 4 6 8
## 26.66 19.74 15.10
sd(mpg~cyl,data=mtcars)
## 4 6 8
## 4.510 1.454 2.560
```

One can attain even more simplicity, at only a small cost in flexibility, by sticking to `mosaic`’s `favstats()` as a one-stop shop:

```
favstats(mpg~cyl,data= mtcars)
## .group min Q1 median Q3 max mean sd n missing
## 1 4 21.4 22.80 26.0 30.40 33.9 26.66 4.510 11 0
## 2 6 17.8 18.65 19.7 21.00 21.4 19.74 1.454 7 0
## 3 8 10.4 14.40 15.2 16.25 19.2 15.10 2.560 14 0
```

### Flow-Control for the Masses

`mosaic` includes powerful wrapper functions that permit extensive work with re-sampling and simulation, without the need to learn flow-control. Here follows a `mosaic`-style implementation of a permutation test.

Consider the data frame `Pseudoscorpions` from the `abd` package:

```
require(abd)
data(Pseudoscorpions)
```

`Pseudoscorpions` contains the results of an experiment on 36 female pseudoscorpions: each one was either mated twice with a single male (`SM`) or was mated with two males, one time each (`DM`), receiving about the same total amount of sperm under either treatment. The idea was to see whether an increase in genetic diversity of sperm sources increases the number of successful broods a female produces during her lifetime.

Here are some descriptive results:

```
favstats(successful.broods~treatment,
data=Pseudoscorpions)[c(".group","mean","sd")]
## .group mean sd
## 1 DM 3.625 1.962
## 2 SM 2.200 1.609
```

For the permutation test, we first compute and store the observed difference between the sample means:

```
obsDiff <- compareMean(successful.broods~treatment,
data=Pseudoscorpions)
obsDiff
## [1] -1.425
```

Next, we create an empirical Null distribution with `shuffle()` (the random permutation function) and `do()` (a for-loop wrapper):

```
set.seed(12345)
nullDist <- do(2500)*(
compareMean(successful.broods~shuffle(treatment),
data=Pseudoscorpions))
```

Finally, we call `statTally()` for numerical and graphical analysis of the results:

```
statTally(obsDiff,nullDist)
## Null distribution appears to be asymmetric. (p = 1.07e-05)
##
## Test statistic applied to sample data = -1.425
##
## Quantiles of test statistic applied to random data:
## 50% 90% 95% 99%
## 0.0375 0.8250 1.0500 1.3875
##
## Of the random samples
##
## 13 ( 0.52 % ) had test stats = -1.425
##
## 23 ( 0.92 % ) had test stats < -1.425
```

We seem to have fairly strong evidence ($P \approx 1.4\%$, the share of the 2500 shuffles with a difference at or below the observed $-1.425$) that mating with more males increases the number of successful broods.
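The one-sided proportion can be checked by hand from the tallies that `statTally()` printed. A minimal sketch, using only the counts shown in that output (13 shuffles equal to the observed difference and 23 below it, out of 2500); the variable names are illustrative:

```r
# Counts copied from the statTally() output above.
n_equal    <- 13
n_below    <- 23
n_shuffles <- 2500

# One-sided proportion of shuffled differences at or below the observed one.
p_one_sided <- (n_equal + n_below) / n_shuffles
p_one_sided
## [1] 0.0144
```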

### There is Much More

I have only scratched the surface of the `mosaic` package, which is rich enough to support statistics instruction in both elementary and advanced courses. The `mosaic` authors provide extensive instructor resources in the package vignettes, and frequently offer workshops and short-courses, especially through events sponsored by the Consortium for the Advancement of Undergraduate Statistics Education.

Next week I’ll introduce a supplementary package that is intended for students who might require even more simplicity, and that aligns their R experience with a particular set of teaching objectives.

## References

The Pseudoscorpion experiment is discussed in Whitlock and Schluter’s *The Analysis of Biological Data* (Roberts and Company Publishers, 1st edition, July 2008).
