# Introducing the CGPfunctions package – March 22, 2018

This article was first published on **Chuck Powell** and kindly contributed to R-bloggers.


## Overview

This package includes functions that I find useful for teaching statistics as well as actually practicing the art. They are typically not "new" methods but rather wrappers around either base R or other packages and concepts I'm trying to master. It currently contains:

- `Plot2WayANOVA`, which as the name implies conducts a 2-way ANOVA and plots the results using `ggplot2`
- `neweta`, which is a helper function that appends the results of a Type II eta squared calculation onto a classic ANOVA table
- `Mode`, which finds the modal value in a vector of data
- `SeeDist`, which wraps around `ggplot2` to provide visualizations of univariate data
- `OurConf`, which is a simulation function that helps you learn about confidence intervals

## Installation

```
# Install from CRAN
install.packages("CGPfunctions")

# Or install the development version from GitHub
# (highly recommended since the package is under rapid development right now)
# install.packages("devtools")
devtools::install_github("ibecav/CGPfunctions")
```

## Usage

`library(CGPfunctions)` will load the package, which contains 5 functions:

`SeeDist` will give you some plots of the distribution of a variable using `ggplot2`.

```
library(CGPfunctions)
SeeDist(mtcars$hp, whatvar = "Horsepower", whatplots = "d")
```

```
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>    52.0    96.5   123.0   146.7   180.0   335.0
```

`Mode` is a helper function that simply returns one or more modal values.

```
Mode(mtcars$hp)
#> [1] 110 175 180
```
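To make the idea concrete, here is a minimal base R sketch of how a "return every modal value" helper could work. The name `modal_values` is hypothetical, and this is not the package's source code, only one plausible way to get the same kind of result.

```r
# Hypothetical helper (not the CGPfunctions source): return every value
# that occurs with the highest frequency in `x`.
modal_values <- function(x) {
  x <- x[!is.na(x)]                 # ignore missing values
  if (length(x) == 0) return(x)     # nothing to count
  counts <- table(x)                # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]
  # table() keeps values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

modal_values(mtcars$hp)
```

Because ties are kept rather than broken, a vector with several equally frequent values returns all of them, which is why `mtcars$hp` yields three values above.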

`neweta` is a helper function which returns a tibble containing AOV output similar to `summary(aov(MyAOV))` but with eta squared computed and appended as an additional column.

```
MyAOV <- aov(mpg ~ am * cyl, mtcars)
neweta(MyAOV)
#> # A tibble: 4 x 8
#>   Source       Df `Sum Sq` `Mean Sq` `F value`      p sigstars `eta sq`
#>   <fct>     <int>    <dbl>     <dbl>     <dbl>  <dbl> <chr>       <dbl>
#> 1 am            1     37.0     37.0       4.30 0.0480 *          0.0330
#> 2 cyl           1    450.     450.       52.0  0.     ***        0.399
#> 3 am:cyl        1     29.4     29.4       3.40 0.0760 .          0.0260
#> 4 Residuals    28    242.       8.64     NA    NA     <NA>       0.215
```
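For readers curious where numbers like these can come from, the sketch below computes Type II eta squared for the same model by hand in base R. It assumes eta squared is each Type II sum of squares divided by the total sum of squares of `mpg`, and it gets the Type II sums of squares from nested model comparisons. The helper name `rss` is made up for illustration, and none of this is taken from the package's code.

```r
# Sketch only: Type II eta squared for mpg ~ am * cyl via nested model
# comparisons. `rss` is a hypothetical helper, not part of CGPfunctions.
rss <- function(formula) sum(residuals(lm(formula, data = mtcars))^2)

# Total variability in the outcome
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Type II sums of squares: each main effect is adjusted for the other main
# effect, and the interaction is adjusted for both main effects.
ss_type2 <- c(
  am       = rss(mpg ~ cyl) - rss(mpg ~ am + cyl),
  cyl      = rss(mpg ~ am) - rss(mpg ~ am + cyl),
  `am:cyl` = rss(mpg ~ am + cyl) - rss(mpg ~ am * cyl)
)

eta_sq <- ss_type2 / ss_total
round(eta_sq, 3)
```

Under these assumptions the values should land close to the `eta sq` column printed above. Since the Type II pieces do not partition the total sum of squares in an unbalanced design, they need not add up to 1.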

The `Plot2WayANOVA` function conducts a classic analysis using existing R functions and packages in a sane and defensible manner, though not necessarily in the one and only manner.

```
Plot2WayANOVA(mpg ~ am * cyl, mtcars)
#>
#> Converting am to a factor --- check your results
#>
#> Converting cyl to a factor --- check your results
#>
#> You have an unbalanced design. Using Type II sum of squares, eta squared may not sum to 1.0
#> # A tibble: 4 x 8
#>   Source       Df `Sum Sq` `Mean Sq` `F value`      p sigstars `eta sq`
#>   <fct>     <int>    <dbl>     <dbl>     <dbl>  <dbl> <chr>       <dbl>
#> 1 am            1     36.8     36.8       4.00 0.0560 .          0.0330
#> 2 cyl           2    456.     228.       24.8  0.     ***        0.405
#> 3 am:cyl        2     25.4     12.7       1.40 0.269  ""         0.0230
#> 4 Residuals    26    239.       9.19     NA    NA     <NA>       0.212
#>
#> Table of group means
#> # A tibble: 6 x 9
#> # Groups:   am [2]
#>   am    cyl   TheMean TheSD TheSEM CIMuliplier LowerBound UpperBound     N
#>   <fct> <fct>   <dbl> <dbl>  <dbl>       <dbl>      <dbl>      <dbl> <int>
#> 1 0     4        22.9 1.45   0.839        4.30       19.3       26.5     3
#> 2 0     6        19.1 1.63   0.816        3.18       16.5       21.7     4
#> 3 0     8        15.0 2.77   0.801        2.20       13.3       16.8    12
#> 4 1     4        28.1 4.48   1.59         2.36       24.3       31.8     8
#> 5 1     6        20.6 0.751  0.433        4.30       18.7       22.4     3
#> 6 1     8        15.4 0.566  0.400       12.7        10.3       20.5     2
#>
#> Testing Homogeneity of Variance with Brown-Forsythe
#>    *** Possible violation of the assumption ***
#> Levene's Test for Homogeneity of Variance (center = median)
#>       Df F value  Pr(>F)
#> group  5   2.736 0.04086 *
#>       26
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Testing Normality Assumption with Shapiro-Wilk
#>
#>  Shapiro-Wilk normality test
#>
#> data:  MyAOV_residuals
#> W = 0.96277, p-value = 0.3263
#>
#> Interaction graph plotted...
```
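The "Table of group means" in that output can be approximated in base R. The sketch below assumes the CI multiplier is the two-sided 95% t quantile with N - 1 degrees of freedom, which is consistent with the printed values (for example 4.30 for N = 3). The function name `summarise_cell` is hypothetical, the column names simply mirror the printout, and this is not the package's implementation.

```r
# Illustrative sketch, not the CGPfunctions implementation.
# Split mpg into the six am x cyl cells; names look like "0.4" (am = 0, cyl = 4).
cells <- split(mtcars$mpg, list(mtcars$am, mtcars$cyl))

# Hypothetical helper: mean, SD, SEM and a t-based confidence interval
summarise_cell <- function(y, conf.level = 0.95) {
  n <- length(y)
  sem <- sd(y) / sqrt(n)                              # standard error of the mean
  mult <- qt(1 - (1 - conf.level) / 2, df = n - 1)    # t multiplier for the CI
  data.frame(
    TheMean = mean(y), TheSD = sd(y), TheSEM = sem,
    CIMultiplier = mult,
    LowerBound = mean(y) - mult * sem,
    UpperBound = mean(y) + mult * sem,
    N = n
  )
}

group_table <- do.call(rbind, lapply(cells, summarise_cell))
round(group_table, 2)
```

Small cells get very wide intervals because the t multiplier grows quickly as N shrinks, which is what the row with N = 2 shows in the output above.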

`OurConf` is a simulation function that helps you learn about confidence intervals.

```
OurConf(samples = 20, n = 15, mu = 100, sigma = 20, conf.level = 0.90)
```

`#> 100 % of the confidence intervals contain Mu = 100 .`
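The idea behind this kind of simulation can be sketched in a few lines of base R: draw repeated samples from a normal population with a known mean, build a t-based confidence interval from each, and count how often the interval covers the true mean. The function `simulate_coverage` below is a hypothetical stand-in that borrows the argument names from the call above. It is not the package's code and it does not draw the plot.

```r
# Hypothetical stand-in for the idea behind OurConf (not the package's code).
# Returns the proportion of simulated confidence intervals that contain mu.
simulate_coverage <- function(samples = 20, n = 15, mu = 100, sigma = 20,
                              conf.level = 0.90) {
  covered <- replicate(samples, {
    x <- rnorm(n, mean = mu, sd = sigma)                # one random sample
    ci <- t.test(x, conf.level = conf.level)$conf.int   # t-based interval
    ci[1] <= mu && mu <= ci[2]                          # does it cover mu?
  })
  mean(covered)
}

set.seed(2018)
simulate_coverage(samples = 1000)
```

With only 20 samples, as in the call above, the observed coverage can easily differ from 90% (it was 100% in that run). Over many samples the proportion should settle near `conf.level`.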

## Credits

Many thanks to Dani Navarro and the book *Learning Statistics with R*, whose `etaSquared` function was the genesis of `neweta`.

“He who gives up safety for speed deserves neither.”


#### A shoutout to some other packages I find essential.

- stringr, for strings.
- lubridate, for date/times.
- forcats, for factors.
- haven, for SPSS, SAS and Stata files.
- readxl, for `.xls` and `.xlsx` files.
- modelr, for modelling within a pipeline.
- broom, for turning models into tidy data.
- ggplot2, for data visualisation.
- dplyr, for data manipulation.
- tidyr, for data tidying.
- readr, for data import.
- purrr, for functional programming.
- tibble, for tibbles, a modern re-imagining of data frames.

## Leaving Feedback

If you like **CGPfunctions**, please consider leaving feedback here.

## Contributing

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

- Issues, bug reports, and wish lists: File a GitHub issue.
- Contact the maintainer ibecav at gmail.com by email.

### License

This work (blogpost) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
