Introducing the CGPfunctions package

[This article was first published on Chuck Powell, and kindly contributed to R-bloggers.]

Overview

This package includes functions that I find useful for teaching
statistics as well as for actually practicing the art. They are typically
not “new” methods but rather wrappers around base R, other packages, and
concepts I’m trying to master. The package currently contains:

  • Plot2WayANOVA, which, as the name implies, conducts a two-way ANOVA
    and plots the results using ggplot2.
  • neweta, a helper function that appends the results of a Type II eta
    squared calculation onto a classic ANOVA table.
  • Mode, which finds the modal value or values in a vector of data.
  • SeeDist, which wraps around ggplot2 to provide visualizations of
    univariate data.
  • OurConf, a simulation function that helps you learn about
    confidence intervals.

Installation

# Install from CRAN
install.packages("CGPfunctions")

# Or install the development version from GitHub, which is highly
# recommended since the package is under rapid development right now
# install.packages("devtools")
devtools::install_github("ibecav/CGPfunctions")

Usage

library(CGPfunctions) loads the package, which contains five
functions:

SeeDist plots the distribution of a variable using ggplot2 and prints
its summary statistics:

library(CGPfunctions)
SeeDist(mtcars$hp, whatvar = "Horsepower", whatplots = "d")

#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    52.0    96.5   123.0   146.7   180.0   335.0
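
For readers who want to see what such a wrapper might be doing underneath, here is a minimal sketch in plain ggplot2. It is not SeeDist's actual code. It assumes that whatplots = "d" requests a density plot, and the choice to mark the mean and median with vertical lines is mine, for illustration only.

```r
library(ggplot2)

# Numeric summary, the same kind of six-number output printed above.
hp_summary <- summary(mtcars$hp)
print(hp_summary)

# Density plot of horsepower. The dashed line marks the mean and the
# dotted line marks the median (an illustrative choice, not SeeDist's).
density_plot <- ggplot(mtcars, aes(x = hp)) +
  geom_density(fill = "grey80") +
  geom_vline(xintercept = mean(mtcars$hp), linetype = "dashed") +
  geom_vline(xintercept = median(mtcars$hp), linetype = "dotted") +
  labs(x = "Horsepower", y = "Density")

print(density_plot)
```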

Mode is a helper function that simply returns one or more modal values:

Mode(mtcars$hp)
#> [1] 110 175 180
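
Base R has no built-in function for the statistical mode, which is why a helper like this is handy. The sketch below shows one way to return every value tied for the highest frequency. The name modal_values is hypothetical, and Mode itself may be implemented differently.

```r
# Hypothetical helper (not part of CGPfunctions): return every value that
# is tied for the highest frequency, in order of first appearance.
modal_values <- function(x) {
  x <- x[!is.na(x)]                 # ignore missing values
  ux <- unique(x)                   # distinct values, first-appearance order
  counts <- tabulate(match(x, ux))  # how often each distinct value occurs
  ux[counts == max(counts)]         # keep all values with the top count
}

modal_values(mtcars$hp)
#> [1] 110 175 180
```

Because ties are kept rather than broken arbitrarily, a multimodal variable such as mtcars$hp yields more than one value.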

neweta is a helper function that returns a tibble containing AOV
output similar to summary(MyAOV), where MyAOV is a fitted aov object,
but with eta squared computed and appended as an additional column:

MyAOV <- aov(mpg ~ am * cyl, mtcars)
neweta(MyAOV)
#> # A tibble: 4 x 8
#>   Source       Df `Sum Sq` `Mean Sq` `F value`       p sigstars `eta sq`
#>   <fct>     <int>    <dbl>     <dbl>     <dbl>   <dbl> <chr>       <dbl>
#> 1 am            1     37.0     37.0       4.30  0.0480 *          0.0330
#> 2 cyl           1    450.     450.       52.0   0.     ***        0.399 
#> 3 am:cyl        1     29.4     29.4       3.40  0.0760 .          0.0260
#> 4 Residuals    28    242.       8.64     NA    NA      <NA>       0.215
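
To make the eta squared column less of a black box, here is a rough base R sketch of the arithmetic. It is not neweta's code. It assumes that Type II sums of squares can be obtained by comparing nested models, and that eta squared is each sum of squares divided by the total sum of squares of the response. The object names are mine.

```r
# Sketch only: Type II sums of squares from nested model comparisons.
fit_am   <- lm(mpg ~ am, data = mtcars)
fit_cyl  <- lm(mpg ~ cyl, data = mtcars)
fit_main <- lm(mpg ~ am + cyl, data = mtcars)
fit_full <- lm(mpg ~ am * cyl, data = mtcars)

# Residual sum of squares of a fitted model.
rss <- function(fit) sum(residuals(fit)^2)

# Total sum of squares of the response.
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

type2_ss <- c(
  am        = rss(fit_cyl)  - rss(fit_main),  # am adjusted for cyl
  cyl       = rss(fit_am)   - rss(fit_main),  # cyl adjusted for am
  `am:cyl`  = rss(fit_main) - rss(fit_full),  # interaction adjusted for both
  Residuals = rss(fit_full)
)

# Eta squared: share of the total variation attributed to each source.
eta_sq <- type2_ss / ss_total
round(eta_sq, 3)
```

With an unbalanced design such as mtcars, the Type II sums of squares need not add up to the total, so these proportions can sum to less than 1.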

The Plot2WayANOVA function conducts a classic analysis using existing
R functions and packages in a sane and defensible manner, though not
necessarily the one and only manner.

Plot2WayANOVA(mpg ~ am * cyl, mtcars)
#> 
#> Converting am to a factor --- check your results
#> 
#> Converting cyl to a factor --- check your results
#> 
#> You have an unbalanced design. Using Type II sum of squares, eta squared may not sum to 1.0
#> # A tibble: 4 x 8
#>   Source       Df `Sum Sq` `Mean Sq` `F value`       p sigstars `eta sq`
#>   <fct>     <int>    <dbl>     <dbl>     <dbl>   <dbl> <chr>       <dbl>
#> 1 am            1     36.8     36.8       4.00  0.0560 .          0.0330
#> 2 cyl           2    456.     228.       24.8   0.     ***        0.405 
#> 3 am:cyl        2     25.4     12.7       1.40  0.269  ""         0.0230
#> 4 Residuals    26    239.       9.19     NA    NA      <NA>       0.212
#> 
#> Table of group means
#> # A tibble: 6 x 9
#> # Groups:   am [2]
#>   am    cyl   TheMean TheSD TheSEM CIMuliplier LowerBound UpperBound     N
#>   <fct> <fct>   <dbl> <dbl>  <dbl>       <dbl>      <dbl>      <dbl> <int>
#> 1 0     4        22.9 1.45   0.839        4.30       19.3       26.5     3
#> 2 0     6        19.1 1.63   0.816        3.18       16.5       21.7     4
#> 3 0     8        15.0 2.77   0.801        2.20       13.3       16.8    12
#> 4 1     4        28.1 4.48   1.59         2.36       24.3       31.8     8
#> 5 1     6        20.6 0.751  0.433        4.30       18.7       22.4     3
#> 6 1     8        15.4 0.566  0.400       12.7        10.3       20.5     2
#> 
#> Testing Homogeneity of Variance with Brown-Forsythe
#>    *** Possible violation of the assumption ***
#> Levene's Test for Homogeneity of Variance (center = median)
#>       Df F value  Pr(>F)  
#> group  5   2.736 0.04086 *
#>       26                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Testing Normality Assumption with Shapiro-Wilk
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  MyAOV_residuals
#> W = 0.96277, p-value = 0.3263
#> 
#> Interaction graph plotted...
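
If you are curious how the columns in the table of group means relate to one another, the sketch below rebuilds them in base R. It is not the package's implementation. It assumes 95% t-based confidence intervals, which is consistent with the multipliers printed above, and the column names are my own. It also repeats the Shapiro-Wilk check on the residuals of the factor-coded model.

```r
# Sketch only: per-cell means with t-based 95% confidence intervals.
cells <- split(mtcars$mpg, list(am = mtcars$am, cyl = mtcars$cyl))

group_means <- do.call(rbind, lapply(names(cells), function(cell) {
  y    <- cells[[cell]]
  n    <- length(y)
  sem  <- sd(y) / sqrt(n)          # standard error of the mean
  mult <- qt(0.975, df = n - 1)    # two-sided 95% t multiplier
  data.frame(
    cell = cell,                   # "am.cyl", for example "0.4"
    mean = mean(y),
    sd = sd(y),
    sem = sem,
    ci_multiplier = mult,
    lower = mean(y) - mult * sem,
    upper = mean(y) + mult * sem,
    n = n
  )
}))
print(group_means)

# Normality check on the residuals, with both predictors treated as factors.
fit <- aov(mpg ~ factor(am) * factor(cyl), data = mtcars)
sw <- shapiro.test(residuals(fit))
print(sw)
```

Note how quickly the multiplier grows as the cell size shrinks: the cell with only two cars gets a very wide interval even though its standard deviation is small.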

OurConf is a simulation function that helps you learn about confidence
intervals. It reports the percentage of the simulated intervals that
contain the true mean:

OurConf(samples = 20, n = 15, mu = 100, sigma = 20, conf.level = 0.90)

#> 100 % of the confidence intervals contain Mu = 100 .
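
The idea behind this kind of simulation can be sketched in a few lines of base R. This is not OurConf's code. It assumes normally distributed samples and t-based intervals from t.test, and the seed is arbitrary.

```r
# Sketch only: draw repeated samples and count how many confidence
# intervals capture the true mean.
set.seed(42)  # arbitrary seed so the run is reproducible

samples    <- 20    # number of samples (and intervals) to simulate
n          <- 15    # observations per sample
mu         <- 100   # true population mean
sigma      <- 20    # true population standard deviation
conf.level <- 0.90

covers <- replicate(samples, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x, conf.level = conf.level)$conf.int
  ci[1] <= mu && mu <= ci[2]   # TRUE when the interval contains mu
})

cat(round(100 * mean(covers)), "% of the confidence intervals contain Mu =", mu, "\n")
```

With only 20 samples, the observed percentage will bounce around from run to run. It settles near the nominal 90% only as the number of samples grows, which is exactly the lesson the simulation is meant to teach.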

Credits

Many thanks to Dani Navarro and the book Learning Statistics with R,
whose etaSquared function was the genesis of neweta.

“He who gives up safety for speed deserves neither.”

A shoutout to some other packages I find essential.

  • stringr, for strings.
  • lubridate, for date/times.
  • forcats, for factors.
  • haven, for SPSS, SAS and Stata files.
  • readxl, for .xls and .xlsx files.
  • modelr, for modelling within a pipeline.
  • broom, for turning models into tidy data.
  • ggplot2, for data visualisation.
  • dplyr, for data manipulation.
  • tidyr, for data tidying.
  • readr, for data import.
  • purrr, for functional programming.
  • tibble, for tibbles, a modern re-imagining of data frames.

Leaving Feedback

If you like CGPfunctions, please consider leaving feedback here.

Contributing

Contributions in the form of feedback, comments, code, and bug reports
are most welcome. How to contribute:

  • Issues, bug reports, and wish lists: file a GitHub issue.
  • Contact the maintainer by email: ibecav at gmail.com.

License

This work (blogpost) is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.
