Introducing the CGPfunctions package – March 22, 2018
Overview
This package includes functions that I find useful for teaching
statistics as well as actually practicing the art. They typically are
not “new” methods but rather wrappers around either base R or other
packages and concepts I’m trying to master. Currently contains:
- Plot2WayANOVA, which as the name implies conducts a 2 way ANOVA and
  plots the results using ggplot2
- neweta, which is a helper function that appends the results of a
  Type II eta squared calculation onto a classic ANOVA table
- Mode, which finds the modal value in a vector of data
- SeeDist, which wraps around ggplot2 to provide visualizations of
  univariate data
- OurConf, which is a simulation function that helps you learn about
  confidence intervals
Installation
# Install from CRAN
install.packages("CGPfunctions")
# Or the development version from GitHub
# Highly recommended since it is under rapid development right now
# install.packages("devtools")
devtools::install_github("ibecav/CGPfunctions")
Usage
library(CGPfunctions) will load the package, which contains 5 functions:
SeeDist will give you some plots of the distribution of a variable
using ggplot2
library(CGPfunctions)
SeeDist(mtcars$hp, whatvar = "Horsepower", whatplots = "d")
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 52.0 96.5 123.0 146.7 180.0 335.0
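For readers without the package installed, here is a rough base-R analogue of the call above. It is only a sketch: it uses summary() and density() from base R rather than ggplot2, and it makes no attempt to reproduce the styling or extra annotations of SeeDist.

```r
# Sketch only: a base-R stand-in, not how SeeDist is implemented.
hp_summary <- summary(mtcars$hp)   # five-number summary plus the mean
print(hp_summary)

# Kernel density estimate, roughly what a density plot ("d") conveys
hp_density <- density(mtcars$hp)
plot(hp_density, main = "Horsepower", xlab = "Horsepower")
```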
Mode is a helper function that simply returns one or more modal values
Mode(mtcars$hp)
#> [1] 110 175 180
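To show what "one or more modal values" means, here is a minimal base-R sketch. The function name find_modes is hypothetical and this is not the package's own code; it simply tabulates the values and keeps every value that ties for the highest count.

```r
# Sketch only: find_modes is a hypothetical helper, not the package's Mode().
find_modes <- function(x) {
  counts <- table(x)                                  # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])    # keep all values tied for the top count
}

find_modes(mtcars$hp)
#> [1] 110 175 180
```

Because ties are kept, a vector with several equally frequent values returns all of them, as in the mtcars$hp example.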
neweta is a helper function which returns a tibble containing AOV
output similar to summary(MyAOV) but with eta squared computed and
appended as an additional column
MyAOV <- aov(mpg ~ am * cyl, mtcars)
neweta(MyAOV)
#> # A tibble: 4 x 8
#>   Source       Df `Sum Sq` `Mean Sq` `F value`       p sigstars `eta sq`
#>   <fct>     <int>    <dbl>     <dbl>     <dbl>   <dbl> <chr>       <dbl>
#> 1 am            1     37.0     37.0       4.30  0.0480 *          0.0330
#> 2 cyl           1    450.     450.       52.0   0.     ***        0.399
#> 3 am:cyl        1     29.4     29.4       3.40  0.0760 .          0.0260
#> 4 Residuals    28    242.       8.64     NA    NA      <NA>       0.215
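The idea of appending eta squared to an ANOVA table can be sketched in base R. Note the swap: this sketch uses the sequential (Type I) sums of squares that anova() returns, not the Type II sums of squares that neweta reports, so the numbers will differ from the table above. The formula is the same in both cases: each effect's sum of squares divided by the total sum of squares.

```r
# Sketch only: Type I (sequential) sums of squares from base R,
# NOT the Type II sums of squares used by neweta.
fit <- aov(mpg ~ am * cyl, data = mtcars)
aov_table <- as.data.frame(anova(fit))        # classic ANOVA table as a plain data frame

# With Type I SS the rows add up to the total sum of squares of the response
ss_total <- sum(aov_table[["Sum Sq"]])

# eta squared = SS for the row / total SS, appended as a new column
aov_table[["eta sq"]] <- aov_table[["Sum Sq"]] / ss_total
aov_table
```

With sequential sums of squares the eta squared column sums to 1; with Type II sums of squares on an unbalanced design it generally does not, which is what the warning in the Plot2WayANOVA output refers to.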
The Plot2WayANOVA function conducts a classic analysis using existing
R functions and packages in a sane and defensible manner, not necessarily
in the one and only manner.
Plot2WayANOVA(mpg ~ am * cyl, mtcars)
#>
#> Converting am to a factor --- check your results
#>
#> Converting cyl to a factor --- check your results
#>
#> You have an unbalanced design. Using Type II sum of squares, eta squared may not sum to 1.0
#> # A tibble: 4 x 8
#>   Source       Df `Sum Sq` `Mean Sq` `F value`       p sigstars `eta sq`
#>   <fct>     <int>    <dbl>     <dbl>     <dbl>   <dbl> <chr>       <dbl>
#> 1 am            1     36.8     36.8       4.00  0.0560 .          0.0330
#> 2 cyl           2    456.     228.       24.8   0.     ***        0.405
#> 3 am:cyl        2     25.4     12.7       1.40  0.269  ""         0.0230
#> 4 Residuals    26    239.       9.19     NA    NA      <NA>       0.212
#>
#> Table of group means
#> # A tibble: 6 x 9
#> # Groups: am [2]
#>   am    cyl   TheMean TheSD TheSEM CIMuliplier LowerBound UpperBound     N
#>   <fct> <fct>   <dbl> <dbl>  <dbl>       <dbl>      <dbl>      <dbl> <int>
#> 1 0     4        22.9 1.45   0.839        4.30       19.3       26.5     3
#> 2 0     6        19.1 1.63   0.816        3.18       16.5       21.7     4
#> 3 0     8        15.0 2.77   0.801        2.20       13.3       16.8    12
#> 4 1     4        28.1 4.48   1.59         2.36       24.3       31.8     8
#> 5 1     6        20.6 0.751  0.433        4.30       18.7       22.4     3
#> 6 1     8        15.4 0.566  0.400       12.7        10.3       20.5     2
#>
#> Testing Homogeneity of Variance with Brown-Forsythe
#> *** Possible violation of the assumption ***
#> Levene's Test for Homogeneity of Variance (center = median)
#>       Df F value  Pr(>F)
#> group  5   2.736 0.04086 *
#>       26
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Testing Normality Assumption with Shapiro-Wilk
#>
#> Shapiro-Wilk normality test
#>
#> data: MyAOV_residuals
#> W = 0.96277, p-value = 0.3263
#>
#> Interaction graph plotted...
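The "Table of group means" in that output can be approximated with base R. This is a sketch under assumptions: the 95% confidence level is inferred from the printed multipliers (for example 4.30 for N = 3), the lower-case column names are my own, and the package may well compute the table differently.

```r
# Sketch only: base-R approximation of the group-means table.
# Assumes a two-sided 95% t interval, inferred from the printed multipliers.
cells <- split(mtcars$mpg, list(am = mtcars$am, cyl = mtcars$cyl))

group_means <- do.call(rbind, lapply(names(cells), function(cell) {
  y <- cells[[cell]]
  n <- length(y)
  sem <- sd(y) / sqrt(n)               # standard error of the mean
  t_mult <- qt(0.975, df = n - 1)      # t multiplier for n - 1 degrees of freedom
  data.frame(cell = cell,              # "am.cyl", e.g. "0.4" is am = 0, cyl = 4
             mean = mean(y), sd = sd(y), sem = sem, t_mult = t_mult,
             lower = mean(y) - t_mult * sem,
             upper = mean(y) + t_mult * sem,
             n = n)
}))
group_means
```

The small cells are worth a look: with only 2 or 3 observations the multiplier is large, so the interval is wide even when the standard deviation is small.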
OurConf is a simulation function that helps you learn about confidence
intervals
OurConf(samples = 20, n = 15, mu = 100, sigma = 20, conf.level = 0.90)
#> 100 % of the confidence intervals contain Mu = 100 .
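The idea behind the simulation can be sketched without the plot. The function simulate_coverage below is a hypothetical stand-in, not the package's code, and it assumes a t-based interval from t.test(); OurConf may construct its intervals differently. It draws repeated samples from a normal population and reports the percentage of intervals that contain the true mean.

```r
# Sketch only: simulate_coverage is a hypothetical stand-in for the idea
# behind OurConf. It assumes t-based intervals and draws no plot.
simulate_coverage <- function(samples = 20, n = 15, mu = 100, sigma = 20,
                              conf.level = 0.90) {
  covered <- replicate(samples, {
    x <- rnorm(n, mean = mu, sd = sigma)                # one random sample
    ci <- t.test(x, conf.level = conf.level)$conf.int   # its confidence interval
    ci[1] <= mu && mu <= ci[2]                          # does it contain mu?
  })
  mean(covered) * 100                                   # percentage that contain mu
}

set.seed(42)
coverage_small <- simulate_coverage(samples = 20)       # noisy with few samples
coverage_large <- simulate_coverage(samples = 2000)     # settles near the nominal level
c(coverage_small, coverage_large)
```

With only 20 samples the percentage jumps around from run to run, which is why a single run can report 100 % even at a 90% confidence level; with many samples it should settle close to the nominal level.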
Credits
Many thanks to Dani Navarro and the book Learning Statistics with R,
whose etaSquared function was the genesis of neweta.
“He who gives up safety for speed deserves neither.”
(via)
A shoutout to some other packages I find essential.
- stringr, for strings.
- lubridate, for date/times.
- forcats, for factors.
- haven, for SPSS, SAS and Stata files.
- readxl, for .xls and .xlsx files.
- modelr, for modelling within a pipeline.
- broom, for turning models into tidy data.
- ggplot2, for data visualisation.
- dplyr, for data manipulation.
- tidyr, for data tidying.
- readr, for data import.
- purrr, for functional programming.
- tibble, for tibbles, a modern re-imagining of data frames.
Leaving Feedback
If you like CGPfunctions, please consider leaving feedback here.
Contributing
Contributions in the form of feedback, comments, code, and bug reports
are most welcome. How to contribute:
- Issues, bug reports, and wish lists: File a GitHub issue.
- Contact the maintainer ibecav at gmail.com by email.
License
This work (blogpost) is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.