Descriptive/Summary Statistics with descriptr

[This article was first published on Rsquared Academy Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We are pleased to introduce the descriptr package, a set of tools for generating descriptive/summary statistics.

Installation

# Install release version from CRAN
install.packages("descriptr")

# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("rsquaredacademy/descriptr")

Shiny App

descriptr includes a shiny app which can be launched using

ds_launch_shiny_app()

or try the live version here.

Read on to learn more about the features of descriptr, or see the descriptr website for detailed documentation on using the package.

Data

We have modified the mtcars data to create a new data set mtcarz. The only difference between the two data sets is related to the variable types.

str(mtcarz)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
##  $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...

Data Screening

The ds_screener() function will screen a data set and return the following: – Column/Variable Names – Data Type – Levels (in case of categorical data) – Number of missing observations – % of missing observations

ds_screener(mtcarz)
## -----------------------------------------------------------------------
## |  Column Name  |  Data Type  |  Levels   |  Missing  |  Missing (%)  |
## -----------------------------------------------------------------------
## |      mpg      |   numeric   |    NA     |     0     |       0       |
## |      cyl      |   factor    |   4 6 8   |     0     |       0       |
## |     disp      |   numeric   |    NA     |     0     |       0       |
## |      hp       |   numeric   |    NA     |     0     |       0       |
## |     drat      |   numeric   |    NA     |     0     |       0       |
## |      wt       |   numeric   |    NA     |     0     |       0       |
## |     qsec      |   numeric   |    NA     |     0     |       0       |
## |      vs       |   factor    |    0 1    |     0     |       0       |
## |      am       |   factor    |    0 1    |     0     |       0       |
## |     gear      |   factor    |   3 4 5   |     0     |       0       |
## |     carb      |   factor    |1 2 3 4 6 8|     0     |       0       |
## -----------------------------------------------------------------------
## 
##  Overall Missing Values           0 
##  Percentage of Missing Values     0 %
##  Rows with Missing Values         0 
##  Columns With Missing Values      0

Continuous Data

Summary Statistics

The ds_summary_stats() function returns a comprehensive set of statistics including measures of location, variation, symmetry and extreme observations.

ds_summary_stats(mtcarz, mpg)
## ------------------------------ Variable: mpg ------------------------------
## 
##                         Univariate Analysis                          
## 
##  N                       32.00      Variance                36.32 
##  Missing                  0.00      Std Deviation            6.03 
##  Mean                    20.09      Range                   23.50 
##  Median                  19.20      Interquartile Range      7.38 
##  Mode                    10.40      Uncorrected SS       14042.31 
##  Trimmed Mean            19.95      Corrected SS          1126.05 
##  Skewness                 0.67      Coeff Variation         30.00 
##  Kurtosis                -0.02      Std Error Mean           1.07 
## 
##                               Quantiles                               
## 
##               Quantile                            Value                
## 
##              Max                                  33.90                
##              99%                                  33.44                
##              95%                                  31.30                
##              90%                                  30.09                
##              Q3                                   22.80                
##              Median                               19.20                
##              Q1                                   15.43                
##              10%                                  14.34                
##              5%                                   12.00                
##              1%                                   10.40                
##              Min                                  10.40                
## 
##                             Extreme Values                            
## 
##                 Low                                High                
## 
##   Obs                        Value       Obs                        Value 
##   15                         10.4        20                         33.9  
##   16                         10.4        18                         32.4  
##   24                         13.3        19                         30.4  
##    7                         14.3        28                         30.4  
##   17                         14.7        26                         27.3

You can pass multiple variables as shown below:

ds_summary_stats(mtcarz, mpg, disp)
## ------------------------------ Variable: mpg ------------------------------
## 
##                         Univariate Analysis                          
## 
##  N                       32.00      Variance                36.32 
##  Missing                  0.00      Std Deviation            6.03 
##  Mean                    20.09      Range                   23.50 
##  Median                  19.20      Interquartile Range      7.38 
##  Mode                    10.40      Uncorrected SS       14042.31 
##  Trimmed Mean            19.95      Corrected SS          1126.05 
##  Skewness                 0.67      Coeff Variation         30.00 
##  Kurtosis                -0.02      Std Error Mean           1.07 
## 
##                               Quantiles                               
## 
##               Quantile                            Value                
## 
##              Max                                  33.90                
##              99%                                  33.44                
##              95%                                  31.30                
##              90%                                  30.09                
##              Q3                                   22.80                
##              Median                               19.20                
##              Q1                                   15.43                
##              10%                                  14.34                
##              5%                                   12.00                
##              1%                                   10.40                
##              Min                                  10.40                
## 
##                             Extreme Values                            
## 
##                 Low                                High                
## 
##   Obs                        Value       Obs                        Value 
##   15                         10.4        20                         33.9  
##   16                         10.4        18                         32.4  
##   24                         13.3        19                         30.4  
##    7                         14.3        28                         30.4  
##   17                         14.7        26                         27.3  
## 
## 
## 
## ------------------------------ Variable: disp -----------------------------
## 
##                           Univariate Analysis                            
## 
##  N                         32.00      Variance               15360.80 
##  Missing                    0.00      Std Deviation            123.94 
##  Mean                     230.72      Range                    400.90 
##  Median                   196.30      Interquartile Range      205.18 
##  Mode                     275.80      Uncorrected SS       2179627.47 
##  Trimmed Mean             228.00      Corrected SS          476184.79 
##  Skewness                   0.42      Coeff Variation           53.72 
##  Kurtosis                  -1.07      Std Error Mean            21.91 
## 
##                                 Quantiles                                 
## 
##                Quantile                              Value                 
## 
##               Max                                    472.00                
##               99%                                    468.28                
##               95%                                    449.00                
##               90%                                    396.00                
##               Q3                                     326.00                
##               Median                                 196.30                
##               Q1                                     120.83                
##               10%                                    80.61                 
##               5%                                     77.35                 
##               1%                                     72.53                 
##               Min                                    71.10                 
## 
##                               Extreme Values                              
## 
##                  Low                                  High                 
## 
##   Obs                          Value       Obs                          Value 
##   20                           71.1        15                            472  
##   19                           75.7        16                            460  
##   18                           78.7        17                            440  
##   26                            79         25                            400  
##   28                           95.1         5                            360

If you do not specify any variables, it will detect all the continuous variables in the data set and return summary statistics for each of them.

Frequency Distribution

The ds_freq_table() function creates frequency tables for continuous variables. The default number of intervals is 5.

ds_freq_table(mtcarz, mpg, 4)
##                                 Variable: mpg                                 
## |---------------------------------------------------------------------------|
## |      Bins       | Frequency | Cum Frequency |   Percent    | Cum Percent  |
## |---------------------------------------------------------------------------|
## |  10.4  -  16.3  |    10     |      10       |    31.25     |    31.25     |
## |---------------------------------------------------------------------------|
## |  16.3  -  22.1  |    13     |      23       |    40.62     |    71.88     |
## |---------------------------------------------------------------------------|
## |  22.1  -   28   |     5     |      28       |    15.62     |     87.5     |
## |---------------------------------------------------------------------------|
## |   28   -  33.9  |     4     |      32       |     12.5     |     100      |
## |---------------------------------------------------------------------------|
## |      Total      |    32     |       -       |    100.00    |      -       |
## |---------------------------------------------------------------------------|

Histogram

A plot() method has been defined which will generate a histogram.

k <- ds_freq_table(mtcarz, mpg, 4)
plot(k)

Auto Summary

If you want to view summary statistics and frequency tables of all or subset of variables in a data set, use ds_auto_summary().

ds_auto_summary_stats(mtcarz, disp, mpg)
## ------------------------------ Variable: disp -----------------------------
## 
## ---------------------------- Summary Statistics ---------------------------
## 
## ------------------------------ Variable: disp -----------------------------
## 
##                           Univariate Analysis                            
## 
##  N                         32.00      Variance               15360.80 
##  Missing                    0.00      Std Deviation            123.94 
##  Mean                     230.72      Range                    400.90 
##  Median                   196.30      Interquartile Range      205.18 
##  Mode                     275.80      Uncorrected SS       2179627.47 
##  Trimmed Mean             228.00      Corrected SS          476184.79 
##  Skewness                   0.42      Coeff Variation           53.72 
##  Kurtosis                  -1.07      Std Error Mean            21.91 
## 
##                                 Quantiles                                 
## 
##                Quantile                              Value                 
## 
##               Max                                    472.00                
##               99%                                    468.28                
##               95%                                    449.00                
##               90%                                    396.00                
##               Q3                                     326.00                
##               Median                                 196.30                
##               Q1                                     120.83                
##               10%                                    80.61                 
##               5%                                     77.35                 
##               1%                                     72.53                 
##               Min                                    71.10                 
## 
##                               Extreme Values                              
## 
##                  Low                                  High                 
## 
##   Obs                          Value       Obs                          Value 
##   20                           71.1        15                            472  
##   19                           75.7        16                            460  
##   18                           78.7        17                            440  
##   26                            79         25                            400  
##   28                           95.1         5                            360  
## 
## 
## 
## NULL
## 
## 
## -------------------------- Frequency Distribution -------------------------
## 
##                                Variable: disp                                 
## |---------------------------------------------------------------------------|
## |      Bins       | Frequency | Cum Frequency |   Percent    | Cum Percent  |
## |---------------------------------------------------------------------------|
## |  71.1  - 151.3  |    12     |      12       |     37.5     |     37.5     |
## |---------------------------------------------------------------------------|
## | 151.3  - 231.5  |     5     |      17       |    15.62     |    53.12     |
## |---------------------------------------------------------------------------|
## | 231.5  - 311.6  |     6     |      23       |    18.75     |    71.88     |
## |---------------------------------------------------------------------------|
## | 311.6  - 391.8  |     5     |      28       |    15.62     |     87.5     |
## |---------------------------------------------------------------------------|
## | 391.8  -  472   |     4     |      32       |     12.5     |     100      |
## |---------------------------------------------------------------------------|
## |      Total      |    32     |       -       |    100.00    |      -       |
## |---------------------------------------------------------------------------|
## 
## 
## ------------------------------ Variable: mpg ------------------------------
## 
## ---------------------------- Summary Statistics ---------------------------
## 
## ------------------------------ Variable: mpg ------------------------------
## 
##                         Univariate Analysis                          
## 
##  N                       32.00      Variance                36.32 
##  Missing                  0.00      Std Deviation            6.03 
##  Mean                    20.09      Range                   23.50 
##  Median                  19.20      Interquartile Range      7.38 
##  Mode                    10.40      Uncorrected SS       14042.31 
##  Trimmed Mean            19.95      Corrected SS          1126.05 
##  Skewness                 0.67      Coeff Variation         30.00 
##  Kurtosis                -0.02      Std Error Mean           1.07 
## 
##                               Quantiles                               
## 
##               Quantile                            Value                
## 
##              Max                                  33.90                
##              99%                                  33.44                
##              95%                                  31.30                
##              90%                                  30.09                
##              Q3                                   22.80                
##              Median                               19.20                
##              Q1                                   15.43                
##              10%                                  14.34                
##              5%                                   12.00                
##              1%                                   10.40                
##              Min                                  10.40                
## 
##                             Extreme Values                            
## 
##                 Low                                High                
## 
##   Obs                        Value       Obs                        Value 
##   15                         10.4        20                         33.9  
##   16                         10.4        18                         32.4  
##   24                         13.3        19                         30.4  
##    7                         14.3        28                         30.4  
##   17                         14.7        26                         27.3  
## 
## 
## 
## NULL
## 
## 
## -------------------------- Frequency Distribution -------------------------
## 
##                               Variable: mpg                               
## |-----------------------------------------------------------------------|
## |    Bins     | Frequency | Cum Frequency |   Percent    | Cum Percent  |
## |-----------------------------------------------------------------------|
## | 10.4 - 15.1 |     6     |       6       |    18.75     |    18.75     |
## |-----------------------------------------------------------------------|
## | 15.1 - 19.8 |    12     |      18       |     37.5     |    56.25     |
## |-----------------------------------------------------------------------|
## | 19.8 - 24.5 |     8     |      26       |      25      |    81.25     |
## |-----------------------------------------------------------------------|
## | 24.5 - 29.2 |     2     |      28       |     6.25     |     87.5     |
## |-----------------------------------------------------------------------|
## | 29.2 - 33.9 |     4     |      32       |     12.5     |     100      |
## |-----------------------------------------------------------------------|
## |    Total    |    32     |       -       |    100.00    |      -       |
## |-----------------------------------------------------------------------|

Group Summary

The ds_group_summary() function returns descriptive statistics of a continuous variable for the different levels of a categorical variable.

k <- ds_group_summary(mtcarz, cyl, mpg)
k
##                                        mpg by cyl                                         
## -----------------------------------------------------------------------------------------
## |     Statistic/Levels|                    4|                    6|                    8|
## -----------------------------------------------------------------------------------------
## |                  Obs|                   11|                    7|                   14|
## |              Minimum|                 21.4|                 17.8|                 10.4|
## |              Maximum|                 33.9|                 21.4|                 19.2|
## |                 Mean|                26.66|                19.74|                 15.1|
## |               Median|                   26|                 19.7|                 15.2|
## |                 Mode|                 22.8|                   21|                 10.4|
## |       Std. Deviation|                 4.51|                 1.45|                 2.56|
## |             Variance|                20.34|                 2.11|                 6.55|
## |             Skewness|                 0.35|                -0.26|                -0.46|
## |             Kurtosis|                -1.43|                -1.83|                 0.33|
## |       Uncorrected SS|              8023.83|              2741.14|              3277.34|
## |         Corrected SS|               203.39|                12.68|                 85.2|
## |      Coeff Variation|                16.91|                 7.36|                16.95|
## |      Std. Error Mean|                 1.36|                 0.55|                 0.68|
## |                Range|                 12.5|                  3.6|                  8.8|
## |  Interquartile Range|                  7.6|                 2.35|                 1.85|
## -----------------------------------------------------------------------------------------

ds_group_summary() returns a tibble which can be used for further analysis.

k$tidy_stats
## # A tibble: 3 x 15
##   cyl   length   min   max  mean median  mode    sd variance skewness
##   <fct>  <int> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
## 1 4         11  21.4  33.9  26.7   26    22.8  4.51    20.3     0.348
## 2 6          7  17.8  21.4  19.7   19.7  21    1.45     2.11   -0.259
## 3 8         14  10.4  19.2  15.1   15.2  10.4  2.56     6.55   -0.456
## # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>,
## #   std_error <dbl>, range <dbl>, iqr <dbl>

Box Plot

A plot() method has been defined for comparing distributions.

k <- ds_group_summary(mtcarz, cyl, mpg)
plot(k)

Multiple Variables

If you want grouped summary statistics for multiple variables in a data set, use ds_auto_group_summary().

ds_auto_group_summary(mtcarz, cyl, gear, mpg)
##                                        mpg by cyl                                         
## -----------------------------------------------------------------------------------------
## |     Statistic/Levels|                    4|                    6|                    8|
## -----------------------------------------------------------------------------------------
## |                  Obs|                   11|                    7|                   14|
## |              Minimum|                 21.4|                 17.8|                 10.4|
## |              Maximum|                 33.9|                 21.4|                 19.2|
## |                 Mean|                26.66|                19.74|                 15.1|
## |               Median|                   26|                 19.7|                 15.2|
## |                 Mode|                 22.8|                   21|                 10.4|
## |       Std. Deviation|                 4.51|                 1.45|                 2.56|
## |             Variance|                20.34|                 2.11|                 6.55|
## |             Skewness|                 0.35|                -0.26|                -0.46|
## |             Kurtosis|                -1.43|                -1.83|                 0.33|
## |       Uncorrected SS|              8023.83|              2741.14|              3277.34|
## |         Corrected SS|               203.39|                12.68|                 85.2|
## |      Coeff Variation|                16.91|                 7.36|                16.95|
## |      Std. Error Mean|                 1.36|                 0.55|                 0.68|
## |                Range|                 12.5|                  3.6|                  8.8|
## |  Interquartile Range|                  7.6|                 2.35|                 1.85|
## -----------------------------------------------------------------------------------------
## 
## 
## 
##                                        mpg by gear                                        
## -----------------------------------------------------------------------------------------
## |     Statistic/Levels|                    3|                    4|                    5|
## -----------------------------------------------------------------------------------------
## |                  Obs|                   15|                   12|                    5|
## |              Minimum|                 10.4|                 17.8|                   15|
## |              Maximum|                 21.5|                 33.9|                 30.4|
## |                 Mean|                16.11|                24.53|                21.38|
## |               Median|                 15.5|                 22.8|                 19.7|
## |                 Mode|                 10.4|                   21|                   15|
## |       Std. Deviation|                 3.37|                 5.28|                 6.66|
## |             Variance|                11.37|                27.84|                44.34|
## |             Skewness|                -0.09|                  0.7|                 0.56|
## |             Kurtosis|                -0.38|                -0.77|                -1.83|
## |       Uncorrected SS|              4050.52|               7528.9|              2462.89|
## |         Corrected SS|               159.15|               306.29|               177.37|
## |      Coeff Variation|                20.93|                21.51|                31.15|
## |      Std. Error Mean|                 0.87|                 1.52|                 2.98|
## |                Range|                 11.1|                 16.1|                 15.4|
## |  Interquartile Range|                  3.9|                 7.08|                 10.2|
## -----------------------------------------------------------------------------------------

Multiple Variable Statistics

The ds_tidy_stats() function returns summary/descriptive statistics for variables in a data frame/tibble.

ds_tidy_stats(mtcarz, mpg, disp, hp)
## # A tibble: 3 x 16
##   vars    min   max  mean t_mean median  mode range variance  stdev  skew
##   <chr> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>    <dbl>  <dbl> <dbl>
## 1 disp   71.1 472   231.   228    196.  276.  401.   15361.  124.   0.420
## 2 hp     52   335   147.   144.   123   110   283     4701.   68.6  0.799
## 3 mpg    10.4  33.9  20.1   20.0   19.2  10.4  23.5     36.3   6.03 0.672
## # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>, q1 <dbl>,
## #   q3 <dbl>, iqrange <dbl>

Measures

If you want to view the measure of location, variation, symmetry, percentiles and extreme observations as tibbles, use the below functions. All of them, except for ds_extreme_obs() will work with single or multiple variables. If you do not specify the variables, they will return the results for all the continuous variables in the data set.

Measures of Location

ds_measures_location(mtcarz)
## # A tibble: 6 x 5
##   var     mean trim_mean median   mode
##   <chr>  <dbl>     <dbl>  <dbl>  <dbl>
## 1 disp  231.      228    196.   276.  
## 2 drat    3.60      3.58   3.70   3.07
## 3 hp    147.      144.   123    110   
## 4 mpg    20.1      20.0   19.2   10.4 
## 5 qsec   17.8      17.8   17.7   17.0 
## 6 wt      3.22      3.20   3.32   3.44

Measures of Variation

ds_measures_variation(mtcarz)
## # A tibble: 6 x 7
##   var    range     iqr  variance      sd coeff_var std_error
##   <chr>  <dbl>   <dbl>     <dbl>   <dbl>     <dbl>     <dbl>
## 1 disp  401.   205.    15361.    124.         53.7   21.9   
## 2 drat    2.17   0.840     0.286   0.535      14.9    0.0945
## 3 hp    283     83.5    4701.     68.6        46.7   12.1   
## 4 mpg    23.5    7.38     36.3     6.03       30.0    1.07  
## 5 qsec    8.40   2.01      3.19    1.79       10.0    0.316 
## 6 wt      3.91   1.03      0.957   0.978      30.4    0.173

Measures of Symmetry

ds_measures_symmetry(mtcarz)
## # A tibble: 6 x 3
##   var   skewness kurtosis
##   <chr>    <dbl>    <dbl>
## 1 disp     0.420  -1.07  
## 2 drat     0.293  -0.450 
## 3 hp       0.799   0.275 
## 4 mpg      0.672  -0.0220
## 5 qsec     0.406   0.865 
## 6 wt       0.466   0.417

Percentiles

ds_percentiles(mtcarz)
## # A tibble: 6 x 12
##   var     min  per1  per5 per10     q1 median     q3  per95  per90  per99
##   <chr> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 disp  71.1  72.5  77.4  80.6  121.   196.   326    449    396.   468.  
## 2 drat   2.76  2.76  2.85  3.01   3.08   3.70   3.92   4.31   4.21   4.78
## 3 hp    52    55.1  63.6  66     96.5  123    180    254.   244.   313.  
## 4 mpg   10.4  10.4  12.0  14.3   15.4   19.2   22.8   31.3   30.1   33.4 
## 5 qsec  14.5  14.5  15.0  15.5   16.9   17.7   18.9   20.1   20.0   22.1 
## 6 wt     1.51  1.54  1.74  1.96   2.58   3.32   3.61   5.29   4.05   5.40
## # ... with 1 more variable: max <dbl>

Categorical Data

Cross Tabulation

The ds_cross_table() function creates two way tables of categorical variables.

ds_cross_table(mtcarz, cyl, gear)
##     Cell Contents
##  |---------------|
##  |     Frequency |
##  |       Percent |
##  |       Row Pct |
##  |       Col Pct |
##  |---------------|
## 
##  Total Observations:  32 
## 
## ----------------------------------------------------------------------------
## |              |                           gear                            |
## ----------------------------------------------------------------------------
## |          cyl |            3 |            4 |            5 |    Row Total |
## ----------------------------------------------------------------------------
## |            4 |            1 |            8 |            2 |           11 |
## |              |        0.031 |         0.25 |        0.062 |              |
## |              |         0.09 |         0.73 |         0.18 |         0.34 |
## |              |         0.07 |         0.67 |          0.4 |              |
## ----------------------------------------------------------------------------
## |            6 |            2 |            4 |            1 |            7 |
## |              |        0.062 |        0.125 |        0.031 |              |
## |              |         0.29 |         0.57 |         0.14 |         0.22 |
## |              |         0.13 |         0.33 |          0.2 |              |
## ----------------------------------------------------------------------------
## |            8 |           12 |            0 |            2 |           14 |
## |              |        0.375 |            0 |        0.062 |              |
## |              |         0.86 |            0 |         0.14 |         0.44 |
## |              |          0.8 |            0 |          0.4 |              |
## ----------------------------------------------------------------------------
## | Column Total |           15 |           12 |            5 |           32 |
## |              |        0.468 |        0.375 |        0.155 |              |
## ----------------------------------------------------------------------------

If you want the above result as a tibble, use ds_twoway_table().

ds_twoway_table(mtcarz, cyl, gear)
## Joining, by = c("cyl", "gear", "count")
## # A tibble: 8 x 6
##   cyl   gear  count percent row_percent col_percent
##   <fct> <fct> <int>   <dbl>       <dbl>       <dbl>
## 1 4     3         1  0.0312      0.0909      0.0667
## 2 4     4         8  0.25        0.727       0.667 
## 3 4     5         2  0.0625      0.182       0.4   
## 4 6     3         2  0.0625      0.286       0.133 
## 5 6     4         4  0.125       0.571       0.333 
## 6 6     5         1  0.0312      0.143       0.2   
## 7 8     3        12  0.375       0.857       0.8   
## 8 8     5         2  0.0625      0.143       0.4

A plot() method has been defined which will generate:

Grouped Bar Plots

k <- ds_cross_table(mtcarz, cyl, gear)
plot(k)

Stacked Bar Plots

k <- ds_cross_table(mtcarz, cyl, gear)
plot(k, stacked = TRUE)

Proportional Bar Plots

k <- ds_cross_table(mtcarz, cyl, gear)
plot(k, proportional = TRUE)

Frequency Table

The ds_freq_table() function creates frequency tables.

ds_freq_table(mtcarz, cyl)
##                              Variable: cyl                              
## -----------------------------------------------------------------------
## Levels     Frequency    Cum Frequency       Percent        Cum Percent  
## -----------------------------------------------------------------------
##    4          11             11              34.38            34.38    
## -----------------------------------------------------------------------
##    6           7             18              21.88            56.25    
## -----------------------------------------------------------------------
##    8          14             32              43.75             100     
## -----------------------------------------------------------------------
##  Total        32              -             100.00              -      
## -----------------------------------------------------------------------

A plot() method has been defined which will create a bar plot.

k <- ds_freq_table(mtcarz, cyl)
plot(k)

Multiple One Way Tables

The ds_auto_freq_table() function creates multiple one way tables by creating a frequency table for each categorical variable in a data set. You can also specify a subset of variables if you do not want all the variables in the data set to be used.

ds_auto_freq_table(mtcarz)
##                              Variable: cyl                              
## -----------------------------------------------------------------------
## Levels     Frequency    Cum Frequency       Percent        Cum Percent  
## -----------------------------------------------------------------------
##    4          11             11              34.38            34.38    
## -----------------------------------------------------------------------
##    6           7             18              21.88            56.25    
## -----------------------------------------------------------------------
##    8          14             32              43.75             100     
## -----------------------------------------------------------------------
##  Total        32              -             100.00              -      
## -----------------------------------------------------------------------
## 
##                              Variable: vs                               
## -----------------------------------------------------------------------
## Levels     Frequency    Cum Frequency       Percent        Cum Percent  
## -----------------------------------------------------------------------
##    0          18             18              56.25            56.25    
## -----------------------------------------------------------------------
##    1          14             32              43.75             100     
## -----------------------------------------------------------------------
##  Total        32              -             100.00              -      
## -----------------------------------------------------------------------
## 
##                              Variable: am                               
## -----------------------------------------------------------------------
## Levels     Frequency    Cum Frequency       Percent        Cum Percent  
## -----------------------------------------------------------------------
##    0          19             19              59.38            59.38    
## -----------------------------------------------------------------------
##    1          13             32              40.62             100     
## -----------------------------------------------------------------------
##  Total        32              -             100.00              -      
## -----------------------------------------------------------------------
## 
##                             Variable: gear                              
## -----------------------------------------------------------------------
## Levels     Frequency    Cum Frequency       Percent        Cum Percent  
## -----------------------------------------------------------------------
##    3          15             15              46.88            46.88    
## -----------------------------------------------------------------------
##    4          12             27              37.5             84.38    
## -----------------------------------------------------------------------
##    5           5             32              15.62             100     
## -----------------------------------------------------------------------
##  Total        32              -             100.00              -      
## -----------------------------------------------------------------------
## 
##                             Variable: carb                              
## -----------------------------------------------------------------------
## Levels     Frequency    Cum Frequency       Percent        Cum Percent  
## -----------------------------------------------------------------------
##    1           7              7              21.88            21.88    
## -----------------------------------------------------------------------
##    2          10             17              31.25            53.12    
## -----------------------------------------------------------------------
##    3           3             20              9.38             62.5     
## -----------------------------------------------------------------------
##    4          10             30              31.25            93.75    
## -----------------------------------------------------------------------
##    6           1             31              3.12             96.88    
## -----------------------------------------------------------------------
##    8           1             32              3.12              100     
## -----------------------------------------------------------------------
##  Total        32              -             100.00              -      
## -----------------------------------------------------------------------

Multiple Two Way Tables

The ds_auto_cross_table() function creates multiple two way tables by creating a cross table for each unique pair of categorical variables in a data set. You can also specify a subset of variables if you do not want all the variables in the data set to be used.

ds_auto_cross_table(mtcarz, cyl, gear, am)
##     Cell Contents
##  |---------------|
##  |     Frequency |
##  |       Percent |
##  |       Row Pct |
##  |       Col Pct |
##  |---------------|
## 
##  Total Observations:  32 
## 
##                                 cyl vs gear                                 
## ----------------------------------------------------------------------------
## |              |                           gear                            |
## ----------------------------------------------------------------------------
## |          cyl |            3 |            4 |            5 |    Row Total |
## ----------------------------------------------------------------------------
## |            4 |            1 |            8 |            2 |           11 |
## |              |        0.031 |         0.25 |        0.062 |              |
## |              |         0.09 |         0.73 |         0.18 |         0.34 |
## |              |         0.07 |         0.67 |          0.4 |              |
## ----------------------------------------------------------------------------
## |            6 |            2 |            4 |            1 |            7 |
## |              |        0.062 |        0.125 |        0.031 |              |
## |              |         0.29 |         0.57 |         0.14 |         0.22 |
## |              |         0.13 |         0.33 |          0.2 |              |
## ----------------------------------------------------------------------------
## |            8 |           12 |            0 |            2 |           14 |
## |              |        0.375 |            0 |        0.062 |              |
## |              |         0.86 |            0 |         0.14 |         0.44 |
## |              |          0.8 |            0 |          0.4 |              |
## ----------------------------------------------------------------------------
## | Column Total |           15 |           12 |            5 |           32 |
## |              |        0.468 |        0.375 |        0.155 |              |
## ----------------------------------------------------------------------------
## 
## 
##                          cyl vs am                           
## -------------------------------------------------------------
## |              |                     am                     |
## -------------------------------------------------------------
## |          cyl |            0 |            1 |    Row Total |
## -------------------------------------------------------------
## |            4 |            3 |            8 |           11 |
## |              |        0.094 |         0.25 |              |
## |              |         0.27 |         0.73 |         0.34 |
## |              |         0.16 |         0.62 |              |
## -------------------------------------------------------------
## |            6 |            4 |            3 |            7 |
## |              |        0.125 |        0.094 |              |
## |              |         0.57 |         0.43 |         0.22 |
## |              |         0.21 |         0.23 |              |
## -------------------------------------------------------------
## |            8 |           12 |            2 |           14 |
## |              |        0.375 |        0.062 |              |
## |              |         0.86 |         0.14 |         0.44 |
## |              |         0.63 |         0.15 |              |
## -------------------------------------------------------------
## | Column Total |           19 |           13 |           32 |
## |              |        0.594 |        0.406 |              |
## -------------------------------------------------------------
## 
## 
##                          gear vs am                          
## -------------------------------------------------------------
## |              |                     am                     |
## -------------------------------------------------------------
## |         gear |            0 |            1 |    Row Total |
## -------------------------------------------------------------
## |            3 |           15 |            0 |           15 |
## |              |        0.469 |            0 |              |
## |              |            1 |            0 |         0.47 |
## |              |         0.79 |            0 |              |
## -------------------------------------------------------------
## |            4 |            4 |            8 |           12 |
## |              |        0.125 |         0.25 |              |
## |              |         0.33 |         0.67 |         0.38 |
## |              |         0.21 |         0.62 |              |
## -------------------------------------------------------------
## |            5 |            0 |            5 |            5 |
## |              |            0 |        0.156 |              |
## |              |            0 |            1 |         0.16 |
## |              |            0 |         0.38 |              |
## -------------------------------------------------------------
## | Column Total |           19 |           13 |           32 |
## |              |        0.594 |        0.406 |              |
## -------------------------------------------------------------

Visualization

descriptr can help visualize multiple variables by automatically detecting their data types.

Continuous Data

ds_plot_scatter(mtcarz, mpg, disp, hp)

Categorical Data

ds_plot_bar_stacked(mtcarz, cyl, gear, am)

Learning More

The descriptr website includes comprehensive documentation on using the package, including the following articles that cover various aspects of using rfm:

Feedback

All feedback is welcome. Issues (bugs and feature requests) can be posted to github tracker. For help with code or other related questions, feel free to reach me [email protected].

To leave a comment for the author, please follow the link and comment on their blog: Rsquared Academy Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)