DescTools: a new R "misc package"

September 25, 2014

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Joseph Rickert

One of the most difficult things about R, a problem that is particularly vexing to beginners, is finding things. This is an unintended consequence of R's spectacular, but mostly uncoordinated, organic growth. The R core team does a superb job of maintaining the stability and growth of the R language itself, but the innovation engine for new functionality is largely in the hands of the global R community.

Several structures have been put in place to address various aspects of the finding-things problem. For example, Task Views represent a monumental effort to collect and classify R packages. The RSeek site is an effective tool for web searches. R-bloggers is a good place to go for R applications, and CRANberries lets you know what's new. But how do you find things that you didn't even know you were looking for? For this, the so-called "misc packages" can be very helpful. Whereas the majority of R packages are focused on a particular type of analysis, class of models, or special tool, misc packages tend to be collections of functions that facilitate common tasks. (See the partial list below.)

DescTools is a new entry to the misc package scene that I think could become very popular. The description for the package begins:

DescTools contains a bunch of basic statistic functions and convenience wrappers for efficiently describing data, creating specific plots, doing reports using MS Word, Excel or PowerPoint. The package's intention is to offer a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. 

Of the 380 functions in this collection, the Desc function is the one that has caught my attention so far. This function provides very nice tabular and graphic summaries of the variables in a data frame, with output that is specific to the data type. The d.pizza data frame that comes with the package has a nice mix of data types:

head(d.pizza)
  index       date week weekday        area count rabate  price operator  driver delivery_min temperature wine_ordered wine_delivered
1     1 2014-03-01    9       6      Camden     5   TRUE 65.655   Rhonda  Taylor         20.0        53.0            0              0
2     2 2014-03-01    9       6 Westminster     2  FALSE 26.980   Rhonda Butcher         19.6        56.4            0              0
3     3 2014-03-01    9       6 Westminster     3  FALSE 40.970  Allanah Butcher         17.8        36.5            0              0
4     4 2014-03-01    9       6       Brent     2  FALSE 25.980  Allanah  Taylor         37.3          NA            0              0
5     5 2014-03-01    9       6       Brent     5   TRUE 57.555   Rhonda  Carter         21.8        50.0            0              0
6     6 2014-03-01    9       6      Camden     1  FALSE 13.990  Allanah  Taylor         48.7        27.0            0              0
  wrongpizza quality
1      FALSE  medium
2      FALSE    high
3      FALSE    <NA>
4      FALSE    <NA>
5      FALSE  medium
6      FALSE     low
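
Listings like the ones in this article can presumably be reproduced along the following lines. This is a minimal sketch, assuming DescTools has been installed from CRAN. How Desc prints its results, and whether it draws plots by default, may differ between package versions, so check ?Desc for the version you have.

```r
# Assumption: DescTools is already installed, e.g. via
# install.packages("DescTools")
library(DescTools)

# d.pizza is the example data frame that ships with the package
head(d.pizza)

# Describe every variable in the data frame; the summary produced
# for each column depends on that column's type
Desc(d.pizza)

# Describe single columns: a factor and a numeric variable
Desc(d.pizza$driver)
Desc(d.pizza$delivery_min)
```

Calling Desc on a single column is a convenient way to look at one variable at a time instead of scrolling through the output for the whole data frame.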

Here is some of the voluminous output from the function. The data frame as a whole is summarized as follows:

'data.frame':	1209 obs. of  16 variables:
  1 $ index         : int  1 2 3 4 5 6 7 8 9 10 ...
  2 $ date          : Date, format: "2014-03-01" "2014-03-01" "2014-03-01" "2014-03-01" ...
  3 $ week          : num  9 9 9 9 9 9 9 9 9 9 ...
  4 $ weekday       : num  6 6 6 6 6 6 6 6 6 6 ...
  5 $ area          : Factor w/ 3 levels "Brent","Camden",..: 2 3 3 1 1 2 2 1 3 1 ...
  6 $ count         : int  5 2 3 2 5 1 4 NA 3 6 ...
  7 $ rabate        : logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
  8 $ price         : num  65.7 27 41 26 57.6 ...
  9 $ operator      : Factor w/ 3 levels "Allanah","Maria",..: 3 3 1 1 3 1 3 1 1 3 ...
 10 $ driver        : Factor w/ 7 levels "Butcher","Carpenter",..: 7 1 1 7 3 7 7 7 7 3 ...
 11 $ delivery_min  : num  20 19.6 17.8 37.3 21.8 48.7 49.3 25.6 26.4 24.3 ...
 12 $ temperature   : num  53 56.4 36.5 NA 50 27 33.9 54.8 48 54.4 ...
 13 $ wine_ordered  : int  0 0 0 0 0 0 1 NA 0 1 ...
 14 $ wine_delivered: int  0 0 0 0 0 0 1 NA 0 1 ...
 15 $ wrongpizza    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 16 $ quality       : Ord.factor w/ 3 levels "low"<"medium"<..: 2 3 NA NA 2 1 1 3 3 2 ...

The factor variable driver gets a table and a plot.

10 - driver (factor)
 
  length      n    NAs levels unique  dupes
   1'209  1'204      5      7      7      y
 
 
      level freq  perc cumfreq cumperc
1 Carpenter  272  .226     272    .226
2    Carter  234  .194     506    .420
3    Taylor  204  .169     710    .590
4    Hunter  156  .130     866    .719
5    Miller  125  .104     991    .823
6    Farmer  117  .097    1108    .920
7   Butcher   96  .080    1204   1.000

[Figure: Driver_plot]

So does the numeric variable delivery_min.

11 - delivery_min (numeric)
 
  length      n    NAs unique     0s   mean meanSE
   1'209  1'209      0    384      0 25.653  0.312
 
     .05    .10    .25 median    .75    .90    .95
  10.400 11.600 17.400 24.400 32.500 40.420 45.200
 
     rng     sd  vcoef    mad    IQR   skew   kurt
  56.800 10.843  0.423 11.268 15.100  0.611  0.095
 
lowest : 8.8 (3), 8.9, 9 (3), 9.1 (5), 9.2 (3)
highest: 61.9, 62.7, 62.9, 63.2, 65.6
 
Shapiro-Wilks normality test  p.value : 2.2725e-16 

[Figure: Delivery_plot]

Pretty nice for an automatic first look at the data.
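
To make the column names in those summaries concrete, here is a rough base R sketch of how several of them could be computed by hand. It uses only the six driver and delivery_min values visible in the head(d.pizza) listing above, so the numbers will not match the full-data output. The formulas are my reading of the column labels (for example, meanSE as sd / sqrt(n) and vcoef as sd / mean), not code taken from DescTools, and the quantile method Desc uses is assumed to be R's default.

```r
# First six rows of d.pizza as shown by head(d.pizza) in this article
driver <- factor(c("Taylor", "Butcher", "Butcher", "Taylor", "Carter", "Taylor"))
delivery_min <- c(20.0, 19.6, 17.8, 37.3, 21.8, 48.7)

# Frequency table for a factor, sorted by decreasing frequency
freq <- sort(table(driver), decreasing = TRUE)
freq_table <- data.frame(
  level   = names(freq),
  freq    = as.integer(freq),
  perc    = as.numeric(freq) / sum(freq),
  cumfreq = cumsum(as.integer(freq)),
  cumperc = cumsum(as.numeric(freq)) / sum(freq),
  stringsAsFactors = FALSE
)
print(freq_table)

# Numeric summary (assumed definitions, see the note above)
n <- length(delivery_min)
num_summary <- c(
  mean   = mean(delivery_min),
  meanSE = sd(delivery_min) / sqrt(n),          # standard error of the mean
  median = median(delivery_min),
  rng    = diff(range(delivery_min)),           # max minus min
  sd     = sd(delivery_min),
  vcoef  = sd(delivery_min) / mean(delivery_min), # coefficient of variation
  mad    = mad(delivery_min),
  IQR    = IQR(delivery_min)
)
print(round(num_summary, 3))

# Quantiles at the probabilities shown in the Desc output
print(quantile(delivery_min, probs = c(.05, .10, .25, .50, .75, .90, .95)))

# Shapiro-Wilk normality test p-value
print(shapiro.test(delivery_min)$p.value)
```

The point of Desc is that it bundles all of this, plus the plots, into a single call that picks the right summary for each data type.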

For more R treasure hunting, have a look at the following short list of misc packages.

Package     Description
plyr        Tools for manipulating data (No. 1 package downloaded for 2013)
stringr     Convenience wrappers for functions for manipulating strings
Hmisc       One of the most popular R packages of all time: functions for data analysis, graphics, utilities and much more
devtools    Package development tools
caret       The "go to" package for machine learning, classification and regression training
e1071       Good svm implementation and other machine learning algorithms
DescTools   Tools for describing data and descriptive statistics
partykit    Tools for plotting decision trees
pracma      Functions for numerical analysis, linear algebra, optimization, differential equations and some special functions
IDPmisc     Contains different high-level graphics functions for displaying large datasets
survMisc    Relatively new package with various functions for survival data extending the methods available in the survival package
miscset     New this year: miscellaneous R tools to simplify working with data types and formats, including functions for working with data frames and character strings
miscFuncs   Some functions for Kalman filters
misc3d      Misc 3d plots including isosurfaces
mapmisc     New package with utilities for producing maps
gtools      Various programming tools like ASCIIfy() to convert characters to ASCII and checkRVersion() to see if a newer version of R is available
NCmisc      A grab bag of utilities including progress bars and function timers
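
One way to go treasure hunting inside any of these packages is to list the functions it exports and search the names. The sketch below is only an illustration: list_functions is a hypothetical helper written for this example, and it is run against the stats package because that ships with every R installation. Substituting "DescTools" or another installed package name should work the same way.

```r
# Hypothetical helper: return the sorted names of all functions
# exported by an installed package
list_functions <- function(pkg) {
  exports <- getNamespaceExports(pkg)
  is_fun <- vapply(
    exports,
    function(nm) is.function(getExportedValue(pkg, nm)),
    logical(1)
  )
  sort(exports[is_fun])
}

funs <- list_functions("stats")

# How many functions does the package export?
length(funs)

# Browse the first few names, then search by pattern
head(funs)
grep("^med", funs, value = TRUE)
```

Scanning a list like this is a quick way to stumble on functions you did not know to search for.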
