(This article was first published on **Revolutions**, and kindly contributed to R-bloggers)

*by Joseph Rickert*

If you type ?Distributions at the R console, you get a list of the 21 probability distributions included in the stats package that ships with base R. The same list appears in the Introduction to R manual on CRAN and in most of the many fine introductory books available for the R language. These are indeed fundamental distributions, sufficient for most elementary work in probability and statistics. The fact that the R functions implementing these distributions all follow the same syntax greatly eases a beginner's task of trying to get some useful work done with a minimum of memorization.
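As a small illustration of that shared syntax (a sketch, not taken from the R manuals): each distribution in stats comes as a family of four functions whose names start with d (density or probability mass), p (cumulative distribution), q (quantile) or r (random draws), followed by the distribution's name. The helper has_dpqr() is hypothetical and only checks that all four functions exist.

```r
# The d/p/q/r naming scheme shared by the distributions in the stats package
dnorm(0)                              # density of N(0, 1) at 0: 0.3989423
pnorm(1.96)                           # P(Z <= 1.96)
qnorm(0.975)                          # quantile function, the inverse of pnorm
rnorm(3)                              # three random draws

dbinom(3, size = 10, prob = 0.5)      # P(X = 3)  = 120/1024 = 0.1171875
pbinom(3, size = 10, prob = 0.5)      # P(X <= 3) = 176/1024 = 0.171875
qbinom(0.5, size = 10, prob = 0.5)    # median: 5
rbinom(3, size = 10, prob = 0.5)      # three random draws

# Hypothetical helper: because the names are regular, they can be built
# programmatically and checked for existence.
has_dpqr <- function(name) {
  prefixes <- c(d = "d", p = "p", q = "q", r = "r")
  sapply(prefixes, function(prefix) exists(paste0(prefix, name), mode = "function"))
}
has_dpqr("gamma")     # d, p, q and r all TRUE
has_dpqr("cauchy")
```

Memorizing one pattern instead of 84 separate function names is what makes the base distributions so approachable.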

The following figure shows plots of the cumulative distribution function pgamma() and the probability density function dgamma(), along with a histogram of random draws from rgamma(n, shape = 2, scale = 2), that is, from a gamma distribution with shape and scale parameters both set to 2.
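A sketch along the following lines produces plots like those in the figure. The sample size (1,000 draws), the seed and the plotting range are assumptions, since the text does not give them. Because rgamma()'s second and third positional arguments are shape and rate, naming scale = 2 explicitly avoids silently setting the rate instead.

```r
set.seed(1)                      # assumption: any seed will do
shape <- 2
scale <- 2
n <- 1000                        # assumption: sample size not stated in the text
x_max <- qgamma(0.999, shape = shape, scale = scale)   # right edge of the plots

par(mfrow = c(1, 3))             # three panels side by side
curve(pgamma(x, shape = shape, scale = scale), from = 0, to = x_max,
      main = "pgamma(x, shape = 2, scale = 2)", ylab = "F(x)")
curve(dgamma(x, shape = shape, scale = scale), from = 0, to = x_max,
      main = "dgamma(x, shape = 2, scale = 2)", ylab = "f(x)")
draws <- rgamma(n, shape = shape, scale = scale)
hist(draws, freq = FALSE, main = "rgamma(1000, shape = 2, scale = 2)")
# overlay the true density on the histogram for comparison
curve(dgamma(x, shape = shape, scale = scale), add = TRUE, lwd = 2)
par(mfrow = c(1, 1))
```

The histogram's sample mean should sit near shape × scale = 4, the mean of this gamma distribution.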

However, if a person isn’t familiar with how information about R is organized on CRAN, he or she might conclude “that’s it”, or most of it anyway, with respect to R and probability distributions. Imagine the surprise, then, of a person with such modest expectations about R’s probability distributions accidentally stumbling into the overgrown garden of R’s Probability Distributions Task View. I think my first reaction was a kind of glazed-over inability to take it all in.

But if you just let your eyes relax and pick out a flower you are familiar with, binomial for example, you can see that the chief gardener, Christophe Dutang, listed as the maintainer of the Task View, and the eight individuals whom he acknowledges have done a remarkable job of organizing the distributions according to their genus (discrete or continuous), species (binomial in this case) and variety (truncated binomial and zero-inflated binomial). I can’t imagine the number of volunteer hours it took to assemble this page, and keeping it up to date can’t be easy either.

I spent a half hour or so just trying to count the distributions. Not counting copulas, random matrices and other exotica, I came up with 31 discrete, 133 continuous and 9 mixture distributions. Others may count more or fewer depending on how they group things together. It seems as if few people outside of the folks at Wikipedia have given much thought to the taxonomy of probability distributions, and only Mathematica 9, which includes 130 probability distributions, comes close to cultivating so many distributions in one coherent system. (To be fair, the online documentation for SAS, Matlab and SPSS is so distributed that it is difficult to determine how many probability distributions have been implemented in these software packages.)
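To make the idea of a “variety” concrete, here is a minimal sketch of one of the varieties mentioned above, the zero-inflated binomial, assembled from base R's dbinom() and rbinom(). The functions dzibinom_sketch() and rzibinom_sketch() and the mixing weight pi0 are hypothetical names for illustration; packages listed in the Task View provide their own implementations with their own argument names. With probability pi0 an observation is a structural zero, and otherwise it comes from Binomial(size, prob), so P(X = 0) = pi0 + (1 - pi0)(1 - prob)^size.

```r
# Hypothetical zero-inflated binomial, built from base R functions.
# With probability pi0 an observation is a structural zero;
# otherwise it is a draw from Binomial(size, prob).
dzibinom_sketch <- function(x, size, prob, pi0) {
  stopifnot(pi0 >= 0, pi0 <= 1)
  # the extra point mass pi0 applies only at x == 0
  pi0 * (x == 0) + (1 - pi0) * dbinom(x, size, prob)
}

rzibinom_sketch <- function(n, size, prob, pi0) {
  structural_zero <- runif(n) < pi0
  # draw from the binomial for everyone, then overwrite the structural zeros
  ifelse(structural_zero, 0L, rbinom(n, size, prob))
}

p <- dzibinom_sketch(0:10, size = 10, prob = 0.5, pi0 = 0.3)
sum(p)    # 1: still a proper probability mass function
p[1]      # P(X = 0) = 0.3 + 0.7 * 0.5^10 = 0.3006836

set.seed(1)
draws <- rzibinom_sketch(1e5, size = 10, prob = 0.5, pi0 = 0.3)
mean(draws == 0)   # close to p[1]
```

The same pattern (a parent distribution plus a modification) describes many of the varieties in the Task View, which is one reason a garden-style taxonomy works well.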

While the Probability Distributions Task View may be the place to start for information about probability distributions, the complete R documentation is itself an open-ended, organic system that depends on the communication style of package authors and the experiences of everyone who leaves a record of their attempts to work with probability distributions.

The entire ecosystem of R documentation for a probability distribution function starts with the command-line help (e.g. ?pgamma) and the PDF reference manual on CRAN for the package that contains the function, but it may also include vignettes, external web pages, blog posts, and questions and discussions on help bulletin boards such as the R mailing lists and StackOverflow. For some typical examples, consider that the actuar package from Vincent Goulet *et al.*, which provides a number of distributions of interest to actuaries, has six vignettes, while Thomas Yee's VGAM package for Vector Generalized Linear and Additive Models, a source for many R probability distributions, has a web page as well as a vignette.
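The first few layers of that ecosystem can be reached from the R console. The sketch below uses only the standard utils functions help() and vignette(); whether actuar and VGAM are installed on your machine is an assumption, so the code checks first. Inside a script, help() returns an object that you print to display the page; at the console, ?pgamma does the same in one step.

```r
# 1. Command-line help: help() returns a reference to the matching help file
gamma_help <- utils::help("pgamma", package = "stats")         # same page as ?pgamma
dist_help  <- utils::help("Distributions", package = "stats")  # the list of 21
# print(gamma_help) would display the page

# 2. Vignettes shipped with add-on packages, if those packages are installed
for (pkg in c("actuar", "VGAM")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    vig <- utils::vignette(package = pkg)
    print(vig$results[, c("Item", "Title"), drop = FALSE])  # vignette names and titles
  } else {
    message(pkg, " is not installed; install.packages(\"", pkg, "\") fetches it from CRAN")
  }
}
```

Blog posts, mailing-list threads and StackOverflow answers sit outside R itself, so a search engine remains the usual route to those outer layers.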

John D. Cook’s clickable diagram of elementary probability distributions is hosted on his private website, while the paper by Delignette-Muller *et al.* on fitting distributions with R’s fitdistrplus package is hosted on an academic website. Mage's post from December 2011 on fitting distributions in R is an example of the many blog posts that deserve a second look.
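For a taste of what those resources cover, here is a minimal maximum-likelihood fit. Note the substitution: it uses fitdistr() from the MASS package, which ships with R as a recommended package, rather than fitdistrplus, whose fitdist() function does a similar job with richer diagnostics. The data are simulated from a gamma distribution with shape and scale both equal to 2 (rate 0.5), so the true values are known; the seed and sample size are arbitrary choices.

```r
library(MASS)   # recommended package distributed with R; provides fitdistr()

set.seed(123)
x <- rgamma(5000, shape = 2, scale = 2)   # simulated data: true rate = 1/scale = 0.5

# Maximum-likelihood fit; fitdistr() parameterizes the gamma by shape and rate
fit <- fitdistr(x, densfun = "gamma")
fit$estimate   # estimated shape and rate
fit$sd         # their standard errors

# Visual check: fitted density over the histogram of the data
hist(x, freq = FALSE, breaks = 50, main = "Gamma fit by maximum likelihood")
curve(dgamma(x, shape = fit$estimate[["shape"]], rate = fit$estimate[["rate"]]),
      add = TRUE, lwd = 2)
```

Inside curve(), the symbol x refers to the plotting grid that curve() supplies, not to the data vector of the same name.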

As a final example of how the community comes to play a part in the extended documentation for R, consider my attempt to get a handle on the Cauchy distribution. I ran the code below and got four very different-looking plots. This is not unexpected, given that I’m working with random draws from a probability distribution for which neither the mean nor the variance is defined. But why only two bins for the histograms?

Well, I wasn’t the first person to pause for a moment over this. Someone recently asked this question on StackOverflow and received some good advice.
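The sketch below shows where the two bins come from; it is an illustration, not a summary of the StackOverflow answer. By default hist() asks pretty() for about nclass.Sturges(y) = ceiling(log2(n) + 1) bins spread over the full range of the data. A handful of extreme Cauchy draws make that range enormous, pretty() places a bin edge at 0, and the two bins on either side of 0 swallow nearly every draw. The display window of -50 to 50 used in the remedy is an arbitrary, hypothetical choice, as is the seed.

```r
set.seed(42)
n <- 10000
location <- -1
scale <- 4
y <- rcauchy(n, location, scale)

# Default breaks: about ceiling(log2(n) + 1) classes spread over range(y)
range(y)                    # huge, because of a few extreme draws
nclass.Sturges(y)           # 15 for n = 10000
h <- hist(y, plot = FALSE)
diff(h$breaks)[1]           # width of each default bin

# Share of all draws landing in the two fullest bins: nearly everything
top_two_share <- sum(sort(h$counts, decreasing = TRUE)[1:2]) / n
top_two_share

# One possible remedy: histogram only a central window and rescale the
# true density so it integrates to 1 over that window, like the histogram.
lo <- -50; hi <- 50         # hypothetical display window
y_mid <- y[y > lo & y < hi]
p_window <- pcauchy(hi, location, scale) - pcauchy(lo, location, scale)
hist(y_mid, breaks = seq(lo, hi, by = 2), freq = FALSE,
     main = "rcauchy(-1, 4), central window only")
curve(dcauchy(x, location, scale) / p_window, add = TRUE, lwd = 2)
```

About 95% of draws fall inside this window, so the picture shows the bulk of the distribution while the rug or a separate summary handles the tails.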

Hats off and thank you to everyone involved in cultivating R’s garden of probability distributions.

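```r
# Cauchy plots
n <- 10000
location <- -1
scale <- 4
par(mfrow = c(2, 2))                  # make four plots in a 2 x 2 grid
for (i in 1:4) {
  y <- rcauchy(n, location, scale)    # a fresh random draw for each panel
  hist(y, freq = FALSE, col = rainbow(6),
       main = "random draw from rcauchy(-1,4)")
  # overlay the Cauchy density with the same location and scale
  fd <- function(y) dcauchy(y, location, scale)
  curve(fd, col = "black", add = TRUE, lwd = 2)
  rug(y, col = "grey")                # tick marks showing individual draws
}
```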
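Why do the four panels differ so much? One way to see the undefined mean at work, sketched here with arbitrary sample sizes and seed: the average of n independent Cauchy draws has exactly the same Cauchy distribution as a single draw, so sample means never settle down, while the sample median does converge to the location parameter. Normal draws with the same center are included for contrast; summarize_draws() is a hypothetical helper.

```r
set.seed(2024)
location <- -1
scale <- 4
sizes <- c(1e2, 1e3, 1e4, 1e5)

# Hypothetical helper: draw once at sample size n and summarize
summarize_draws <- function(n) {
  y <- rcauchy(n, location, scale)
  z <- rnorm(n, mean = location, sd = scale)   # normal draws for contrast
  c(n = n, cauchy_mean = mean(y), cauchy_median = median(y), normal_mean = mean(z))
}

results <- t(sapply(sizes, summarize_draws))   # one row per sample size
results   # cauchy_mean typically wanders; the other two columns approach -1
```

Rerunning with different seeds makes the point vividly: the Cauchy mean column jumps around no matter how large n gets.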
