Fitting Distributions to Data with R

October 31, 2012
(This article was first published on Category: r | Emrah Er, and kindly contributed to R-bloggers)

In “Fitting Distributions with R”, Vito Ricci writes:

“Fitting distributions consists in finding a mathematical function which represents in a good way a statistical variable. A statistician often is facing with this problem: he has some observations of a quantitative character and he wishes to test if those observations, being a sample of an unknown population, belong from a population with a pdf (probability density function) f(x, θ), where θ is a vector of parameters to estimate with available data.

We can identify 4 steps in fitting distributions:

  1. Model/function choice: hypothesize families of distributions;
  2. Estimate parameters;
  3. Evaluate quality of fit;
  4. Goodness of fit statistical tests.”

In SAS this can be done with PROC CAPABILITY, whereas in R we can do the same thing with fitdistrplus and a few other packages. In this post I will try to compare the procedures in R and SAS.

The following code chunk creates 10,000 observations from a normal distribution with a mean of 10 and a standard deviation of 5, then summarizes the data and plots a histogram of it.
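The original chunk is not reproduced here, so what follows is only a minimal sketch of what it might look like. The seed, the number of histogram bins and the CSV file name are my own assumptions, not values from the post.

```r
# Simulate the sample described above; the seed is an arbitrary choice
set.seed(1234)
x <- rnorm(10000, mean = 10, sd = 5)

# Minimum, quartiles, median, mean and maximum
print(summary(x))

# Histogram of the simulated values (50 bins is an arbitrary choice)
hist(x, breaks = 50, main = "Histogram of x", xlab = "x")

# Write the data out so it can be imported into SAS.
# The file name is hypothetical; a temporary directory keeps the sketch side-effect free.
csv_path <- file.path(tempdir(), "x.csv")
write.csv(data.frame(x = x), file = csv_path, row.names = FALSE)
```

Because the draws are random, the sample mean and standard deviation will be close to 10 and 5 but not exactly equal to them.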

If we import the data we created in R into SAS and run the following code:

PROC CAPABILITY;
HISTOGRAM x / NORMAL;
RUN;

SAS gives us the following results:

  1. Moments
  2. Basic Statistical Measures (Location and Variability)
  3. Tests for Location
  4. Observed Quantiles
  5. Extreme Observations
  6. Histogram
  7. Parameter Estimates
  8. Goodness-of-Fit Test Results
  9. Estimated Quantiles

We can obtain the same results in R by using the e1071, raster, plotrix, stats, fitdistrplus and nortest packages.

1. Moments

N :

Sum Weights : A numeric variable can be specified as a weight variable to weight the values of the analysis variable. The default weight is 1 for each observation. This field is the sum of the values of the weight variable. In our case, since we didn’t specify a weight variable, SAS uses the default weights, so the sum of weights is the same as the number of observations. (Source)

Mean :

Sum Observations :

Std Deviation :

Skewness :

Kurtosis :

Uncorrected SS : Sum of squared data values. (Source)

Corrected SS : The sum of squared distance of data values from the mean. (Source)

Coeff Variation : The ratio of the standard deviation to the mean. (Source)

Std Error Mean : The estimated standard deviation of the sample mean. (Source)
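The quantities in the Moments table can be sketched in base R as below. The skewness and kurtosis formulas are the bias-adjusted versions that SAS reports by default; as far as I know, `skewness(x, type = 2)` and `kurtosis(x, type = 2)` from e1071 give the same numbers. The seed and the names in the `moments` vector are my own choices.

```r
set.seed(1234)                       # arbitrary seed, assumed for reproducibility
x <- rnorm(10000, mean = 10, sd = 5)

n <- length(x)
m <- mean(x)
s <- sd(x)                           # uses the n - 1 denominator
z <- (x - m) / s                     # standardized values

moments <- c(
  N              = n,
  SumWeights     = n,                # every observation has weight 1
  Mean           = m,
  SumObs         = sum(x),
  StdDev         = s,
  Variance       = var(x),
  Skewness       = n / ((n - 1) * (n - 2)) * sum(z^3),
  Kurtosis       = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
                   3 * (n - 1)^2 / ((n - 2) * (n - 3)),
  UncorrectedSS  = sum(x^2),
  CorrectedSS    = sum((x - m)^2),
  CoeffVariation = 100 * s / m,      # SAS prints the ratio as a percentage
  StdErrorMean   = s / sqrt(n)
)
print(moments)
```

The kurtosis here is excess kurtosis, so a value near 0 is what we expect for normal data.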

2. Basic Statistical Measures (Location and Variability)

Range :

Interquartile Range :
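A minimal base R sketch of the location and variability measures follows. I am assuming that quantile `type = 2` is the closest match to the SAS default percentile definition, so treat small differences from the SAS output as possible. The mode is left out because it is not informative for continuous simulated data.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

location <- c(
  Mean   = mean(x),
  Median = median(x)
)

variability <- c(
  StdDev   = sd(x),
  Variance = var(x),
  Range    = diff(range(x)),         # max(x) - min(x)
  IQR      = IQR(x, type = 2)        # Q3 - Q1; type 2 is an assumption, see above
)

print(location)
print(variability)
```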

3. Tests for Location

Student’s t : Skipped this part

Sign : Skipped this part

Signed Rank :
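For completeness, here is a hedged sketch of all three location tests in base R, including the two I skipped above. It assumes the SAS default null value of 0 and the SAS conventions for the reported statistics (M for the sign test and S for the signed rank test); the variable names are my own.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)
mu0 <- 0                             # null value for the location; assumed SAS default

# Student's t test of H0: mean = mu0
t_res <- t.test(x, mu = mu0)

# Sign test: exact binomial test on the number of values above mu0
n_pos <- sum(x > mu0)
n_neg <- sum(x < mu0)
sign_M <- (n_pos - n_neg) / 2        # the M statistic as SAS defines it
sign_res <- binom.test(n_pos, n_pos + n_neg, p = 0.5)

# Wilcoxon signed rank test; R reports V, the sum of the positive ranks
w_res <- wilcox.test(x, mu = mu0)
n_t <- n_pos + n_neg
signed_rank_S <- unname(w_res$statistic) - n_t * (n_t + 1) / 4  # centered as in SAS

print(c(t = unname(t_res$statistic), M = sign_M, S = signed_rank_S))
print(c(t = t_res$p.value, sign = sign_res$p.value, signed_rank = w_res$p.value))
```

With a true mean of 10, all three tests reject a location of 0 decisively, so the p-values are only useful here as a check against the SAS output.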

4. Observed Quantiles

Quantiles :
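The observed quantiles can be sketched with `quantile()`. The probability levels below mirror the ones SAS prints, and `type = 2` is my assumption for the definition that matches the SAS default, so check it against your own SAS output.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

# Levels listed from the maximum down to the minimum, as in the SAS table
probs <- c(1, 0.99, 0.95, 0.90, 0.75, 0.50, 0.25, 0.10, 0.05, 0.01, 0)

# type = 2: empirical distribution function with averaging at discontinuities
observed_q <- quantile(x, probs = probs, type = 2)
print(observed_q)
```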

5. Extreme Observations : Skipped this part
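Although I skipped this part, a rough base R equivalent is short. The sketch below lists the five lowest and five highest values together with their observation numbers; five is the count I assume SAS shows by default.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

ord <- order(x)                      # observation numbers sorted by value

lowest  <- data.frame(value = x[head(ord, 5)], obs = head(ord, 5))
highest <- data.frame(value = x[tail(ord, 5)], obs = tail(ord, 5))

print(lowest)
print(highest)
```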

6. Histogram
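A minimal sketch of the histogram with the fitted normal density drawn on top, similar in spirit to what the NORMAL option produces in SAS. The number of bins and the colours are arbitrary choices of mine.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

mu    <- mean(x)
sigma <- sd(x)

# freq = FALSE puts the histogram on the density scale so the curve is comparable
h <- hist(x, breaks = 50, freq = FALSE, col = "grey90",
          main = "Histogram of x with fitted normal density", xlab = "x")

# Overlay the normal density with the estimated parameters
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)
```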

7. Parameter Estimates

Mean (Mu) :

Std Dev (Sigma) :
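The two parameter estimates can be sketched in base R as follows. One detail to keep in mind: `fitdist(x, "norm")` from fitdistrplus uses maximum likelihood, which divides by n rather than n - 1, so its sigma is slightly smaller than `sd(x)`. Which of the two SAS reports is something I would verify against the SAS output rather than assume.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)
n <- length(x)

mu_hat       <- mean(x)
sigma_sample <- sd(x)                          # n - 1 denominator
sigma_mle    <- sqrt(mean((x - mu_hat)^2))     # n denominator (maximum likelihood)

print(c(mu = mu_hat, sigma_sample = sigma_sample, sigma_mle = sigma_mle))

# With fitdistrplus installed, the maximum likelihood fit would be:
# library(fitdistrplus)
# normal.fit <- fitdist(x, "norm")
# summary(normal.fit)
```

With 10,000 observations the two sigma estimates agree to about four significant digits, so the distinction only matters for small samples.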

8. Goodness-of-Fit Test Results

Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling

or

Kolmogorov-Smirnov :

Cramer-von Mises :

Anderson-Darling :

Chi-Square :
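The test statistics can also be computed by hand in base R, which makes it easier to see what each one measures. This is a sketch under my own assumptions: the parameters are the sample mean and standard deviation, and the chi-square test uses 20 equiprobable bins, whereas SAS uses the histogram bins, so that statistic will not match SAS exactly. For p-values that account for the estimated parameters, `lillie.test()`, `cvm.test()` and `ad.test()` from nortest are the usual route.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)
n <- length(x)
mu <- mean(x)
sigma <- sd(x)

# Fitted normal CDF evaluated at the ordered observations
p <- pnorm(sort(x), mean = mu, sd = sigma)
i <- seq_len(n)

# Kolmogorov-Smirnov D: largest gap between empirical and fitted CDF
ks_D <- max(pmax(i / n - p, p - (i - 1) / n))

# Cramer-von Mises W-squared
cvm_W2 <- 1 / (12 * n) + sum((p - (2 * i - 1) / (2 * n))^2)

# Anderson-Darling A-squared
ad_A2 <- -n - mean((2 * i - 1) * (log(p) + log(1 - rev(p))))

# Chi-square with k equiprobable bins under the fitted normal (k is arbitrary)
k <- 20
breaks   <- qnorm(seq(0, 1, length.out = k + 1), mean = mu, sd = sigma)
observed <- as.vector(table(cut(x, breaks)))
expected <- rep(n / k, k)
chisq    <- sum((observed - expected)^2 / expected)
chisq_df <- k - 1 - 2                # two parameters were estimated
chisq_p  <- pchisq(chisq, df = chisq_df, lower.tail = FALSE)

print(c(D = ks_D, W2 = cvm_W2, A2 = ad_A2, ChiSq = chisq, ChiSq_p = chisq_p))
```

`ks.test(x, "pnorm", mu, sigma)` returns the same D, but its p-value assumes the parameters were known in advance, so it is too optimistic here.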

9. Estimated Quantiles : Skipped this part
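I skipped this table, but a rough version is easy to sketch: it puts the observed quantiles next to the quantiles implied by the fitted normal distribution. The probability levels and the `type = 2` quantile definition are assumptions on my part.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)
mu <- mean(x)
sigma <- sd(x)

probs <- c(0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99)

quantile_table <- data.frame(
  Percent   = 100 * probs,
  Observed  = unname(quantile(x, probs = probs, type = 2)),
  Estimated = qnorm(probs, mean = mu, sd = sigma)   # quantiles of the fitted normal
)
print(quantile_table)
```

If the normal model fits well, the two columns should be close to each other at every level.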

We can change the commands to fit other distributions. In SAS this is as simple as changing normal to something like beta(theta = SOME NUMBER, scale = SOME NUMBER) or weibull. In R, change the distribution name in the normal.fit <- fitdist(x,"norm") command to the desired distribution. While fitting densities you should take the properties of the specific distribution into account. For example, the Beta distribution is defined only between 0 and 1, so you may need to rescale your data before fitting it.
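As an illustration of the rescaling step, here is a minimal sketch that maps the data into the open interval (0, 1) and then estimates the Beta parameters by the method of moments. The 1% padding is an arbitrary choice of mine that keeps the smallest and largest values away from exactly 0 and 1. The method of moments is swapped in for maximum likelihood only to keep the example in base R.

```r
set.seed(1234)                       # arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

# Rescale to (0, 1), padding the range so no value lands on a boundary
pad <- 0.01 * diff(range(x))
y <- (x - (min(x) - pad)) / (diff(range(x)) + 2 * pad)

# Method-of-moments estimates for Beta(shape1, shape2)
m <- mean(y)
v <- var(y)
common <- m * (1 - m) / v - 1
shape1 <- m * common
shape2 <- (1 - m) * common
print(c(shape1 = shape1, shape2 = shape2))

# With fitdistrplus installed, these values can seed a maximum likelihood fit:
# library(fitdistrplus)
# beta.fit <- fitdist(y, "beta", start = list(shape1 = shape1, shape2 = shape2))
```

Keep in mind that the fitted parameters describe the rescaled variable y, not the original x.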
