(This article was first published on **Category: r | Emrah Er**, and kindly contributed to R-bloggers)

In “Fitting Distributions with R”, Vito Ricci writes:

“Fitting distributions consists in finding a mathematical function which represents in a good way a statistical variable. A statistician often is facing with this problem: he has some observations of a quantitative character $x_1, x_2, \ldots, x_n$ and he wishes to test if those observations, being a sample of an unknown population, belong from a population with a pdf (probability density function) $f(x, \theta)$, where $\theta$ is a vector of parameters to estimate with available data.

We can identify 4 steps in fitting distributions:

- Model/function choice: hypothesize families of distributions;
- Estimate parameters;
- Evaluate quality of fit;
- Goodness of fit statistical tests.”

In SAS this can be done by using `proc capability`, whereas in R we can do the same thing by using `fitdistrplus` and some other packages. In this post I will try to compare the procedures in R and SAS.

We start by creating 10,000 observations from a normal distribution with a mean of 10 and a standard deviation of 5, then print a summary of the data and plot a histogram of it.
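A minimal sketch of that step could look like the following. The variable name `x` matches the one used in the SAS and `fitdist` calls elsewhere in this post; the seed is my own assumption, added only so the run is reproducible.

```r
# Assumption: the seed is arbitrary and only makes the example reproducible.
set.seed(1)

# 10,000 draws from a normal distribution with mean 10 and sd 5
x <- rnorm(10000, mean = 10, sd = 5)

summary(x)  # minimum, quartiles, mean and maximum
hist(x)     # histogram of the simulated data
```

To move the data into SAS, one option is to write it out with `write.csv(data.frame(x = x), "x.csv", row.names = FALSE)` and import that file; the file name here is hypothetical.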

If we import the data we created in R into SAS and run the following code:

```
PROC CAPABILITY;
HISTOGRAM x / NORMAL;
RUN;
```

SAS gives us the following results:

- Moments
- Basic Statistical Measures (Location and Variability)
- Tests for Location
- Observed Quantiles
- Extreme Observations
- Histogram
- Parameter Estimates
- Goodness-of-Fit Test Results
- Estimated Quantiles

We can obtain the same results in R by using the `e1071`, `raster`, `plotrix`, `stats`, `fitdistrplus` and `nortest` packages.

**1. Moments**

**N**:

**Sum Weights**: A numeric variable can be specified as a weight variable to weight the values of the analysis variable. The default weight variable is defined to be 1 for each observation. This field is the sum of observation values for the weight variable. In our case, since we didn’t specify a weight variable, SAS uses the default weight variable. Therefore, the sum of weights is the same as the number of observations. (Source)

**Mean**:

**Sum Observations**:

**Std Deviation**:

**Skewness**:

**Kurtosis**:

**Uncorrected SS**: Sum of squared data values. (Source)

**Corrected SS**: The sum of squared distances of the data values from the mean. (Source)

**Coeff Variation**: The ratio of the standard deviation to the mean. (Source)

**Std Error Mean**: The estimated standard deviation of the sample mean. (Source)
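As a rough sketch, the quantities in the Moments table can be reproduced in base R as below. The data are re-simulated so the block runs on its own (the seed is my assumption). I report the coefficient of variation as a percentage, which is how I understand SAS prints it, and I use `type = 2` in `e1071`, which that package documents as the SAS/SPSS definition of skewness and kurtosis. The `e1071` calls are guarded in case the package is not installed.

```r
set.seed(1)                        # assumption: arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

n <- length(x)
m <- mean(x)
s <- sd(x)

moments <- c(
  N               = n,
  SumWeights      = n,             # default weight of 1 per observation
  Mean            = m,
  SumObservations = sum(x),
  StdDeviation    = s,
  UncorrectedSS   = sum(x^2),      # sum of squared data values
  CorrectedSS     = sum((x - m)^2),# squared distances from the mean
  CoeffVariation  = 100 * s / m,   # sd / mean, expressed in percent
  StdErrorMean    = s / sqrt(n)    # estimated sd of the sample mean
)
print(moments)

# Skewness and kurtosis with the SAS-style (type 2) definition
if (requireNamespace("e1071", quietly = TRUE)) {
  print(e1071::skewness(x, type = 2))
  print(e1071::kurtosis(x, type = 2))
}
```

`raster::cv(x)` and `plotrix::std.error(x)` should give the coefficient of variation and the standard error of the mean directly, if you prefer the packages over the formulas.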

**2. Basic Statistical Measures (Location and Variability)**

**Range**:

**Interquartile Range**:
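The two variability measures above need only base R. The sketch below re-simulates the data (seed assumed) and uses `type = 2` for the quartiles, which to my knowledge corresponds to the default SAS percentile definition; with 10,000 observations the choice of type barely changes the result.

```r
set.seed(1)                        # assumption: arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

rng <- diff(range(x))              # maximum minus minimum
iqr <- IQR(x, type = 2)            # third quartile minus first quartile

# Location measures shown in the same SAS table
loc <- c(mean = mean(x), median = median(x))

print(c(range = rng, IQR = iqr))
print(loc)
```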

**3. Tests for Location**

**Student’s t**:

*Skipped this part*

**Sign**:

*Skipped this part*

**Signed Rank**:
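For readers who want all three location tests anyway, here is a hedged base-R sketch. It assumes the null value is 0, which I believe is the SAS default, and it re-simulates the data with an arbitrary seed. The sign statistic `M` follows the usual definition, half the difference between the counts of values above and below the null value; R's `wilcox.test` reports the sum of positive ranks `V`, so its statistic is shifted relative to the centred one SAS prints, while the p-values should be comparable.

```r
set.seed(1)                        # assumption: arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)
mu0 <- 0                           # assumption: null value of 0

# Student's t test of H0: mean = mu0
t_res <- t.test(x, mu = mu0)

# Sign test: count observations above and below mu0
n_pos <- sum(x > mu0)
n_neg <- sum(x < mu0)
M <- (n_pos - n_neg) / 2
sign_res <- binom.test(n_pos, n_pos + n_neg, p = 0.5)

# Wilcoxon signed rank test of H0: centre of symmetry = mu0
sr_res <- wilcox.test(x, mu = mu0)

print(t_res$statistic)
print(c(M = M, p.value = sign_res$p.value))
print(sr_res$statistic)
```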

**4. Observed Quantiles**
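A sketch of how the observed quantiles might be computed with `stats::quantile` follows. The probability levels are the ones I recall SAS listing (0%, 1%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, 99%, 100%), and `type = 2` is my assumption for matching the default SAS percentile definition. The data are re-simulated with an arbitrary seed.

```r
set.seed(1)                        # assumption: arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

# Probability levels assumed to mirror the SAS quantile table
probs <- c(0, 0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99, 1)

obs_q <- quantile(x, probs = probs, type = 2)
print(obs_q)
```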


**5. Extreme Observations**: *Skipped this part*

**6. Histogram**
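One way to draw a histogram with a normal curve overlaid, roughly what `HISTOGRAM x / NORMAL` produces, is sketched below. The estimates are stored before calling `curve()`, because `curve()` reuses the name `x` for its plotting grid. Seed, title and line width are my own choices.

```r
set.seed(1)                        # assumption: arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

# Estimate the parameters first; curve() rebinds `x` internally
mu_hat    <- mean(x)
sigma_hat <- sd(x)

# Density-scaled histogram so the curve and the bars share a y-axis
h <- hist(x, freq = FALSE, main = "Histogram of x with fitted normal density")
curve(dnorm(x, mean = mu_hat, sd = sigma_hat), add = TRUE, lwd = 2)
```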

**7. Parameter Estimates**

**Mean (Mu)**:

**Std Dev (Sigma)**:
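The parameter estimates can be sketched as follows. For the normal distribution the maximum likelihood estimates have a closed form, so I compute them in base R and then, if `fitdistrplus` is installed, compare with `fitdist(x, "norm")`. Note that the MLE of sigma divides by n rather than n - 1, so it is very slightly smaller than `sd(x)`; as far as I know SAS reports the n - 1 version. The seed is assumed.

```r
set.seed(1)                        # assumption: arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)
n <- length(x)

# Closed-form maximum likelihood estimates for the normal distribution
mle <- c(mean = mean(x), sd = sqrt((n - 1) / n) * sd(x))
print(mle)

# The same fit through fitdistrplus, if the package is available
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  normal.fit <- fitdistrplus::fitdist(x, "norm")
  print(normal.fit$estimate)
}
```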

**8. Goodness-of-Fit Test Results**

*Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling*

or

**Kolmogorov-Smirnov**:

**Cramer-von Mises**:

**Anderson-Darling**:

**Chi-Square**:
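A hedged sketch of both routes follows: `gofstat()` from `fitdistrplus` for the first three statistics at once, or the individual normality tests from `nortest`. Which exact calls the original post used is not recoverable here, so treat the pairing of functions to table rows as my assumption. The package calls are guarded; the base-R `ks.test` line gives the Kolmogorov-Smirnov distance D, but its p-value is not valid when the parameters are estimated from the same data, which is why `lillie.test` is the better match.

```r
set.seed(1)                        # assumption: arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

# Kolmogorov-Smirnov distance against the fitted normal (base R).
# Only the statistic is used: the p-value ignores parameter estimation.
D <- unname(ks.test(x, "pnorm", mean(x), sd(x))$statistic)
print(D)

# All three EDF statistics in one call
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  normal.fit <- fitdistrplus::fitdist(x, "norm")
  print(fitdistrplus::gofstat(normal.fit))
}

# Individual normality tests
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::lillie.test(x))   # Kolmogorov-Smirnov (Lilliefors)
  print(nortest::cvm.test(x))      # Cramer-von Mises
  print(nortest::ad.test(x))       # Anderson-Darling
  print(nortest::pearson.test(x))  # Pearson chi-square
}
```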

**9. Estimated Quantiles**: *Skipped this part*

We can change the commands to fit other distributions. In SAS this is as simple as changing `normal` to something like `beta(theta = SOME NUMBER, scale = SOME NUMBER)` or `weibull`. In R, one may change the name of the distribution in the `normal.fit <- fitdist(x,"norm")` command to the desired distribution name. While fitting densities you should take the properties of specific distributions into account. For example, the Beta distribution is defined between 0 and 1, so you may need to rescale your data in order to fit it.
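As an illustration of the rescaling point, the sketch below maps the data into the open interval (0, 1) and then fits a Beta distribution with `fitdist`. The padding `eps` keeps the minimum and maximum strictly inside the interval; its value is an arbitrary choice of mine, as are the seed and the names `x01` and `beta.fit`. The fit is guarded in case `fitdistrplus` is not installed.

```r
set.seed(1)                        # assumption: arbitrary seed
x <- rnorm(10000, mean = 10, sd = 5)

# Min-max rescaling, then shrink slightly so no value is exactly 0 or 1
eps <- 1e-3                        # assumption: arbitrary padding
x01 <- (x - min(x)) / (max(x) - min(x))
x01 <- eps + (1 - 2 * eps) * x01

if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  beta.fit <- fitdistrplus::fitdist(x01, "beta")
  print(beta.fit$estimate)         # shape1 and shape2
}
```

The same pattern applies to other families: a Weibull fit, for example, requires strictly positive data, so negative draws would have to be shifted or dropped first.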
