# Example 9.9: Simplifying R using the mosaic package (part 1)

[This article was first published on **SAS and R**, and kindly contributed to R-bloggers.]

While both SAS and R are powerful systems for statistical analysis, they can be frustrating to new users or those learning statistics for the first time.

**R**

The mosaic package is designed to help simplify the interface for such new users, while allowing them to undertake sophisticated analyses.

As an example of how the package simplifies life for the novice user, consider calculating summary statistics and displaying a density plot for the CESD (a measure of depressive symptoms) scores by substance abuse group in the HELP dataset. Without the package, doing this in R would require either mastering a package such as plyr to repeat the calculation for each substance, or typing out syntax that selects the rows corresponding to each substance in turn.
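The row-selection route mentioned above looks roughly like the sketch below. It uses a small made-up data frame (`toy`, our own hypothetical name, not the HELP data) so that it runs without a download. For comparison it also shows base R's `tapply()`, which loops over the groups for you but uses a vector/grouping/function calling convention rather than a formula.

```r
# Toy data, made up for illustration only (NOT the HELP dataset)
toy <- data.frame(
  cesd      = c(30, 40, 20, 25, 35, 45),
  substance = c("alcohol", "alcohol", "cocaine", "cocaine", "heroin", "heroin")
)

# Approach 1: select the rows for each substance by hand, one line per group
m_alcohol <- mean(toy$cesd[toy$substance == "alcohol"])
m_cocaine <- mean(toy$cesd[toy$substance == "cocaine"])
m_heroin  <- mean(toy$cesd[toy$substance == "heroin"])
print(c(alcohol = m_alcohol, cocaine = m_cocaine, heroin = m_heroin))

# Approach 2: base R's tapply() handles all groups in one call, but its
# (vector, grouping, function) argument order differs from a formula
m_all <- tapply(toy$cesd, toy$substance, mean)
print(m_all)
```

Each new group in approach 1 means another hand-typed line, which is the bookkeeping the formula interface is meant to hide.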

```r
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")  # HELP study data
library(mosaic)
options(digits=3)  # print results to 3 significant digits
```

After loading the data and the package, and setting the number of digits to a more reasonable default, we can call the `mean()` function to calculate this statistic (denoted by S in the result) for each of the three substance abuse groups: alcohol, cocaine, and heroin.

```
> mean(cesd ~ substance, data=ds)
  substance    S   N Missing
1   alcohol 34.4 177       0
2   cocaine 29.4 152       0
3    heroin 34.9 124       0
```

Output in the same format is produced when we calculate the standard deviation for each group:

```
> sd(cesd ~ substance, data=ds)
  substance    S   N Missing
1   alcohol 12.1 177       0
2   cocaine 13.4 152       0
3    heroin 11.2 124       0
```
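To make concrete what these formula calls return, here is a rough base-R analogue of the S / N / Missing layout. `group_stat()` is a hypothetical helper of our own, not part of mosaic. It assumes, as the output suggests but without our having checked mosaic's internals, that S is computed on the non-missing values and that N counts only those values.

```r
# Hypothetical helper (our own name, not part of mosaic): apply one summary
# function per group and report it as S, with N and Missing alongside
group_stat <- function(x, g, FUN) {
  groups <- split(x, g)
  data.frame(
    group   = names(groups),
    S       = vapply(groups, function(v) FUN(v, na.rm = TRUE), numeric(1)),
    N       = vapply(groups, function(v) sum(!is.na(v)), integer(1)),
    Missing = vapply(groups, function(v) sum(is.na(v)), integer(1)),
    row.names = NULL, stringsAsFactors = FALSE
  )
}

# Toy data, made up for illustration only; one cocaine value is missing
toy <- data.frame(
  cesd      = c(30, 40, NA, 20, 25, 35, 45),
  substance = c("alcohol", "alcohol", "cocaine", "cocaine", "cocaine",
                "heroin", "heroin")
)

r_mean <- group_stat(toy$cesd, toy$substance, mean)
r_sd   <- group_stat(toy$cesd, toy$substance, sd)
print(r_mean)
print(r_sd)
```

Swapping `mean` for `sd` is the only change between the two calls, mirroring how the mosaic versions differ only in the function name.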

Passing mosaic's `favstats` function to `summary()` calculates a raft of summary statistics for each group and formats them nicely.

```
> summary(cesd ~ substance, data=ds, fun=favstats)
cesd N=453

+---------+-------+---+----+---+-------+---+----+-----+----+---+--------+
|         |       |N  |min |Q1 |median |Q3 |max |mean |sd  |n  |missing |
+---------+-------+---+----+---+-------+---+----+-----+----+---+--------+
|substance|alcohol|177|4   |26 |36     |42 |58  |34.4 |12.1|177|0       |
|         |cocaine|152|1   |19 |30     |39 |60  |29.4 |13.4|152|0       |
|         |heroin |124|4   |28 |35     |43 |56  |34.9 |11.2|124|0       |
+---------+-------+---+----+---+-------+---+----+-----+----+---+--------+
|Overall  |       |453|1   |25 |34     |41 |60  |32.8 |12.5|453|0       |
+---------+-------+---+----+---+-------+---+----+-----+----+---+--------+
```
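As a rough illustration of what goes into each row, the sketch below computes the same nine columns in base R for each group and appends an Overall row. `fav_base()` is a hypothetical name of our own, not mosaic's `favstats()`. It assumes quartiles from `quantile()`'s default method (type 7); we have not verified that this matches the definition favstats uses, so Q1 and Q3 could differ slightly on real data.

```r
# Hypothetical helper (our own name, not mosaic's favstats): the nine
# columns from the table above for a single numeric vector
fav_base <- function(x) {
  # Default quantile() method (type 7); favstats may define quartiles differently
  q <- quantile(x, c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  c(min = q[1], Q1 = q[2], median = q[3], Q3 = q[4], max = q[5],
    mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
    n = sum(!is.na(x)), missing = sum(is.na(x)))
}

# Toy data, made up for illustration only (NOT the HELP dataset)
toy <- data.frame(
  cesd      = c(30, 40, 20, 25, 35, 45),
  substance = c("alcohol", "alcohol", "cocaine", "cocaine", "heroin", "heroin")
)

# One row per group, then an "Overall" row computed from all observations
by_group <- t(sapply(split(toy$cesd, toy$substance), fav_base))
res <- rbind(by_group, Overall = fav_base(toy$cesd))
print(round(res, 3))
```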

These mosaic commands allow a quick review of the data to check, for example, whether an assumption of equal variance is justified, or whether coding errors or missing values have crept in.
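One quick, informal way to act on the equal-variance check is to compare the largest and smallest group standard deviations. The sketch below uses the values printed in the favstats table; the threshold of 2 is a commonly quoted rule of thumb that we assume here, not a formal test.

```r
# Group standard deviations of cesd, copied from the favstats output above
sds <- c(alcohol = 12.1, cocaine = 13.4, heroin = 11.2)

# Ratio of largest to smallest SD (cocaine / heroin here)
ratio <- max(sds) / min(sds)
print(round(ratio, 2))

# Rule-of-thumb check (assumed threshold): ratios below about 2 are often
# taken as consistent with roughly equal variances
print(ratio < 2)
```

With the printed values the ratio is 13.4 / 11.2, or about 1.2, so by this rough yardstick the group spreads look similar.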

A graphical depiction using a set of density plots (shown above) can be created with the command:

```r
# One density curve per substance group, overlaid, with an automatic legend
densityplot(~ cesd, groups=substance, data=ds, auto.key=TRUE)
```
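Under the hood, a plot like this draws one kernel density estimate per group; we believe lattice's `densityplot()` relies on `stats::density()` for this, though the sketch below is not its actual code. It fits `density()` (default Gaussian kernel, `bw.nrd0` bandwidth) to each group of a made-up toy dataset and overlays the curves with base graphics.

```r
# Toy data, made up for illustration only (NOT the HELP dataset)
toy <- data.frame(
  cesd      = c(30, 40, 36, 44, 20, 25, 18, 31, 35, 45, 28, 39),
  substance = rep(c("alcohol", "cocaine", "heroin"), each = 4)
)

# One kernel density estimate per group (Gaussian kernel, bw.nrd0 bandwidth)
dens <- lapply(split(toy$cesd, toy$substance), density)

# Common axis limits so every curve fits on one plot
xr <- range(unlist(lapply(dens, function(d) d$x)))
yr <- c(0, max(unlist(lapply(dens, function(d) d$y))))

# Overlay the curves, one colour per group, with a legend
plot(NA, xlim = xr, ylim = yr, xlab = "cesd", ylab = "Density")
for (i in seq_along(dens)) lines(dens[[i]], col = i)
legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
```

Doing the overlay by hand means computing shared axis limits and a legend yourself, which `auto.key=TRUE` takes care of in the one-line version.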

**SAS**

We’re unaware of any similar program that attempts to simplify SAS syntax for educational use. To replicate the above results, we would use the `means` and `sgpanel` procedures.

```sas
data ds;
  set "C:\book\help.sas7bdat";
run;

options ls=80;

/* per-group statistics for CESD, printed with a field width of 4 */
proc means data=ds fw=4 min q1 median q3 max mean std nmiss n;
  class substance;
  var cesd;
run;
```

```
                    Analysis Variable : CESD

              N          Lower           Upper               Std
SUBSTANCE   Obs   Min Quartile Median Quartile   Max  Mean   Dev
----------------------------------------------------------------
alcohol     177  4.00     26.0   36.0     42.0  58.0  34.4  12.1
cocaine     152  1.00     19.0   30.0     39.0  60.0  29.4  13.4
heroin      124  4.00     28.0   35.0     43.0  56.0  34.9  11.2
----------------------------------------------------------------

              N     N
SUBSTANCE   Obs  Miss     N
---------------------------
alcohol     177     0   177
cocaine     152     0   152
heroin      124     0   124
---------------------------
```

After reading the data in, the `means` procedure can produce any of the desired statistics (plus many others) directly. To mimic the `mosaic` package's printing of a single statistic, list only that statistic in the `proc means` statement. Note that the overall row of the R table is not included. To replicate that row, you would re-run the above code, omitting the `class` statement.

To the best of our knowledge, there is still no easy way to plot multiple densities in a single SAS plot. In example 2.6.4 we show how it can be done by using `proc kde` to save the density estimates and then plotting them separately. (Code for this is included at the book web site.) But in the interest of simple code, we show a simpler method using `proc sgpanel`. The result, shown below, is less useful than the R plot from the `mosaic` package, but still gets the point across.

```sas
/* one kernel density panel per substance, stacked in a single column */
proc sgpanel data=ds;
  panelby substance / columns=1;
  density cesd / type=kernel;
run;
```
