Blog Archives

Example 7.20: Simulate categorical data

January 4, 2010

Both SAS and R provide built-in ways to simulate categorical data (see section 1.10.4). Alternatively, it is easy to write code that does this directly. In this entry, we show how to do it once. In a future entry, we'll demonstrate writing a SAS macro (section A.8.1) and a function in R (section B.5.2) to do it...
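To give a flavor of what doing this directly can look like, here is a minimal sketch in Python (not the SAS or R code from the entry itself). It assumes the usual approach of comparing a uniform random number against cumulative probabilities; the function name `simulate_categorical` and its arguments are our own illustrative choices.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def simulate_categorical(probs, n, seed=None):
    """Draw n labels from 1..k, where label j has probability probs[j-1]."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)
    cutoffs = list(accumulate(probs))  # cumulative probabilities
    draws = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        # Count how many cumulative cutoffs are <= u; that count is the
        # zero-based index of the category that u falls into.
        idx = bisect_right(cutoffs, u)
        # Guard against floating-point rounding in the last cutoff.
        draws.append(min(idx, len(probs) - 1) + 1)
    return draws


sample = simulate_categorical([0.1, 0.2, 0.3, 0.4], n=10000, seed=1)
```

With 10,000 draws, the observed share of each category should sit close to the probability requested for it.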

Example 7.19: find the closest pair of observations

December 28, 2009

Suppose we need to find the closest pair of observations on some variable x. For example, we might be concerned that some data had been accidentally duplicated. We return the IDs of the two closest observations and the distance between them. In both languages, we'll first create the data, then sort it, recognizing that the...
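The sort-first idea can be sketched in a few lines of Python (again, not the SAS or R code from the entry). The sketch relies on the fact that, in one dimension, the two closest values are always neighbors once the data are sorted, so only adjacent differences need checking. The function name `closest_pair` and the toy data are hypothetical.

```python
def closest_pair(ids, x):
    """Return (id_a, id_b, distance) for the two observations closest in x."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    # Sort observations by x, carrying the IDs along.
    ordered = sorted(zip(x, ids), key=lambda pair: pair[0])
    # After sorting, the closest pair must be adjacent, so scan the gaps.
    best = min(
        range(len(ordered) - 1),
        key=lambda i: ordered[i + 1][0] - ordered[i][0],
    )
    (x_a, id_a), (x_b, id_b) = ordered[best], ordered[best + 1]
    return id_a, id_b, x_b - x_a


result = closest_pair(ids=[1, 2, 3, 4], x=[10.0, 3.0, 10.5, 7.0])
```

A distance of exactly zero in the result would flag a possible duplicate.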

SAS and R included on R bloggers

December 18, 2009

The R bloggers site is an aggregator for blogs about R. We're excited to be joining that community and suggest any readers of this blog may also find it useful.

Example 7.18: Displaying missing value categories in a table

December 14, 2009

When displaying contingency tables (section 2.3.1), there are times when it is useful to either show or hide the missing data category. Both SAS and the typical R command default to displaying the table only for observations where both factors are observed. In this example, we generate some multinomial data (section 1.10.4) and then produce tables with and without...
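The difference between the two kinds of table can be illustrated with a small Python sketch (not the SAS or R code from the entry). It assumes missing values are coded as `None`; the function name `crosstab` and its `include_missing` flag are hypothetical.

```python
from collections import Counter


def crosstab(a, b, include_missing=False):
    """Count (a, b) combinations; None marks a missing value."""
    pairs = zip(a, b)
    if not include_missing:
        # Default behavior: keep only rows where both factors are observed.
        pairs = ((u, v) for u, v in pairs if u is not None and v is not None)
    return Counter(pairs)


a = [1, 1, None, 2]
b = ["x", None, "y", "x"]
complete_only = crosstab(a, b)                       # 2 observations counted
with_missing = crosstab(a, b, include_missing=True)  # all 4 observations counted
```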

Example 7.15: A more complex sales graphic

October 13, 2009

The plot of Amazon sales rank over time generated in example 7.14 leaves some questions open. From a software perspective, we'd like to make the plot prettier; we can also embellish it to inform our interpretation of how the rank is calculated. For the latter purpose, we'll create an indicator of whether the rank was recorded in nighttime (eastern US...

Example 7.14: A simple graphic of sales

September 29, 2009

In this example, we show a simple plot of the sales rank data read in as shown in example 7.13.

SAS

In SAS, we use the symbol statement (section 5.3) to request small (with the h option) dots (with the v option) that are not connected (with the i option). (See sections 5.2.2 and 5.3.9 for more details.) We...

Example 7.11: Plot an empirical cumulative distribution function from scratch

August 31, 2009

In example 7.8, we used built-in functions to produce an empirical CDF plot. But the empirical cumulative distribution function (CDF) is simple to calculate directly, and it might be useful to have more control over its appearance than is afforded by...
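As a rough illustration of how little is involved in the direct calculation, here is a minimal Python sketch (not the SAS or R code from the entry). It assumes the standard definition, in which the empirical CDF at a point is the fraction of observations less than or equal to that point; the function name `ecdf` is our own.

```python
def ecdf(values):
    """Return sorted x values and the empirical CDF height at each one."""
    if not values:
        raise ValueError("need at least one observation")
    xs = sorted(values)
    n = len(xs)
    # The i-th smallest value (1-based) has i of n observations at or below it.
    heights = [(i + 1) / n for i in range(n)]
    return xs, heights


xs, heights = ecdf([3, 1, 2, 4])
# xs is [1, 2, 3, 4] and heights is [0.25, 0.5, 0.75, 1.0]
```

Passing `xs` and `heights` to any step-plot routine then gives full control over line style, axes, and labels.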

Example 7.10: Get data from R into SAS

August 13, 2009

In our previous entry, we described how to generate a dataset from SAS that could be used for analyses in R. Alternatively, someone primarily using R might want to test the new "statistical graphics" procedures available starting with SAS 9.2. Her...

Example 7.9: Get data from SAS into R

August 8, 2009

Some people use both SAS and R in their daily work. They might be more familiar with SAS as a tool for manipulating data and find R preferable for plotting. While our goal in the book is to enable people to avoid having to switch back and forth, ...

Example 7.8: Plot two empirical cumulative density functions using available tools

August 1, 2009

The empirical cumulative distribution function (CDF) (section 5.1.16) is a useful way to compare distributions between populations. The Kolmogorov-Smirnov statistic D (section 2.4.2) is the maximum vertical distance between the two curves. As an...
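For readers who want to see the two-sample statistic computed by hand, here is a minimal Python sketch (not the SAS or R code from the entry). It takes D to be the largest vertical gap between the two empirical CDFs, which only needs checking at the observed data points because both curves are step functions. The function name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both curves only jump at observed values, so checking those suffices.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # share of a at or below x
        fb = bisect_right(sb, x) / len(sb)  # share of b at or below x
        d = max(d, abs(fa - fb))
    return d


d = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
# d is 0.5 for these two toy samples
```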
