Example 7.8: Plot two empirical cumulative distribution functions using available tools

August 1, 2009

(This article was first published on SAS and R, and kindly contributed to R-bloggers)

The empirical cumulative distribution function (CDF) (section 5.1.16) is a useful way to compare distributions between populations. The Kolmogorov-Smirnov (section 2.4.2) statistic D is the maximum vertical distance between the two curves, that is, the largest absolute difference between the two empirical CDFs over all values of x. As an example, we compare the male and female distributions of pcs from the HELP data set described in the book. Here, we use built-in tools to plot the graph; in later entries we will build it from scratch for greater control.
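To make the definition of D concrete, here is a minimal R sketch on simulated data (not the HELP data; the group means, standard deviations, and sample sizes are invented for illustration). It evaluates both empirical CDFs at the pooled observations, takes the largest vertical gap, and compares the result with the statistic reported by ks.test().

```r
# Simulated scores for two hypothetical groups (NOT the HELP data)
set.seed(42)
women <- rnorm(100, mean = 45, sd = 10)
men   <- rnorm(300, mean = 49, sd = 10)

# ecdf() returns a step function that can be evaluated at any x
Fw <- ecdf(women)
Fm <- ecdf(men)

# Both curves only jump at observed values, so the largest gap
# is attained at one of the pooled observations
x   <- sort(c(women, men))
gap <- abs(Fw(x) - Fm(x))

D      <- max(gap)          # Kolmogorov-Smirnov statistic
x_at_D <- x[which.max(gap)] # value of x where the curves are furthest apart

# The two-sample ks.test() should report the same D
ks <- ks.test(women, men)
all.equal(D, unname(ks$statistic))
```

Note the distinction between D, which is a difference in cumulative proportions and so lies between 0 and 1, and x_at_D, which is the point on the pcs scale where that difference occurs.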

We begin by reading in the data (section 1.1.14) as a comma-separated file from the book web site (section 1.1.6).


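/* Point a fileref at the CSV file on the book web site */
filename myurl
  url 'http://www.math.smith.edu/sasr/datasets/help.csv';

/* dbms=dlm needs an explicit delimiter for comma-separated data */
proc import
  datafile=myurl out=ds dbms=dlm;
  delimiter=',';
  getnames=yes;
run;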

SAS proc univariate can produce this plot automatically (section 5.1.15). It compares groups within the data set through the class statement (section 3.1.3), and the overlay option draws the curves for both groups on a single set of axes.

proc univariate data=ds;
  var pcs;
  class female;
  cdfplot pcs / overlay;
run;

In R, the plot() function accepts ecdf() objects (section 5.1.15) as input. Applying this to pcs, restricted to the rows where female is 1 (section B.4.2), creates the first empirical CDF as well as the axes. The lines() function (section 5.2.1) also accepts ecdf() objects as input, and applying this to pcs where female is 0 adds the second empirical CDF to the existing plot. A legend (section 5.2.14) is added to show which curve is which. (Note that the Blogger software prevents displaying this image large enough to see the difference here, but it will be visible when run locally.)


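# Same URL as in the SAS code above
ds <- read.csv("http://www.math.smith.edu/sasr/datasets/help.csv")
attach(ds)
# pch=46 draws each observation as a small dot
plot(ecdf(pcs[female==1]), verticals=TRUE, pch=46)
# Thicker line for the men so that the curves match the legend
lines(ecdf(pcs[female==0]), verticals=TRUE, pch=46, lwd=3)
legend(20, 0.8, legend=c("Women", "Men"), lwd=c(1, 3))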
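
Since the book web site may not be reachable from every machine, the following sketch draws the same kind of overlay from simulated data and without attach(). The data frame sim, its simulated values, the plot title, and the output file name are all hypothetical stand-ins, not part of the HELP data or the book's code.

```r
# Hypothetical stand-in for the HELP data: 'pcs' scores and a 0/1 'female' flag
set.seed(1)
sim <- data.frame(
  female = rep(c(1, 0), times = c(100, 300)),
  pcs    = c(rnorm(100, mean = 45, sd = 10), rnorm(300, mean = 49, sd = 10))
)

# Build the two empirical CDFs by indexing the data frame directly
Fw <- ecdf(sim$pcs[sim$female == 1])
Fm <- ecdf(sim$pcs[sim$female == 0])

# Draw to a PDF file so the sketch also runs non-interactively
out <- file.path(tempdir(), "ecdf_overlay.pdf")
pdf(out)
plot(Fw, verticals = TRUE, do.points = FALSE,
     main = "Empirical CDF of pcs (simulated)", xlab = "pcs")
lines(Fm, verticals = TRUE, do.points = FALSE, lty = 2)
legend("bottomright", legend = c("Women", "Men"), lty = c(1, 2))
dev.off()

# An ecdf object is also a function: proportion of each group at or below 50
c(women = Fw(50), men = Fm(50))
```

Distinguishing the curves by line type rather than line width is only one option; col= would work equally well, as long as the legend uses the same settings as the plot() and lines() calls.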

