Example 8.32: The HistData package, sunflower plots, and getting data from R into SAS

March 29, 2011

(This article was first published on SAS and R, and kindly contributed to R-bloggers)

This entry is mainly a promotion of the fascinating HistData R package. The package, compiled by the psychologist, statistician, and graphics innovator Michael Friendly, contains a number of small data sets of historical interest. These include data from John Snow’s map of cholera in London, Minard’s map of Napoleon’s Russian campaign of 1812, Galton’s data on heights of parents and children, and many others.

If you have any interest in Minard’s map, Friendly also hosts a site about the map, Minard, and a gallery with some re-imaginings of the map data, at http://datavis.ca/gallery/re-minard.php. The gallery includes R and SAS versions, as well as one which uses Google Maps.

Once you install the package and load it with library() (section B.6.1), you can access the data with the data() function. For example, we show Galton’s data, which led to the description of regression to the mean.

> data(Galton)
> head(Galton)
parent child
1 70.5 61.7
2 68.5 61.7
3 65.5 61.7
4 64.5 61.7
5 64.0 61.7
6 67.5 62.2

The package also includes example() methods for many of the data sets: example(Galton) results in the sunflower plot shown above. The sunflower plot (section 5.1.14) is an alternative to jittering when many observations share the same values: each distinct point is drawn once, with one petal per observation at that point. If the underlying data are more nearly continuous, you might see the sunflower plot as a form of two-dimensional histogram. You can get a list of the available data sets with ?'HistData-package'.
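The counting that underlies a sunflower plot can be sketched in base R. This is a minimal illustration, not the code that example(Galton) runs: the x and y values are made up for the purpose, and the names petal_counts and counts are ours. It uses xyTable() to tabulate coincident points and sunflowerplot() to draw them.

```r
# Tabulate how many observations fall on each distinct (x, y) point.
# xyTable() is in grDevices; digits controls the rounding used to
# decide whether two points coincide.
petal_counts <- function(x, y) {
  as.data.frame(xyTable(x, y, digits = 6))
}

# Made-up heights with repeated values (not taken from Galton's data).
x <- c(64, 64, 64, 66, 66, 68)
y <- c(62, 62, 62, 63, 63, 65)

counts <- petal_counts(x, y)
print(counts)

# Draw only in an interactive session, so the script also runs in batch mode.
# Points with more than one observation are drawn with petals.
if (interactive()) {
  sunflowerplot(x, y, xlab = "parent", ylab = "child")
}
```

With the HistData package loaded, sunflowerplot(Galton$parent, Galton$child) should give a plot along the same lines as the example, without the added ellipses and regression line.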

We’re not aware of a companion set of SAS data sets. An easy way to access the data sets in SAS is to load the package into R and export the data into SAS using the foreign package (section 1.2.2).

> library(foreign)
> write.foreign(Galton,"galton.dat","galton.sas",package="SAS")
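To see what write.foreign() produces without touching the working directory, the export can be tried on a small scale. The sketch below assumes only the foreign package; it types in the six rows printed by head(Galton) earlier and writes to R's temporary directory. The names galton6, data_file, and code_file are ours.

```r
library(foreign)

# First six rows of Galton, typed in from the head() output shown earlier.
galton6 <- data.frame(
  parent = c(70.5, 68.5, 65.5, 64.5, 64.0, 67.5),
  child  = c(61.7, 61.7, 61.7, 61.7, 61.7, 62.2)
)

# Hypothetical output locations in R's temporary directory.
data_file <- file.path(tempdir(), "galton6.dat")
code_file <- file.path(tempdir(), "galton6.sas")

# Write the values to data_file and the SAS code that reads them to code_file.
write.foreign(galton6, data_file, code_file, package = "SAS")

# Show the generated SAS program.
writeLines(readLines(code_file))
```

The generated program refers to the data file by the path given in the call, so if the files are moved before being run in SAS, the path inside the .sas file needs to be edited to match.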

Running the galton.sas file written by the write.foreign function makes a SAS data set called rdata with variables parent and child. We can make a sunflower plot in SAS using a macro written, coincidentally, by Michael Friendly, which he hosts here. Making a plot requires running the “sunfont.sas” file and the “sunplot.sas” file. I had to modify the “sunfont.sas” file slightly, and I give the edited file here:

libname gfont0 'c:\temp';

data sunsymb;
  do n=1 to 26;
    x = .2; y = .2; output; /* Draw small box at center */
    x =-.2; y = .2; output; /* of each symbol */
    x =-.2; y =-.2; output;
    x = .2; y =-.2; output;
    x = .2; y = .2; output;
    if n>1 then
    do i=1 to n; /* draw n radial lines */
      x=0; y=0; output;
      x=cos(2*atan(1) + i/n*(8*atan(1)));
      y=sin(2*atan(1) + i/n*(8*atan(1)));
      output;
    end;
  end;
run;

proc gfont data=sunsymb /* name=GB0426 */
  name=sun showroman h=3 romht=2 resol=2;
run;

In this step, Friendly is constructing a font whose “letters” are the sunflower symbols with various numbers of petals. Each petal is a line from the center of the symbol to a point on the unit circle. Note that if you have already defined a gfont0 library, the first line above is not needed.
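The geometry of the radial lines is easier to see once you note that 2*atan(1) is pi/2 and 8*atan(1) is 2*pi, so the petals start at the top of the circle and are spaced evenly around it. A short R sketch of the same calculation follows; it is our restatement of the arithmetic, not part of Friendly's macro, and the function name petal_endpoints is ours.

```r
# Endpoints of the n petals of one sunflower symbol on the unit circle.
petal_endpoints <- function(n) {
  i <- seq_len(n)
  # Same angles as 2*atan(1) + i/n*(8*atan(1)) in the SAS data step.
  theta <- pi / 2 + i / n * 2 * pi
  cbind(x = cos(theta), y = sin(theta))
}

p <- petal_endpoints(5)
print(round(p, 3))

# Draw the symbol in an interactive session:
# each petal is a segment from the center to one endpoint.
if (interactive()) {
  plot(0, 0, type = "n", xlim = c(-1, 1), ylim = c(-1, 1),
       asp = 1, xlab = "", ylab = "")
  segments(0, 0, p[, "x"], p[, "y"])
}
```

The SAS step draws these lines only when n is greater than 1, so a single observation appears as the small center box alone.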

Then the sunplot macro can be read in and run.

%include "c:\ken\sasmacros\sunplot.sas";
%sunplot(data=rdata, x=parent, y = child); run;

The resulting plot is shown below. The SAS version is rather more primitive (and I did not bother to add the ellipses or regression line), but both the SAS and R versions show that children tend to be less unusual than their parents, and that the more unusual the parent is, the more the child shrinks toward the mean.
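The shrinkage toward the mean can also be checked numerically by fitting the regression of child height on parent height: a slope between 0 and 1 means that children are, on average, closer to the mean than their parents. The sketch below is one way to do this, assuming the HistData package is installed; the helper name rtm_slope is ours.

```r
# Slope of the least-squares line of child height on parent height.
# df must have numeric columns named parent and child.
rtm_slope <- function(df) {
  unname(coef(lm(child ~ parent, data = df))["parent"])
}

# Run on Galton's data only if the HistData package is available.
if (requireNamespace("HistData", quietly = TRUE)) {
  data(Galton, package = "HistData")
  # A slope between 0 and 1 indicates shrinkage toward the mean.
  print(rtm_slope(Galton))
}
```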
