Example 8.32: The HistData package, sunflower plots, and getting data from R into SAS

[This article was first published on SAS and R, and kindly contributed to R-bloggers.]

This entry is mainly a promotion of the fascinating HistData R package. The package, compiled by the psychologist, statistician, and graphics innovator Michael Friendly, contains a number of small data sets of historical interest. These include data from John Snow’s map of cholera in London, Minard’s map of Napoleon’s Russian campaign of 1812, Galton’s data on heights of parents and children, and many others.

If you have any interest in Minard’s map, Friendly also hosts a site about the map, Minard, and a gallery with some re-imaginings of the map data, at http://datavis.ca/gallery/re-minard.php. The gallery includes R and SAS versions, as well as one which uses Google Maps.

Once you install the package and load it with library() (section B.6.1), you can access the data with the data() function. For example, we show Galton’s data, which led to the description of regression to the mean.
> data(Galton)
> head(Galton)
  parent child
1   70.5  61.7
2   68.5  61.7
3   65.5  61.7
4   64.5  61.7
5   64.0  61.7
6   67.5  62.2
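
As a minimal sketch of the loading step, assuming HistData has already been installed from CRAN, the following loads the package, checks the size of the Galton data, and lists the data sets the package provides. The object name `available` is ours, introduced only for illustration.

```r
library(HistData)   # assumes install.packages("HistData") was run earlier
data(Galton)

# Size and structure of the data frame
dim(Galton)
str(Galton)

# Names of all data sets shipped with the package
available <- data(package = "HistData")$results[, "Item"]
head(available)
```

The `data(package = ...)` call returns the same information that the package help page summarizes, but as a character matrix you can search or filter.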

The package also includes example() methods for many of the data sets: example(Galton) results in the sunflower plot shown above. The sunflower plot (section 5.1.14) is an alternative to jittering when many observations share values. If the data start as more continuous, you might see the sunflower plot as a form of two-dimensional histogram. You can get a list of the available data sets with ?'HistData-package'.
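
If you want to build a sunflower plot yourself rather than rely on example(Galton), one possible sketch uses the base R functions xyTable() and sunflowerplot(). This is not necessarily how the package example draws its figure; the axis labels and the object name `counts` are our own choices.

```r
library(HistData)   # assumes HistData is installed
data(Galton)

# Tabulate how many observations fall on each distinct (parent, child) pair
counts <- xyTable(Galton$parent, Galton$child)

# One symbol per distinct pair; pairs observed more than once get
# one petal per observation
sunflowerplot(counts$x, counts$y, number = counts$number,
              xlab = "Mid-parent height (inches)",
              ylab = "Child height (inches)")
```

Passing the raw vectors to sunflowerplot() directly gives the same picture; computing `counts` first just makes the "two-dimensional histogram" reading explicit, since `counts$number` holds the cell frequencies.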

We’re not aware of a companion set of SAS data sets. An easy way to access the data sets in SAS is to load the package into R and export the data into SAS using the foreign package (section 1.2.2).
> library(foreign)
> write.foreign(Galton,"galton.dat","galton.sas",package="SAS")
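
To see what write.foreign() produces before moving to SAS, a small sketch like the one below may help. It uses only the six rows of Galton printed earlier, so it runs without HistData, and writes to a temporary directory. The names `galton_head`, `data_file`, and `code_file` are hypothetical.

```r
library(foreign)

# The six rows of Galton shown by head() above
galton_head <- data.frame(
  parent = c(70.5, 68.5, 65.5, 64.5, 64.0, 67.5),
  child  = c(61.7, 61.7, 61.7, 61.7, 61.7, 62.2)
)

# Write to a temporary directory rather than the working directory
data_file <- file.path(tempdir(), "galton.dat")
code_file <- file.path(tempdir(), "galton.sas")
write.foreign(galton_head, data_file, code_file, package = "SAS")

# The .sas file is a SAS program that reads the .dat file
sas_code  <- readLines(code_file)
dat_lines <- readLines(data_file)
cat(sas_code, sep = "\n")
cat(dat_lines, sep = "\n")
```

The generated SAS program refers to the data file by the path you gave in R, so if you move the files to another machine you will probably need to edit that path before running it.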

Running the galton.sas file written by the write.foreign function makes a SAS data set called rdata with variables parent and child. We can make a sunflower plot in SAS using a macro written, coincidentally, by Michael Friendly, which he hosts here. Making a plot requires running the “sunfont.sas” file and the “sunplot.sas” file. I had to modify the “sunfont.sas” file slightly, and I give the edited file here:
libname gfont0 'c:\temp';
data sunsymb;
  alpha = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  do n=1 to 26;
     char = substr(alpha,n,1);       /* assumed: symbol n maps to letter n */
     segment = 1;                    /* assumed: box is its own segment    */
     x = .2; y = .2; output;         /* Draw small box at center */
     x =-.2; y = .2; output;         /* of each symbol           */
     x =-.2; y =-.2; output;
     x = .2; y =-.2; output;
     x = .2; y = .2; output;
     if n>1 then
       do i=1 to n;                  /* draw n radial lines      */
         segment = i+1;              /* assumed: one segment per petal */
         x=0; y=0; output;
         x=cos(2*atan(1) + i/n*(8*atan(1)));
         y=sin(2*atan(1) + i/n*(8*atan(1)));
         output;
       end;
  end;
run;
proc gfont data=sunsymb             /* name=GB0426 */
           name=sun showroman h=3 romht=2 resol=2;
run;

In this step, Friendly is constructing a font whose “letters” are the sunflower symbols with various numbers of petals. Note that if you have already defined a gfont0 library, the first line above is not needed.

Then the sunplot macro can be read in and run.
%include "c:\ken\sasmacros\sunplot.sas";
%sunplot(data=rdata, x=parent, y = child); run;

The resulting plot is shown below. The SAS version is rather more primitive (and I did not bother to add the ellipses or regression line), but both the SAS and R versions show that children tend to be less unusual than their parents, and the more unusual the parent is, the more the child shrinks toward the mean.
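
The shrinkage toward the mean can also be checked numerically. As a rough sketch, assuming HistData is installed, fitting a simple linear regression of child on parent gives a slope between 0 and 1: a parent who is one inch above average is expected to have a child who is less than one inch above average. The names `fit` and `slope` are ours.

```r
library(HistData)   # assumes HistData is installed
data(Galton)

# Simple linear regression of child height on mid-parent height
fit   <- lm(child ~ parent, data = Galton)
slope <- unname(coef(fit)["parent"])
slope

# Predicted child heights for an unusually short and an unusually tall parent
predict(fit, newdata = data.frame(parent = c(64, 73)))
```

Comparing the two predictions with the parent values shows the pattern described above: the predicted child heights lie closer to the overall mean than the parent heights do.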
