# Really useful R package: sas7bdat

July 25, 2011

(This article was first published on SAS and R, and kindly contributed to R-bloggers)

For SAS users, one hassle in trying things in R, let alone migrating, is the difficulty of getting data out of SAS and into R. In our book (section 1.2.2) and in a blog entry we've covered getting data out of SAS native data sets. Unfortunately, for all of these methods, you need a working, licensed version of SAS.

However, Matt Shotwell has reverse-engineered the sas7bdat file format. This means that you can now read a SAS data set without a working copy of SAS. This is a wonderful thing, and in fact SAS Institute ought to have provided this ability long ago. The package is experimental, but it worked fine for the two small data sets we tried. Matt tells me that as of 7/2011, the package only works for sas7bdat files generated on 32-bit Windows systems.

R
Install the package sas7bdat. Then use the read.sas7bdat() function.
library(sas7bdat)
helpfromSAS = read.sas7bdat("http://www.math.smith.edu/sasr/datasets/help.sas7bdat")

(Note that newlines are not allowed in the URL in practice, but formatting for the blog required it.)
> is.data.frame(helpfromSAS)
[1] TRUE
> summary(helpfromSAS$MCS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  6.763  21.680  28.600  31.680  40.940  62.180
> with(helpfromSAS, summary(SUBSTANCE))
alcohol cocaine  heroin
    177     152     124
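If reading straight from a URL gives trouble, one alternative is to download the file first and read the local copy. The following is only a sketch: read_sas_via_tempfile() is a hypothetical helper of our own, not part of the sas7bdat package, and it assumes the package is already installed (for example via install.packages("sas7bdat")).

```r
# Hypothetical helper (not part of the sas7bdat package): download a
# sas7bdat file to a temporary location, then read the local copy.
read_sas_via_tempfile <- function(url) {
  # Fail early with a helpful message if the package is missing
  if (!requireNamespace("sas7bdat", quietly = TRUE)) {
    stop("Install the package first: install.packages(\"sas7bdat\")")
  }
  tmp <- tempfile(fileext = ".sas7bdat")
  # Remove the temporary file when the function exits
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps the binary file intact on Windows
  download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  sas7bdat::read.sas7bdat(tmp)
}
```

Calling read_sas_via_tempfile() with the URL used above should return the same data frame as the direct call, provided the file is still being served at that address.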

It's unclear why the variable names are all capitalized. That didn't happen in another trial, so it must be something about the way the help.sas7bdat data set was constructed.
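If the capitalized names are a nuisance, they can be converted to lower case after reading. Here is a minimal sketch in base R; demo is a small stand-in data frame with made-up values, used so that the example runs without the SAS file. The same tolower() line could be applied to helpfromSAS.

```r
# Stand-in data frame with upper-case names, mimicking two of the columns
# above (the values are invented for illustration only).
demo <- data.frame(MCS = c(21.7, 28.6, 40.9),
                   SUBSTANCE = c("alcohol", "cocaine", "heroin"))

# Convert every variable name to lower case
names(demo) <- tolower(names(demo))

names(demo)
```

This only changes the names in the R session; the original file is left untouched.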
