Read sas7bdat files in R with GGASoftware Parso library

September 12, 2014

(This article was first published on BioStatMatt » R, and kindly contributed to R-bloggers)

… using the new R package sas7bdat.parso.

The software company GGASoftware has extended the work that others and I did on the sas7bdat R package by developing a Java library called Parso, which also reads sas7bdat files. They have worked out most of the remaining kinks. For example, the Parso library reads sas7bdat files with compressed data (i.e., written with COMPRESS=yes or COMPRESS=binary). I hope to eventually bring the project full circle, and incorporate their improvements into the sas7bdat file format documentation and the code in the sas7bdat package.

The Parso library is made available under the terms of the GPLv3, and is also available under a commercial license. So, last weekend, with the help of Tobias Verbeke’s helloJavaWorld R package template, I implemented an R package that wraps the functionality of the Parso library. The new package, sas7bdat.parso (currently hosted exclusively on GitHub), depends on the R package rJava, and implements the functions s7b2csv and read.sas7bdat.parso. The former function is the workhorse: it reads a sas7bdat file and writes a corresponding CSV file. All of the file input/output happens in the Java implementation (for speed and simplicity). The latter function, read.sas7bdat.parso, simply converts a sas7bdat file to a temporary CSV file (created with tempfile), and then reads that CSV file using read.csv. There may still be some kinks in the Parso library, or in the wrapper R package, but I hope that this additional resource will help finally eliminate the SAS data file barrier that many of us have experienced for years.
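
The convert-then-read pattern described above can be sketched in a few lines of base R. This is only an illustration, not the package's actual source: the helper name read_via_csv and the stand-in converter fake_converter are hypothetical, and the two-argument call shape (input file, output CSV file) is assumed for the sketch rather than taken from the sas7bdat.parso documentation.

```r
# Sketch of the "convert to a temporary CSV, then read.csv" pattern.
# `converter` is any function taking (input path, output CSV path);
# that two-argument shape is an assumption made for this sketch.
read_via_csv <- function(path, converter, ...) {
  csv <- tempfile(fileext = ".csv")
  on.exit(unlink(csv), add = TRUE)   # remove the temporary file on exit
  converter(path, csv)               # all file conversion happens here
  read.csv(csv, ...)                 # extra arguments go to read.csv
}

# Hypothetical stand-in converter so the sketch runs without Java:
# it ignores the input file and writes a small fixed table.
fake_converter <- function(infile, outfile) {
  write.csv(data.frame(id = 1:3, x = c(0.5, 1.5, 2.5)),
            outfile, row.names = FALSE)
}

dat <- read_via_csv("example.sas7bdat", fake_converter)
str(dat)
```

In real use there is no need to write this by hand, since read.sas7bdat.parso already does the equivalent. The sketch mainly shows why the R side can stay so small: everything format-specific lives in the converter.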

Installation of the R package rJava is more complicated than simply calling install.packages("rJava"). In order for the rJava package to work, and hence the sas7bdat.parso package, a JDK (Java Development Kit) must be installed. You can download the Oracle JDK from the Oracle website. Once the JDK is installed, the easiest way to install the sas7bdat.parso package is using the install_github function in the devtools package (e.g., library("devtools"); install_github("biostatmatt/sas7bdat.parso")).
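
As a rough sketch, the installation steps above could be collected into a short script. The helper names check_prereqs and install_sas7bdat_parso are hypothetical and not part of any package. The sketch assumes a JDK is already installed, and it does not check for one.

```r
# Report which prerequisite R packages are already installed.
# (`check_prereqs` is a hypothetical helper, not part of any package.)
check_prereqs <- function(pkgs = c("rJava", "devtools")) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Install sequence from the text, wrapped in a function so that
# nothing is installed until it is called explicitly.
# Assumes a JDK is already present; rJava will not work without one.
install_sas7bdat_parso <- function() {
  missing <- names(which(!check_prereqs()))
  if (length(missing) > 0) install.packages(missing)
  devtools::install_github("biostatmatt/sas7bdat.parso")
}

print(check_prereqs())
```

Calling install_sas7bdat_parso() then installs whichever of rJava and devtools is missing before fetching the package from GitHub.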
