How to Load SAS Files in R: Transitioning from SAS to R with Seamless Data Integration

[This article was first published on Tag: r - Appsilon | Enterprise R Shiny Dashboards, and kindly contributed to R-bloggers.]

Using R as an alternative to SAS (Statistical Analysis System) lets you add custom interactivity on top of your R routines. Analysts keep full technical control of the data, while non-technical users can engage with it through interactive data storytelling.

Transitioning from SAS to R can be a challenge for many data analysts and programmers, but it does not have to happen all at once. A practical first step is to let part of your SAS pipeline produce the data and then create your reports from that data in R.

In this article, we will explore how to integrate SAS data into your R workflow, allowing you to harness the strengths of both tools. We will focus on reading and writing SAS data files in R and overcoming common challenges. By the end of this guide, you’ll be well-equipped to bridge the gap between SAS and R, making your data analysis journey smooth and efficient. 

TL;DR:

  • Transitioning from SAS to R offers enhanced technical functionality, making data interaction more intuitive. 
  • Explore how to smoothly transition and integrate data between SAS and R.
  • Understand SAS File Types: 
    • Data Files (.sas7bdat) –  hold tabular data similar to R dataframes
    • Catalog Files (.sas7bcat) – contain dataset metadata
  • Read SAS Data in R and Write SAS Data from R using the haven package with practical examples.
  • Best Practices:
    • Prioritize reproducibility
    • Use a {targets} pipeline for routine tasks
    • Seek guidance (from R/SAS communities and platforms like Stack Overflow)
  • The haven package simplifies SAS and R data interoperability.

Understanding Different SAS File Types

SAS has many types of file objects. This guide covers the two you are most likely to meet when working from R:

    1. Data Files (.sas7bdat): These files store tabular data, including numeric, character, and date variables. SAS data files are the most common type and similar to R data frames.
    2. Catalog Files (.sas7bcat): Catalog files contain metadata about datasets, including variable formats, labels, and other attributes.

How-To: Reading SAS data

To read SAS files in R, we can use the {haven} package, which is part of the tidyverse. It provides functions to read SAS datasets.

Here’s a step-by-step guide to reading SAS files in R:

# install.packages("haven")
library(haven)

sas_data <- read_sas("file.sas7bdat")

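read_sas() reads the .sas7bdat data file. A .sas7bcat catalog file is not read on its own; pass it through the catalog_file argument so that its value labels are attached to the data.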
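As a minimal sketch of these arguments, the snippet below reads the small iris.sas7bdat example file that ships with haven, so it runs without a SAS installation. The col_select and n_max arguments limit what is loaded. The file names in the final comment are hypothetical placeholders, not files from this article.

```r
library(haven)

# Example data file bundled with the haven package
iris_path <- system.file("examples", "iris.sas7bdat", package = "haven")

# Read only two columns and the first ten rows
iris_subset <- read_sas(
  iris_path,
  col_select = c(Species, Sepal_Length),
  n_max = 10
)

print(iris_subset)

# With a catalog file, value labels are attached while reading.
# Both file names below are hypothetical:
# read_sas("survey.sas7bdat", catalog_file = "formats.sas7bcat")
```

Selecting columns at read time can be useful for wide SAS datasets, since unneeded columns never reach the R session.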

Encoding Issues

By default, read_sas() uses the encoding recorded in the file itself. If that encoding is missing or wrong and character columns come out garbled, override it with the encoding argument:

read_sas("file.sas7bdat", encoding = "UTF-8")
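One way to check whether an encoding override worked is to test the character columns after import. The sketch below uses the iris.sas7bdat file bundled with haven and "latin1" purely as an illustration; the right encoding for your own files depends on how they were created.

```r
library(haven)

iris_path <- system.file("examples", "iris.sas7bdat", package = "haven")

# Override the encoding recorded in the file (Latin-1 is only an example)
iris_latin1 <- read_sas(iris_path, encoding = "latin1")

# After conversion, character columns should contain valid UTF-8 strings
species_ok <- all(validUTF8(iris_latin1$Species))
print(species_ok)
```

If a check like this fails, or accented characters look wrong on screen, trying another candidate encoding is usually the next step.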

Dealing with SAS Labels

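In R, value labels are usually handled with factors. SAS takes a different approach: a variable keeps its underlying values, and the labels are attached to those values separately. To preserve these semantics, haven provides the labelled S3 class, which lets you import labelled vectors into R without converting them to factors.

The following example, taken from the haven documentation vignette, shows how labelled vectors are created and displayed: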

x1 <- labelled(
  sample(1:5), 
  c(Good = 1, Bad = 5)
)

x2 <- labelled(
  c("M", "F", "F", "F", "M"), 
  c(Male = "M", Female = "F")
)

tibble::tibble(x1, x2, z = 1:5)

Labeled Vectors in R Code Snippet

Tibble Data Frame in R
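Once labelled data is in R, you will often want to turn it into regular R types before modelling or plotting. The sketch below shows two haven helpers for that: as_factor() converts labels to factor levels, and zap_labels() drops the labels and keeps the underlying values. The vector x is made up for illustration.

```r
library(haven)

# A labelled numeric vector: only 1 and 5 carry a label
x <- labelled(c(1, 5, 3, 1), labels = c(Good = 1, Bad = 5))

# Convert to a factor: labelled values use their label,
# unlabelled values keep the value itself as the level
x_factor <- as_factor(x)
print(x_factor)

# Drop the labels and keep the plain numeric values
x_plain <- zap_labels(x)
print(x_plain)
```

Which conversion fits depends on the variable: as_factor() suits categorical codes, while zap_labels() suits numeric scales where the labels only annotate a few values.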

How-To: Writing SAS data from R

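To write data from R in a format that SAS can read, you can also use the haven package. Its write_xpt() function writes a SAS transport file, so the output file should use the .xpt extension rather than .sas7bdat: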

my_data <- data.frame(
  ID = 1:5,
  Name = c("Bob", "Ed", "Rod", "Dav", "Eva"),
  Value = c(90, 85, 78, 92, 88)
)


write_xpt(
  my_data,
  path = "output_file.xpt"
)
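To confirm that an export worked, you can read the transport file straight back with read_xpt() and compare it with the original. This is a minimal round-trip sketch; the temporary file and the member name "scores" are arbitrary choices for illustration.

```r
library(haven)

my_data <- data.frame(
  ID = 1:5,
  Name = c("Bob", "Ed", "Rod", "Dav", "Eva"),
  Value = c(90, 85, 78, 92, 88)
)

# Write to a temporary transport file; the member name is arbitrary
xpt_path <- tempfile(fileext = ".xpt")
write_xpt(my_data, xpt_path, name = "scores")

# Read the file back and compare it with the original data
round_trip <- read_xpt(xpt_path)
print(round_trip)

values_match <- all(round_trip$Value == my_data$Value) &&
  all(round_trip$Name == my_data$Name)
print(values_match)
```

A check like this catches export problems in R, before the file is handed over to SAS.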

Missing values

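Recent versions of haven treat R's NA as a SAS missing value when writing and read SAS missing values back as NA, so ordinary missing data needs no special handling.

If you need to distinguish between different kinds of missing values, as SAS does, you can create them manually with tagged_na(). Each tag is a single letter: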

my_data <- data.frame(
  ID = 1:5,
  Name = c("Bob", "Ed", "Rod", "Dav", "Eva"),
  Value = c(90, 85, 78, 92, 88),
  # tagged_na() takes single-letter tags; the value is recycled to all rows
  na_values = tagged_na("a")
)

write_xpt(
  my_data,
  path = "output_file.xpt"
)

# Read the transport file back with read_xpt()
read_xpt("output_file.xpt")
The result is a tibble with four columns (ID, Name, Value, and na_values) and five rows, where every entry in the na_values column is NA.

Tibble Data Display with Individual Records in R
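Tagged missing values print as plain NA, so it helps to know how to inspect them. The following sketch uses haven's is_tagged_na() and na_tag() helpers on a made-up vector that mixes regular values, two tagged NAs, and an ordinary NA.

```r
library(haven)

# Two regular values, two tagged NAs, and one ordinary NA
x <- c(1, 2, tagged_na("a"), tagged_na("z"), NA)

# Base R treats every kind of missing value the same way
missing_flags <- is.na(x)
print(missing_flags)

# haven can tell tagged NAs apart from ordinary ones
tagged_flags <- is_tagged_na(x)
print(tagged_flags)

# Extract the tag letter (NA where there is no tag)
tags <- na_tag(x)
print(tags)
```

Because is.na() still returns TRUE for tagged values, existing R code that handles missing data keeps working unchanged.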

Example

Let’s dive into a simple example using SAS datasets. For this scenario, we’ll download the CARS dataset from the SASHELP library.

SAS Studio

Save the library dataset to a SAS data file

Since SASHELP.CARS lives inside a SAS library, it is not available as a standalone SAS data file (.sas7bdat). This means that we must save it in that format before downloading it. To do this, run the following SAS program:

%Let username = your_username;
Libname out "/home/&username/sasuser.v94/";
Data out.cars_data;
	set sashelp.cars;
run;

Remember to replace your_username with your own SAS Studio username. The value is written as-is, without parentheses.

Now you can download the data file and use it in R.

SAS Studio

Playing with data – Summary statistics

To illustrate the example in R, let’s calculate summary statistics for all numeric columns, grouped by the column Type. In SAS, this can be done with the “Summary statistics” utility.

Statistical Results for the “SASHELP.CARS” dataset

You can print the result as a pdf file

It also returns the code:

ods noproctitle;
ods graphics / imagemap=on;

proc means data=SASHELP.CARS chartype mean std min max median n nmiss vardef=df
      	qrange qmethod=os;
    	var MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight
      	Wheelbase Length;
    	class Type;
run;

In R, a similar approach that results in a PDF file is available through the summarytools package.

library(haven)
library(dplyr)
library(summarytools)

data <- read_sas("cars_data.sas7bdat")
grouped_data <- data %>%
  group_by(Type)

view(dfSummary(grouped_data))

Comparative summary of two datasets: ‘Hybrid’ cars on the left and ‘SUV’ cars on the right.

If you want the result as a data frame, just update the code:

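summarised_data <- data %>% 
  group_by(Type) %>% 
  summarise(
    across(
      where(is.numeric),
      # na.rm = TRUE skips missing values, as PROC MEANS does
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        stdev = ~ sd(.x, na.rm = TRUE),
        median = ~ median(.x, na.rm = TRUE),
        min = ~ min(.x, na.rm = TRUE),
        max = ~ max(.x, na.rm = TRUE),
        iqr = ~ IQR(.x, na.rm = TRUE)
      )
    )
  )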

Snapshot of the summarized data
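To see how across() names its output columns and how missing values can be counted, in the spirit of the nmiss statistic in PROC MEANS, here is a small sketch on a made-up data frame. The toy values are invented for illustration and are not taken from the CARS dataset.

```r
library(dplyr)

# A tiny stand-in for the cars data, with one missing value
toy <- tibble(
  Type = c("SUV", "SUV", "Sedan"),
  MSRP = c(30000, 40000, 20000),
  Cylinders = c(6, NA, 4)
)

toy_summary <- toy %>%
  group_by(Type) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        n_missing = ~ sum(is.na(.x))
      )
    )
  )

# Output columns are named <column>_<statistic>, e.g. MSRP_mean
print(names(toy_summary))
print(toy_summary)
```

Keeping a count of missing values next to each statistic makes it easier to compare the R output with the SAS report.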

Now all we have to do is save the data frame to a SAS transport file.

write_xpt(summarised_data, "cars_summarised.xpt")

You can upload the file back to SAS.

Upload the file back to SAS

Best Practices

  1. Reproducibility: Document the steps you take when reading and writing SAS data with R Markdown or Quarto, so that others can rerun and verify the workflow.

  2. Automation: If you need to run the workflow on a regular schedule, consider using a {targets} pipeline.

  3. Seek Help: If you require further guidance, don’t hesitate to seek help from the R community or SAS communities. Collaboration can often lead to quicker solutions. Stack Overflow is also a great resource, and it’s quite possible that someone has already faced a problem similar to yours and shared a solution.
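For the routine case, a targets pipeline could look roughly like the sketch below. It assumes the cars_data.sas7bdat file from the example sits in the project folder; the target names (sas_file, cars, cars_summary, xpt_file) and the single summary statistic are illustrative choices, not part of the original workflow.

```r
# _targets.R (the pipeline definition file read by the targets package)
library(targets)

# Packages made available to every target
tar_option_set(packages = c("haven", "dplyr"))

list(
  # Track the input file so the pipeline reruns when it changes
  tar_target(sas_file, "cars_data.sas7bdat", format = "file"),
  # Import the SAS data
  tar_target(cars, read_sas(sas_file)),
  # Summarise by car type
  tar_target(
    cars_summary,
    cars %>%
      group_by(Type) %>%
      summarise(mean_msrp = mean(MSRP, na.rm = TRUE))
  ),
  # Export a transport file and return its path so it is tracked too
  tar_target(
    xpt_file,
    {
      write_xpt(cars_summary, "cars_summarised.xpt")
      "cars_summarised.xpt"
    },
    format = "file"
  )
)
```

Running targets::tar_make() from the project folder then executes only the steps whose inputs have changed since the last run.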

Conclusion

The haven package removes much of the friction in SAS and R interoperability by handling both reading and writing of SAS data. This guide showed how to read SAS files, write data back for SAS, and deal with common issues along the way.

The post appeared first on appsilon.com/blog/.
