SAS to R Migration for Financial Data: Lessons and Examples

November 14, 2016

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Lixun Zhang (Data Scientist), Ye Xing (Senior Data Scientist) and Tao Wu (Principal Data Scientist Manager), all at Microsoft

Editor's Note: To learn more about migrating from SAS to R, there will be a live webinar presented by Lixun and Ye tomorrow (Tuesday, November 15). Register to attend the webinar here.

R has been gaining popularity among data professionals in recent years in industries such as financial services, as shown for example in this survey from executive search firm Burtch Works. In this blog post we share some key considerations in migrating from SAS to R for a financial services workload. More specifically, we focus on the data manipulation aspects of the migration.

One of the most important differences between SAS and R is how data are processed. Take the process of calculating the sum of two variables as an example, which is shown by the SAS and R code below.

[Figure: SAS and R code for calculating the sum of two variables]
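As a minimal sketch of the R side of this comparison, the snippet below adds two columns of a small data frame. The data frame `df`, its columns `x` and `y`, and the values are assumptions made up for illustration; the SAS code in the comment is only a rough equivalent, not the authors' original.

```r
# Hypothetical 3-row dataset; names and values are assumptions for illustration.
df <- data.frame(x = c(10, 10, 20), y = c(1, 2, 3))

# Rough SAS DATA step equivalent, shown for comparison only:
#   data out; set df; z = x + y; run;

# In R the sum is written once and applied to the whole columns.
df$z <- df$x + df$y
print(df)
```

The R version contains no explicit loop: `df$x + df$y` operates on the two columns as vectors.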

SAS processes data row by row, using the implied loop of the DATA step. The following graph shows how it executes the operation on a dataset with 3 rows. It starts by calculating the sum for row 1, writes the result for row 1 to the output table, then does the calculation for row 2, and repeats this until the end of the dataset. Assuming the data have been sorted by the variable "x" and are processed with a BY statement, SAS also records that row 1 is the first occurrence of x = 10 and row 2 is the last occurrence of x = 10 through the automatic "first." and "last." variables, respectively. This can be useful in situations where only the first or last occurrence should be kept in the output dataset.
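To make the row-by-row behavior concrete, here is a rough emulation of the implied loop written in R. It is only a sketch for illustration, not how SAS is implemented and not how one would normally write R. The data, the column names, and the flag names `first_x` and `last_x` are assumptions.

```r
# Hypothetical 3-row dataset, already sorted by x (an assumption).
df <- data.frame(x = c(10, 10, 20), y = c(1, 2, 3))
n <- nrow(df)
out <- vector("list", n)

# Emulate the implied loop: handle one row, write it out, move on.
for (i in seq_len(n)) {
  # A row is "first" if it starts a new run of x values,
  # and "last" if the next row has a different x (or there is no next row).
  first_x <- i == 1 || df$x[i] != df$x[i - 1]
  last_x  <- i == n || df$x[i] != df$x[i + 1]
  out[[i]] <- data.frame(x = df$x[i], y = df$y[i],
                         z = df$x[i] + df$y[i],
                         first_x = first_x, last_x = last_x)
}

# Stack the per-row results into the output table.
out <- do.call(rbind, out)
print(out)
```

Each iteration sees a single row plus its immediate neighbors, which is what makes the first and last flags natural in a row-wise model.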

[Figure: How SAS processes data row by row]

R, on the other hand, applies functions at the column level, operating on all rows of a column at once, as shown in the following graph. Because R works on whole columns, it has no direct counterpart to the SAS "first." and "last." variables.
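Although there is no direct counterpart, the same first and last flags can be derived with column-level operations. The sketch below uses base R's `duplicated()`, which is one possible approach and not necessarily the one used in the authors' notebooks. The data and the column names `first_x` and `last_x` are assumptions.

```r
# Hypothetical 3-row dataset, sorted by x (an assumption).
df <- data.frame(x = c(10, 10, 20), y = c(1, 2, 3))

# A row is the first occurrence of its x value if no earlier row has the same x.
df$first_x <- !duplicated(df$x)

# Scanning from the end marks the last occurrence of each x value.
df$last_x <- !duplicated(df$x, fromLast = TRUE)

# Keep only the last occurrence of each x, similar in spirit to
# "if last.x then output;" in a SAS DATA step.
last_rows <- df[df$last_x, ]
print(last_rows)
```

Note that `duplicated()` looks at all occurrences of a value rather than at consecutive runs, so it matches the SAS flags only when the data are sorted (or at least grouped) by the key.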

[Figure: How R processes data column by column]

Because of these differences between SAS and R, one of the best approaches for converting SAS programs to R is to first understand what a block of SAS code is doing and then rewrite it in R. To illustrate this, we summarized several scenarios and published them to the Cortana Intelligence Gallery as 3 Jupyter notebooks so that you can test out the R code in Azure Machine Learning Studio. These notebooks cover some common business scenarios in the financial industry, such as counting delinquencies by account and calculating total expense by account. Important technical concepts such as the SAS “retain” statement, the “first.” and “last.” variables, and R’s apply() and sapply() functions are demonstrated in these samples.
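The notebooks themselves are not reproduced here, but the kind of rewrite they describe might look like the following sketch in base R. The data frame `txn`, its columns, and all values are hypothetical, and the functions chosen (`tapply()`, `split()` with `sapply()`, and `ave()` with `cumsum()`) are one possible way to express these scenarios, not necessarily the authors' code.

```r
# Hypothetical transaction data; all names and values are made up.
txn <- data.frame(
  account    = c("A", "A", "B", "B", "B"),
  expense    = c(100, 50, 20, 30, 10),
  delinquent = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Total expense by account: sum the expense column within each account.
total_expense <- tapply(txn$expense, txn$account, sum)

# Delinquencies by account: split the flag by account, then count TRUEs.
delinq_count <- sapply(split(txn$delinquent, txn$account), sum)

# Running expense within each account. In SAS this kind of accumulator is
# typically written with "retain" and reset when first.account is true.
txn$running_expense <- ave(txn$expense, txn$account, FUN = cumsum)

print(total_expense)
print(delinq_count)
print(txn)
```

In each case the R code states what is computed per account, rather than how to carry a value from one row to the next.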

To run the notebooks, you can start by clicking on SAS to R Tutorial and then click on “Open in Studio.”

Cortana Intelligence Gallery: SAS to R Tutorial Part 1
