For a long time I followed a discussion on LinkedIn that consisted of various opinions about using SAS vs. R. Some people take this very personally. Recently there was an interesting post on the DataCamp blog addressing this topic. They also provided an interesting infographic comparing SAS, R, and SPSS. Other popular debates pit Python against SAS and R. (By the way, it is possible to integrate all three on the SAS platform, and you can also run R via the open source integration node in SAS Enterprise Miner 13.1.)
Aside: For older versions of SAS EM, can you drop in a code node and call R via PROC IML?
Anyway, getting back to the article, I tend to agree with this one point:
“While these debates are a good thing for the community and the programming language as a whole, they unfortunately also have a negative effect on those individuals that are just in the beginning of their data analytics career. Biased opinions on all sides of the table make it difficult for new data analysts to see the forest for the trees when choosing a statistical programming language.”
While I agree with this notion, I want to reflect for a minute on the concept of a statistical language. If you think of SAS as just a statistical language, then perhaps these kinds of comparisons and discussions make sense, but for a data scientist, I think one’s view of analytics should transcend any single language. When we think of an overall analytical solution, there is a lot to consider: how the data is generated, how it is captured and warehoused, how it is extracted, cleaned, and accessed by whatever programming tool(s) we use, how it is analyzed, and ultimately, how we operationalize the solution so that it can be consumed by business users.
So to me the relevant question is not which programming language data scientists prefer, or which program is better for analytics, but rather which analytical solutions platform is best for solving the problems at hand.