Performing Logistic Regression in R and SAS
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
Introduction
My statistics education focused a lot on normal linear leastsquares regression, and I was even told by a professor in an introductory statistics class that 95% of statistical consulting can be done with knowledge learned up to and including a course in linear regression. Unfortunately, that advice has turned out to vastly underestimate the variety and depth of problems that I have encountered in statistical consulting, and the emphasis on linear regression has not paid dividends in my statistics career so far. Wisdom from veteran statisticians and my own experience combine to suggest that logistic regression is actually much more commonly used in industry than linear regression. I have already started a series of short lessons on binary classification in my Statistics Lesson of the Day and Machine Learning Lesson of the Day. In this post, I will show how to perform logistic regression in both R and SAS. I will discuss how to interpret the results in a later post.
The Data Set
The data set that I will use is slightly modified from Michael Brannick’s web page that explains logistic regression. I copied and pasted the data from his web page into Excel, modified the data to create a new data set, then saved it as an Excel spreadsheet called heart attack.xlsx.
 In R, I imported it using the “XLConnect” package.
 In SAS, I imported it using PROC IMPORT.
This data set has 3 variables (I have renamed them for convenience in my R programming).
 ha2 – Whether or not a patient had a second heart attack. If ha2 = 1, then the patient had a second heart attack; otherwise, if ha2 = 0, then the patient did not have a second heart attack. This is the response variable.
 treatment – Whether or not the patient completed an anger control treatment program.
 anxiety – A continuous variable that scores the patient’s anxiety level. A higher score denotes higher anxiety.
Read the rest of this post to get the full scripts and view the full outputs of this logistic regression model in both R and SAS!
R Script for Implementing Logistic Regression
##### Interpreting the Results of a Logistic Regression Model in R ##### By Eric Cai  The Chemical Statistician # clear all variables in the workspace rm(list=ls(all=TRUE)) library(XLConnect) heart.attack < readWorksheet(loadWorkbook("YOUR DIRECTORY PATH HERE/heart attack.xlsx"), sheet = 1) # perform logistic regression and assign the output object to the variable "logistic.ha" logistic.ha = glm(ha2 ~ treatment + anxiety, family = binomial, data = heart.attack) # use the summary() function to view the results of the model summary(logistic.ha)
R Output of Logistic Regression Model
Here is the output from summary(logistic.ha).
> summary(logistic.ha) Call: glm(formula = ha2 ~ treatment + anxiety, family = binomial, data = heart.attack) Deviance Residuals: Min 1Q Median 3Q Max 1.52106 0.68746 0.00424 0.70625 1.88960 Coefficients: Estimate Std. Error z value Pr(>z) (Intercept) 6.36347 3.21362 1.980 0.0477 * treatment 1.02411 1.17101 0.875 0.3818 anxiety 0.11904 0.05497 2.165 0.0304 *  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 27.726 on 19 degrees of freedom Residual deviance: 18.820 on 17 degrees of freedom AIC: 24.82 Number of Fisher Scoring iterations: 4
SAS Script for Implementing Logistic Regression
Here is the SAS script for performing the same logistic regression analysis. The code at the beginning is useful for clearing the log, the output file and the results viewer.
/* Useful Options For Every SAS Program  With Some Tips Learned From Dr. Jerry Brunner by Eric Cai  The Chemical Statistician */ dm 'cle log; cle out;'; ods html close; ods html; dm 'odsresults; clear'; ods listing close; ods listing; options noovp nodate linesize = 105 formdlim = '' pageno = min; title 'Worcester Heart Attack Study'; * import the data; proc import datafile = "INSERT YOUR DIRECTORY PATH HEREheart attack.xlsx" dbms = xlsx out = heart_attack_raw replace; run; * create formats for the categorical variables; proc format; value ha 1 = 'Yes' 0 = 'No'; value treatment 1 = 'Yes' 0 = 'No'; run; * add formats and labels to variables in the data set; data heart_attack; set heart_attack_raw; format ha2 ha. treatment treatment.; label ha2 = '2nd Heart Attack' treatment = 'Received Treatment for Anger' anxiety = 'Anxiety Score'; run; * export the logistic regression output to a PDF file; ods graphics on; ods pdf file = "INSERT YOUR DIRECTORY PATH HERESAS output  logistic regression of heart attacks.pdf"; proc logistic data = heart_attack; class treatment(ref = 'No') / param = ref; model ha2(event = 'Yes') = treatment anxiety / parmlabel; run; quit; ods pdf close;
SAS Output of Logistic Regression Model
Here is the output as seen in the results viewer. As you can see in my above code, I also used ods graphics and ods pdf to export the output into a PDF file for easy viewing and reporting.
Worcester Heart Attack Study 
Model Information  

Data Set  WORK.HEART_ATTACK  
Response Variable  ha2  2nd Heart Attack 
Number of Response Levels  2  
Model  binary logit  
Optimization Technique  Fisher’s scoring 
Number of Observations Read  40 

Number of Observations Used  40 
Response Profile  

Ordered Value 
ha2  Total Frequency 
1  No  20 
2  Yes  20 
Class Level Information  

Class  Value  Design Variables 
treatment  No  0 
Yes  1 
Model Convergence Status 

Convergence criterion (GCONV=1E8) satisfied. 
Model Fit Statistics  

Criterion  Intercept Only  Intercept and Covariates 
AIC  57.452  35.753 
SC  59.141  40.820 
2 Log L  55.452  29.753 
Testing Global Null Hypothesis: BETA=0  

Test  ChiSquare  DF  Pr > ChiSq 
Likelihood Ratio  25.6988  2  <.0001 
Score  20.3024  2  <.0001 
Wald  11.3897  2  0.0034 
Type 3 Analysis of Effects  

Effect  DF  Wald ChiSquare 
Pr > ChiSq 
treatment  1  7.3879  0.0066 
anxiety  1  8.4033  0.0037 
Analysis of Maximum Likelihood Estimates  

Parameter  DF  Estimate  Standard Error 
Wald ChiSquare 
Pr > ChiSq  Label  
Intercept  1  6.3834  2.5048  6.4949  0.0108  Intercept: ha2=No  
treatment  Yes  1  2.7331  1.0055  7.3879  0.0066  Received Treatment for Anger Yes 
anxiety  1  0.1397  0.0482  8.4033  0.0037  Anxiety Score 
Odds Ratio Estimates  

Effect  Point Estimate  95% Wald Confidence Limits 

treatment Yes vs No  0.065  0.009  0.467 
anxiety  1.150  1.046  1.264 
Association of Predicted Probabilities and Observed Responses 


Percent Concordant  89.5  Somers’ D  0.823 
Percent Discordant  7.3  Gamma  0.850 
Percent Tied  3.3  Taua  0.422 
Pairs  400  c  0.911 
Filed under: Applied Statistics, Biostatistics, Categorical Data Analysis, R programming, Statistics, Tutorials Tagged: applied statistics, binary classification, deviance residual, deviance residuals, Excel, fisher scoring, fisher scoring algorithm, logistic regression, null deviance, ods graphics, ods pdf, proc import, proc logistic, R, R programming, regression, residual deviance, SAS, sas programming, statistics
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.