[This article was first published on   Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].  (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
            Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')
In [4]: Y = data.LEV_LT3
In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ', 'IND3A']])
In [6]: # Discrete Dependent Variable Models with Logit Link
In [7]: mod = sm.Logit(Y, X)
In [8]: res = mod.fit()
Optimization terminated successfully.
         Current function value: 882.448249
         Iterations 8
In [9]: print res.summary()
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                LEV_LT3   No. Observations:                 4421
Model:                          Logit   Df Residuals:                     4415
Method:                           MLE   Df Model:                            5
Date:                Sun, 16 Dec 2012   Pseudo R-squ.:                 0.04022
Time:                        23:40:40   Log-Likelihood:                -882.45
converged:                       True   LL-Null:                       -919.42
                                        LLR p-value:                 1.539e-14
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
COLLAT1        1.2371      0.260      4.756      0.000         0.727     1.747
SIZE1          0.3590      0.037      9.584      0.000         0.286     0.432
PROF2         -3.1431      0.739     -4.254      0.000        -4.591    -1.695
LIQ           -1.3825      0.357     -3.867      0.000        -2.083    -0.682
IND3A          0.5466      0.141      3.867      0.000         0.270     0.824
const         -7.2498      0.567    -12.779      0.000        -8.362    -6.138
==============================================================================
In [10]: # Print Marginal Effects
In [11]: print pd.DataFrame(res.margeff(), index = X.columns[:(len(X.columns) - 1)], columns = ['MargEffects'])
         MargEffects
COLLAT1     0.096447
SIZE1       0.027988
PROF2      -0.245035
LIQ        -0.107778
IND3A       0.042611
In [12]: # Address the same type of model with R by Pyper
In [13]: import pyper as pr
In [14]: r = pr.R(use_pandas = True)
In [15]: r.r_data = data
In [16]: # Indirect Estimation of Discrete Dependent Variable Models
In [17]: r('data <- rbind(cbind(r_data, y = 1, wt = r_data$LEV_LT3), cbind(r_data, y = 0, wt = 1 - r_data$LEV_LT3))')
Out[17]: 'try({data <- rbind(cbind(r_data, y = 1, wt = r_data$LEV_LT3), cbind(r_data, y = 0, wt = 1 - r_data$LEV_LT3))})\n'
In [18]: r('mod <- glm(y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, weights = wt, subset = (wt > 0), data = data, family = binomial)')
Out[18]: 'try({mod <- glm(y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, weights = wt, subset = (wt > 0), data = data, family = binomial)})\nWarning message:\nIn eval(expr, envir, enclos) : non-integer #successes in a binomial glm!\n'
In [19]: print r('summary(mod)')
try({summary(mod)})
Call:
glm(formula = y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, family = binomial, 
    data = data, weights = wt, subset = (wt > 0))
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0129  -0.4483  -0.3173  -0.1535   2.5379  
Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -7.24979    0.56734 -12.779  < 2e-16 ***
COLLAT1      1.23715    0.26012   4.756 1.97e-06 ***
SIZE1        0.35901    0.03746   9.584  < 2e-16 ***
PROF2       -3.14313    0.73895  -4.254 2.10e-05 ***
LIQ         -1.38249    0.35749  -3.867  0.00011 ***
IND3A        0.54658    0.14136   3.867  0.00011 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 2692.0  on 5536  degrees of freedom
Residual deviance: 2456.4  on 5531  degrees of freedom
AIC: 1995.4
Number of Fisher Scoring iterations: 6
To leave a comment for the author, please follow the link and comment on their blog:  Yet Another Blog in Statistical Computing » S+/R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
