Fractional Logit Model with Python

December 16, 2012

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

A fractional response bounded between 0 and 1, such as LEV_LT3 below, can be modeled by maximizing a Bernoulli quasi-likelihood with a logit link. The session below fits this fractional logit model with statsmodels in Python and then cross-checks the estimates in R through PypeR.

In [1]: import pandas as pd

In [2]: import statsmodels.api as sm

In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')

In [4]: Y = data.LEV_LT3

In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ', 'IND3A']])
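
Note that sm.Logit is applied here to a fractional rather than a strictly binary response, so the estimation is a quasi-MLE. A quick sanity check that LEV_LT3 actually lies in the unit interval, as a minimal sketch using the objects defined above:

# Confirm the response is a valid fraction in [0, 1] before fitting
print(Y.describe())
assert ((Y >= 0) & (Y <= 1)).all(), 'LEV_LT3 must lie in [0, 1]'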

In [6]: # Discrete Dependent Variable Models with Logit Link

In [7]: mod = sm.Logit(Y, X)

In [8]: res = mod.fit()
Optimization terminated successfully.
         Current function value: 882.448249
         Iterations 8

In [9]: print res.summary()
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                LEV_LT3   No. Observations:                 4421
Model:                          Logit   Df Residuals:                     4415
Method:                           MLE   Df Model:                            5
Date:                Sun, 16 Dec 2012   Pseudo R-squ.:                 0.04022
Time:                        23:40:40   Log-Likelihood:                -882.45
converged:                       True   LL-Null:                       -919.42
                                        LLR p-value:                 1.539e-14
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
COLLAT1        1.2371      0.260      4.756      0.000         0.727     1.747
SIZE1          0.3590      0.037      9.584      0.000         0.286     0.432
PROF2         -3.1431      0.739     -4.254      0.000        -4.591    -1.695
LIQ           -1.3825      0.357     -3.867      0.000        -2.083    -0.682
IND3A          0.5466      0.141      3.867      0.000         0.270     0.824
const         -7.2498      0.567    -12.779      0.000        -8.362    -6.138
==============================================================================
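
The fitted coefficients map into predicted fractions through the inverse logit of the linear predictor. A minimal sketch of recovering them from the results object above:

# Predicted fractions: inverse-logit of X * beta
pred = res.predict(X)
# With an intercept in the model, the average predicted fraction
# matches the sample mean of LEV_LT3 (the score equation for const)
print(pred.mean())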

In [10]: # Print Marginal Effects

In [11]: print pd.DataFrame(res.margeff(), index = X.columns[:(len(X.columns) - 1)], columns = ['MargEffects'])
         MargEffects
COLLAT1     0.096447
SIZE1       0.027988
PROF2      -0.245035
LIQ        -0.107778
IND3A       0.042611
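
res.margeff() is the statsmodels API of the time; in later releases the method was replaced by get_margeff(). A sketch of the equivalent call for newer versions:

# Equivalent in newer statsmodels, where margeff() became get_margeff()
mfx = res.get_margeff()
print(mfx.summary())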

In [12]: # Cross-check the same model in R via PypeR

In [13]: import pyper as pr

In [14]: r = pr.R(use_pandas = True)

In [15]: r.r_data = data

In [16]: # Indirect Estimation of Discrete Dependent Variable Models

In [17]: r('data <- rbind(cbind(r_data, y = 1, wt = r_data$LEV_LT3), cbind(r_data, y = 0, wt = 1 - r_data$LEV_LT3))')
Out[17]: 'try({data <- rbind(cbind(r_data, y = 1, wt = r_data$LEV_LT3), cbind(r_data, y = 0, wt = 1 - r_data$LEV_LT3))})\n'

In [18]: r('mod <- glm(y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, weights = wt, subset = (wt > 0), data = data, family = binomial)')
Out[18]: 'try({mod <- glm(y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, weights = wt, subset = (wt > 0), data = data, family = binomial)})\nWarning message:\nIn eval(expr, envir, enclos) : non-integer #successes in a binomial glm!\n'
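
The rbind() call above implements the standard indirect-estimation trick: each record enters twice, once as a success (y = 1) weighted by the observed fraction and once as a failure (y = 0) weighted by its complement, so the weighted binomial log-likelihood reproduces the fractional logit quasi-likelihood. The non-integer #successes warning is therefore expected and harmless. A rough pandas sketch of the same expansion, assuming the column names used above:

# Each record enters twice: a success row weighted by the observed
# fraction and a failure row weighted by its complement
ones = data.copy()
ones['y'], ones['wt'] = 1, data.LEV_LT3
zeros = data.copy()
zeros['y'], zeros['wt'] = 0, 1 - data.LEV_LT3
expanded = pd.concat([ones, zeros])
expanded = expanded[expanded.wt > 0]  # drop zero-weight rows, as in R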

In [19]: print r('summary(mod)')
try({summary(mod)})

Call:
glm(formula = y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, family = binomial, 
    data = data, weights = wt, subset = (wt > 0))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0129  -0.4483  -0.3173  -0.1535   2.5379  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -7.24979    0.56734 -12.779  < 2e-16 ***
COLLAT1      1.23715    0.26012   4.756 1.97e-06 ***
SIZE1        0.35901    0.03746   9.584  < 2e-16 ***
PROF2       -3.14313    0.73895  -4.254 2.10e-05 ***
LIQ         -1.38249    0.35749  -3.867  0.00011 ***
IND3A        0.54658    0.14136   3.867  0.00011 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2692.0  on 5536  degrees of freedom
Residual deviance: 2456.4  on 5531  degrees of freedom
AIC: 1995.4

Number of Fisher Scoring iterations: 6
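
For completeness, newer releases of statsmodels can fit the same quasi-likelihood directly, without the row expansion, by passing the fractional response to a binomial GLM; a minimal sketch:

# Direct fractional logit: the binomial family with its default logit
# link accepts a fractional response and maximizes the same
# quasi-likelihood as the weighted R fit above
glm_res = sm.GLM(Y, X, family=sm.families.Binomial()).fit()
print(glm_res.summary())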
