Run R Code Within Python On The Fly

November 24, 2012

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

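The IPython session below shows how to run R code from within Python using the rpy2 package, an attractive option for hardcore R programmers who also work in Python. Each R statement is passed to ro.r() as a string and evaluated in an embedded R session.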
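
Because each R statement is only a Python string, it can also be assembled from Python values before it is sent to R. The following is a minimal sketch of that idea, not part of the original session: build_glm_call and run_r are hypothetical helper names, and the formula is the one fitted later in this post. rpy2 is imported lazily, so the string builder runs even where rpy2 and R are not installed.

```python
def build_glm_call(response, predictors, data, family="binomial"):
    """Return R source for summary(glm(...)) as a plain string (hypothetical helper)."""
    formula = "{0} ~ {1}".format(response, " + ".join(predictors))
    return "summary(glm({0}, data = {1}, family = {2}))".format(formula, data, family)


def run_r(code):
    """Evaluate R source in the embedded R session (hypothetical helper).

    Assumes rpy2 and R are installed; the import is deferred so that
    build_glm_call can be used without them.
    """
    import rpy2.robjects as ro
    return ro.r(code)


call = build_glm_call(
    "DEFAULT", ["MAJORDRG", "MINORDRG", "OWNRENT", "INCOME"], "sample"
)
print(call)
# Once the R data frame `sample` exists, the model could be fitted with:
#     print(run_r(call))
```

The printed string is the same R statement that is typed by hand in the session, so the predictor list or the family could be changed from Python without editing R code.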

In [1]: import rpy2.robjects as ro

In [2]: _null_ = ro.r('data <- read.table("/home/liuwensui/data/credit_count.txt", header = TRUE, sep = ",")')

In [3]: print(ro.r('str(data)'))
'data.frame':	13444 obs. of  14 variables:
 $ CARDHLDR: int  0 0 1 1 1 1 1 1 1 1 ...
 $ DEFAULT : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AGE     : num  27.2 40.8 37.7 42.5 21.3 ...
 $ ACADMOS : int  4 111 54 60 8 78 25 6 20 162 ...
 $ ADEPCNT : int  0 3 3 3 0 1 1 0 3 7 ...
 $ MAJORDRG: int  0 0 0 0 0 0 0 0 0 0 ...
 $ MINORDRG: int  0 0 0 0 0 0 0 0 0 0 ...
 $ OWNRENT : int  0 1 1 1 0 0 1 0 0 1 ...
 $ INCOME  : num  1200 4000 3667 2000 2917 ...
 $ SELFEMPL: int  0 0 0 0 0 0 0 0 0 0 ...
 $ INCPER  : num  18000 13500 11300 17250 35000 ...
 $ EXP_INC : num  0.000667 0.000222 0.03327 0.048427 0.016523 ...
 $ SPENDING: num  NA NA 122 96.9 48.2 ...
 $ LOGSPEND: num  NA NA 4.8 4.57 3.88 ...
NULL

In [4]: _null_ = ro.r('sample <- data[data$CARDHLDR == 1,]')

In [5]: print(ro.r('summary(sample)'))
    CARDHLDR    DEFAULT             AGE           ACADMOS         ADEPCNT      
 Min.   :1   Min.   :0.00000   Min.   : 0.00   Min.   :  0.0   Min.   :0.0000  
 1st Qu.:1   1st Qu.:0.00000   1st Qu.:25.75   1st Qu.: 12.0   1st Qu.:0.0000  
 Median :1   Median :0.00000   Median :31.67   Median : 30.0   Median :0.0000  
 Mean   :1   Mean   :0.09487   Mean   :33.67   Mean   : 55.9   Mean   :0.9904  
 3rd Qu.:1   3rd Qu.:0.00000   3rd Qu.:39.75   3rd Qu.: 72.0   3rd Qu.:2.0000  
 Max.   :1   Max.   :1.00000   Max.   :88.67   Max.   :564.0   Max.   :9.0000  
    MAJORDRG         MINORDRG         OWNRENT           INCOME    
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :  50  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1750  
 Median :0.0000   Median :0.0000   Median :0.0000   Median :2292  
 Mean   :0.1433   Mean   :0.2207   Mean   :0.4791   Mean   :2606  
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:3042  
 Max.   :6.0000   Max.   :7.0000   Max.   :1.0000   Max.   :8333  
    SELFEMPL           INCPER          EXP_INC            SPENDING       
 Min.   :0.00000   Min.   :   700   Min.   :0.000096   Min.   :   0.111  
 1st Qu.:0.00000   1st Qu.: 12900   1st Qu.:0.025998   1st Qu.:  58.753  
 Median :0.00000   Median : 20000   Median :0.058957   Median : 139.992  
 Mean   :0.05362   Mean   : 22581   Mean   :0.090744   Mean   : 226.983  
 3rd Qu.:0.00000   3rd Qu.: 28337   3rd Qu.:0.116123   3rd Qu.: 284.440  
 Max.   :1.00000   Max.   :150000   Max.   :2.037728   Max.   :4810.309  
    LOGSPEND     
 Min.   :-2.197  
 1st Qu.: 4.073  
 Median : 4.942  
 Mean   : 4.729  
 3rd Qu.: 5.651  
 Max.   : 8.479  

In [6]: print(ro.r('summary(glm(DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME, data = sample, family = binomial))'))

Call:
glm(formula = DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME, 
    family = binomial, data = sample)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9587  -0.5003  -0.4351  -0.3305   3.1928  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.204e+00  9.084e-02 -13.259  < 2e-16 ***
MAJORDRG     2.031e-01  6.926e-02   2.933  0.00336 ** 
MINORDRG     2.027e-01  4.798e-02   4.225 2.38e-05 ***
OWNRENT     -2.012e-01  7.163e-02  -2.809  0.00496 ** 
INCOME      -4.422e-04  4.044e-05 -10.937  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6586.1  on 10498  degrees of freedom
Residual deviance: 6376.2  on 10494  degrees of freedom
AIC: 6386.2

Number of Fisher Scoring iterations: 6
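
To read the coefficient table, recall that the binomial family uses the logit link by default, so the fitted probability of default is 1 / (1 + exp(-eta)), where eta is the linear predictor. The sketch below does that arithmetic in plain Python with the estimates copied from the output above. The function name default_probability is hypothetical, and the example inputs (no derogatory reports, a renter, monthly income of 2292) are simply the medians from the summary of the cardholder sample, chosen for illustration.

```python
import math

# Coefficient estimates copied from the glm summary printed above.
COEF = {
    "(Intercept)": -1.204,
    "MAJORDRG": 0.2031,
    "MINORDRG": 0.2027,
    "OWNRENT": -0.2012,
    "INCOME": -4.422e-04,
}


def default_probability(majordrg, minordrg, ownrent, income):
    """Fitted P(DEFAULT = 1) under the logit link (hypothetical helper)."""
    eta = (
        COEF["(Intercept)"]
        + COEF["MAJORDRG"] * majordrg
        + COEF["MINORDRG"] * minordrg
        + COEF["OWNRENT"] * ownrent
        + COEF["INCOME"] * income
    )
    # Inverse logit maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-eta))


# Illustrative inputs: the sample medians reported by summary(sample).
p_renter = default_probability(0, 0, 0, 2292)
p_owner = default_probability(0, 0, 1, 2292)
print(round(p_renter, 4), round(p_owner, 4))
```

The negative OWNRENT and INCOME estimates lower the fitted probability, while each additional major or minor derogatory report raises it, which matches the signs in the table.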
