Another Way to Access R from Python – PypeR

[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Different from RPy2, PypeR provides another simple way to access R from Python through pipes (http://www.jstatsoft.org/v35/c02/paper). This handy feature enables data analysts to do the data munging with python and the statistical analysis with R by passing objects interactively between two computing systems.

Below is a simple demonstration on how to call R within Python through RypeR, estimate a Beta regression, and then return the model prediction from R back to Python.

In [1]: # LOAD PYTHON PACKAGES

In [2]: import pandas as pd

In [3]: import pyper as pr

In [4]: # READ DATA

In [5]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0)

In [6]: # CREATE A R INSTANCE WITH PYPER

In [7]: r = pr.R(use_pandas = True)

In [8]: # PASS DATA FROM PYTHON TO R

In [9]: r.assign("rdata", data)

In [10]: # SHOW DATA SUMMARY

In [11]: print r("summary(rdata)")
try({summary(rdata)})
    LEV_LT3           TAX_NDEB           COLLAT1           SIZE1       
 Min.   :0.00000   Min.   :  0.0000   Min.   :0.0000   Min.   : 7.738  
 1st Qu.:0.00000   1st Qu.:  0.3494   1st Qu.:0.1241   1st Qu.:12.317  
 Median :0.00000   Median :  0.5666   Median :0.2876   Median :13.540  
 Mean   :0.09083   Mean   :  0.8245   Mean   :0.3174   Mean   :13.511  
 3rd Qu.:0.01169   3rd Qu.:  0.7891   3rd Qu.:0.4724   3rd Qu.:14.751  
 Max.   :0.99837   Max.   :102.1495   Max.   :0.9953   Max.   :18.587  
     PROF2              GROWTH2             AGE              LIQ         
 Min.   :0.0000158   Min.   :-81.248   Min.   :  6.00   Min.   :0.00000  
 1st Qu.:0.0721233   1st Qu.: -3.563   1st Qu.: 11.00   1st Qu.:0.03483  
 Median :0.1203435   Median :  6.164   Median : 17.00   Median :0.10854  
 Mean   :0.1445929   Mean   : 13.620   Mean   : 20.37   Mean   :0.20281  
 3rd Qu.:0.1875148   3rd Qu.: 21.952   3rd Qu.: 25.00   3rd Qu.:0.29137  
 Max.   :1.5902009   Max.   :681.354   Max.   :210.00   Max.   :1.00018  
     IND2A            IND3A            IND4A             IND5A        
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :1.0000   Median :0.0000   Median :0.00000   Median :0.00000  
 Mean   :0.6116   Mean   :0.1902   Mean   :0.02692   Mean   :0.09907  
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  


In [12]: # LOAD R PACKAGE

In [13]: r("library(betareg)")
Out[13]: 'try({library(betareg)})\nLoading required package: Formula\n'

In [14]: # ESTIMATE A BETA REGRESSION

In [15]: r("m <- betareg(LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 > 0)")
Out[15]: 'try({m <- betareg(LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 > 0)})\n'

In [16]: # OUTPUT MODEL SUMMARY

In [17]: print r("summary(m)")
try({summary(m)})

Call:
betareg(formula = LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, 
    subset = LEV_LT3 > 0)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max 
-7.2802 -0.5194  0.0777  0.6037  5.8777 

Coefficients (mean model with logit link):
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.229773   0.312990   3.929 8.53e-05 ***
SIZE1       -0.105009   0.021211  -4.951 7.39e-07 ***
PROF2       -2.414794   0.377271  -6.401 1.55e-10 ***
GROWTH2      0.003306   0.001043   3.169  0.00153 ** 
AGE         -0.004999   0.001795  -2.786  0.00534 ** 
IND3A        0.688314   0.074069   9.293  < 2e-16 ***

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)    
(phi)   3.9362     0.1528   25.77   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Type of estimator: ML (maximum likelihood)
Log-likelihood: 266.7 on 7 Df
Pseudo R-squared: 0.1468
Number of iterations: 25 (BFGS) + 2 (Fisher scoring) 


In [18]: # CALCULATE MODEL PREDICTION

In [19]: r("beta_fit <- predict(m, link = 'response')")
Out[19]: "try({beta_fit <- predict(m, link = 'response')})\n"

In [20]: # SHOW PREDICTION SUMMARY IN R

In [21]: print r("summary(beta_fit)")
try({summary(beta_fit)})
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1634  0.3069  0.3465  0.3657  0.4007  0.6695 


In [22]: # PASS DATA FROM R TO PYTHON

In [23]: pydata = pd.DataFrame(r.get("beta_fit"), columns = ["y_hat"])

In [24]: # SHOW PREDICTION SUMMARY IN PYTHON

In [25]: pydata.y_hat.describe()
Out[25]: 
count    1116.000000
mean        0.365675
std         0.089804
min         0.163388
25%         0.306897
50%         0.346483
75%         0.400656
max         0.669489

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)