Rmagic, A Handy Interface Bridging Python and R

May 31, 2013

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

Rmagic (http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html) is an IPython extension that uses rpy2 as its back end and provides a convenient interface for accessing R from IPython. Compared with using rpy2 directly, the rmagic extension makes it easier to exchange objects between IPython and R and to run a single R function or a block of R code.

Below is a simple example showing how to push a pandas DataFrame object into R, convert it to an R data.frame, and then transfer it back into a new pandas DataFrame object.

In [1]: import pandas as pd

In [2]: # READ DATA INTO PANDAS DATAFRAME

In [3]: pydf1 = pd.read_table('../data/csdata.txt', header = 0)

In [4]: print(pydf1.describe())
           LEV_LT3     TAX_NDEB      COLLAT1        SIZE1        PROF2  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean      0.090832     0.824537     0.317354    13.510870     0.144593   
std       0.193872     2.884129     0.227150     1.692520     0.110908   
min       0.000000     0.000000     0.000000     7.738052     0.000016   
25%       0.000000     0.349381     0.124094    12.316970     0.072123   
50%       0.000000     0.566577     0.287613    13.539574     0.120344   
75%       0.011689     0.789128     0.472355    14.751119     0.187515   
max       0.998372   102.149483     0.995346    18.586632     1.590201   

           GROWTH2          AGE          LIQ        IND2A        IND3A  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean     13.619633    20.366433     0.202813     0.611626     0.190228   
std      36.517739    14.538997     0.233256     0.487435     0.392526   
min     -81.247627     6.000000     0.000000     0.000000     0.000000   
25%      -3.563235    11.000000     0.034834     0.000000     0.000000   
50%       6.164303    17.000000     0.108544     1.000000     0.000000   
75%      21.951632    25.000000     0.291366     1.000000     0.000000   
max     681.354187   210.000000     1.000182     1.000000     1.000000   

             IND4A        IND5A  
count  4421.000000  4421.000000  
mean      0.026917     0.099073  
std       0.161859     0.298793  
min       0.000000     0.000000  
25%       0.000000     0.000000  
50%       0.000000     0.000000  
75%       0.000000     0.000000  
max       1.000000     1.000000  

In [5]: # CONVERT PANDAS DATAFRAME TO R DATA.FRAME

In [6]: %load_ext rmagic

In [7]: col = pydf1.columns

In [8]: %R -i pydf1,col colnames(pydf1) <- unlist(col); print(is.matrix(pydf1))
[1] TRUE

In [9]: %R rdf <- data.frame(pydf1); print(is.data.frame(rdf))
[1] TRUE

In [10]: %R print(summary(rdf))
    LEV_LT3           TAX_NDEB           COLLAT1           SIZE1       
 Min.   :0.00000   Min.   :  0.0000   Min.   :0.0000   Min.   : 7.738  
 1st Qu.:0.00000   1st Qu.:  0.3494   1st Qu.:0.1241   1st Qu.:12.317  
 Median :0.00000   Median :  0.5666   Median :0.2876   Median :13.540  
 Mean   :0.09083   Mean   :  0.8245   Mean   :0.3174   Mean   :13.511  
 3rd Qu.:0.01169   3rd Qu.:  0.7891   3rd Qu.:0.4724   3rd Qu.:14.751  
 Max.   :0.99837   Max.   :102.1495   Max.   :0.9953   Max.   :18.587  
     PROF2              GROWTH2             AGE              LIQ         
 Min.   :0.0000158   Min.   :-81.248   Min.   :  6.00   Min.   :0.00000  
 1st Qu.:0.0721233   1st Qu.: -3.563   1st Qu.: 11.00   1st Qu.:0.03483  
 Median :0.1203435   Median :  6.164   Median : 17.00   Median :0.10854  
 Mean   :0.1445929   Mean   : 13.620   Mean   : 20.37   Mean   :0.20281  
 3rd Qu.:0.1875148   3rd Qu.: 21.952   3rd Qu.: 25.00   3rd Qu.:0.29137  
 Max.   :1.5902009   Max.   :681.354   Max.   :210.00   Max.   :1.00018  
     IND2A            IND3A            IND4A             IND5A        
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :1.0000   Median :0.0000   Median :0.00000   Median :0.00000  
 Mean   :0.6116   Mean   :0.1902   Mean   :0.02692   Mean   :0.09907  
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  

In [11]: # CONVERT R DATA.FRAME BACK TO PANDAS DATAFRAME

In [12]: %R -d rdf

In [13]: pydf2 = pd.DataFrame(rdf)

In [14]: print(pydf2.describe())
           LEV_LT3     TAX_NDEB      COLLAT1        SIZE1        PROF2  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean      0.090832     0.824537     0.317354    13.510870     0.144593   
std       0.193872     2.884129     0.227150     1.692520     0.110908   
min       0.000000     0.000000     0.000000     7.738052     0.000016   
25%       0.000000     0.349381     0.124094    12.316970     0.072123   
50%       0.000000     0.566577     0.287613    13.539574     0.120344   
75%       0.011689     0.789128     0.472355    14.751119     0.187515   
max       0.998372   102.149483     0.995346    18.586632     1.590201   

           GROWTH2          AGE          LIQ        IND2A        IND3A  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean     13.619633    20.366433     0.202813     0.611626     0.190228   
std      36.517739    14.538997     0.233256     0.487435     0.392526   
min     -81.247627     6.000000     0.000000     0.000000     0.000000   
25%      -3.563235    11.000000     0.034834     0.000000     0.000000   
50%       6.164303    17.000000     0.108544     1.000000     0.000000   
75%      21.951632    25.000000     0.291366     1.000000     0.000000   
max     681.354187   210.000000     1.000182     1.000000     1.000000   

             IND4A        IND5A  
count  4421.000000  4421.000000  
mean      0.026917     0.099073  
std       0.161859     0.298793  
min       0.000000     0.000000  
25%       0.000000     0.000000  
50%       0.000000     0.000000  
75%       0.000000     0.000000  
max       1.000000     1.000000
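
The session above shows two things worth spelling out: the pushed object arrives in R as a plain matrix (hence the separate colnames() call in In [8]), and the object pulled back with -d can be handed straight to pd.DataFrame. Below is a minimal pandas-only sketch of both directions. It assumes that rmagic of this era passes inputs through numpy.asarray and that -d returns a NumPy structured array. The toy values are made up, and to_records() is only a stand-in for what R would return. No R session is involved.

```python
import numpy as np
import pandas as pd

# Toy stand-in for pydf1: column names borrowed from the article, values made up.
pydf1 = pd.DataFrame({"LEV_LT3": [0.0, 0.25, 0.5],
                      "SIZE1": [12.0, 13.5, 14.75]})

# Python -> R: assumed to mimic what `-i pydf1` pushes, a bare 2-D array.
# The column labels are gone, which is why `col` is passed alongside it.
pushed = np.asarray(pydf1)
labels_survive = hasattr(pushed, "columns")

# R -> Python: assumed to mimic what `%R -d rdf` pulls back, a structured
# array whose field names carry the column names.
rdf = pydf1.to_records(index=False)
pydf2 = pd.DataFrame(rdf)  # same call as In [13]

# Check the round trip numerically instead of comparing describe() tables.
same_columns = list(pydf2.columns) == list(pydf1.columns)
same_values = np.allclose(pydf2.values, pydf1.values)
print(pushed.shape, labels_survive, same_columns, same_values)
```

In a real session the same two comparisons could be run on pydf1 and pydf2 directly. Later rpy2 releases ship the extension themselves (loaded with %load_ext rpy2.ipython) and may convert DataFrames differently, so this sketch describes the behaviour seen in the session above rather than current versions.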
