[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
For a statistical analyst, the first step in a data analysis project is to import the data and then review its descriptive statistics. In Python, both are easy to do with the pandas package.
In [1]: import pandas as pd
In [2]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0)
In [3]: pd.set_option("display.precision", 4)  # replaces the removed pd.set_printoptions; four decimal places
In [4]: print(data.describe().to_string())
         LEV_LT3   TAX_NDEB    COLLAT1      SIZE1      PROF2    GROWTH2        AGE        LIQ      IND2A      IND3A      IND4A      IND5A
count  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000
mean      0.0908     0.8245     0.3174    13.5109     0.1446    13.6196    20.3664     0.2028     0.6116     0.1902     0.0269     0.0991
std       0.1939     2.8841     0.2272     1.6925     0.1109    36.5177    14.5390     0.2333     0.4874     0.3925     0.1619     0.2988
min       0.0000     0.0000     0.0000     7.7381     0.0000   -81.2476     6.0000     0.0000     0.0000     0.0000     0.0000     0.0000
25%       0.0000     0.3494     0.1241    12.3170     0.0721    -3.5632    11.0000     0.0348     0.0000     0.0000     0.0000     0.0000
50%       0.0000     0.5666     0.2876    13.5396     0.1203     6.1643    17.0000     0.1085     1.0000     0.0000     0.0000     0.0000
75%       0.0117     0.7891     0.4724    14.7511     0.1875    21.9516    25.0000     0.2914     1.0000     0.0000     0.0000     0.0000
max       0.9984   102.1495     0.9953    18.5866     1.5902   681.3542   210.0000     1.0002     1.0000     1.0000     1.0000     1.0000
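For readers who do not have csdata.txt at hand, the same import-then-describe workflow can be sketched on a small in-memory table. The column values below are made up for illustration and are not taken from the data above; `io.StringIO` simply stands in for the file path.

```python
import io

import pandas as pd

# Hypothetical tab-delimited text standing in for csdata.txt.
raw = "AGE\tLIQ\n6\t0.00\n11\t0.10\n17\t0.20\n25\t0.30\n"

# Tab is the delimiter that read_table assumes by default;
# header=0 takes the column names from the first row.
toy = pd.read_csv(io.StringIO(raw), sep="\t", header=0)

# describe() returns a DataFrame indexed by statistic name.
stats = toy.describe()
print(stats.to_string())
```

Because `describe()` returns an ordinary DataFrame, a single figure can be pulled out with `stats.loc["mean", "AGE"]` rather than read off the printed table.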
Tonight, I'd like to add some spice to my Python learning experience and do the same work in a different flavor with the rpy2 package, which lets me call R functions from Python.
In [5]: import rpy2.robjects as ro
In [6]: rdata = ro.packages.importr('utils').read_table("/home/liuwensui/Documents/data/csdata.txt", header = True)
In [7]: print(ro.r.summary(rdata))
  LEV_LT3          TAX_NDEB          COLLAT1         SIZE1
Min.   :0.00000  Min.   :  0.0000  Min.   :0.0000  Min.   : 7.738
1st Qu.:0.00000  1st Qu.:  0.3494  1st Qu.:0.1241  1st Qu.:12.317
Median :0.00000  Median :  0.5666  Median :0.2876  Median :13.540
Mean   :0.09083  Mean   :  0.8245  Mean   :0.3174  Mean   :13.511
3rd Qu.:0.01169  3rd Qu.:  0.7891  3rd Qu.:0.4724  3rd Qu.:14.751
Max.   :0.99837  Max.   :102.1495  Max.   :0.9953  Max.   :18.587
  PROF2              GROWTH2          AGE             LIQ
Min.   :0.0000158  Min.   :-81.248  Min.   :  6.00  Min.   :0.00000
1st Qu.:0.0721233  1st Qu.: -3.563  1st Qu.: 11.00  1st Qu.:0.03483
Median :0.1203435  Median :  6.164  Median : 17.00  Median :0.10854
Mean   :0.1445929  Mean   : 13.620  Mean   : 20.37  Mean   :0.20281
3rd Qu.:0.1875148  3rd Qu.: 21.952  3rd Qu.: 25.00  3rd Qu.:0.29137
Max.   :1.5902009  Max.   :681.354  Max.   :210.00  Max.   :1.00018
  IND2A           IND3A           IND4A            IND5A
Min.   :0.0000  Min.   :0.0000  Min.   :0.00000  Min.   :0.00000
1st Qu.:0.0000  1st Qu.:0.0000  1st Qu.:0.00000  1st Qu.:0.00000
Median :1.0000  Median :0.0000  Median :0.00000  Median :0.00000
Mean   :0.6116  Mean   :0.1902  Mean   :0.02692  Mean   :0.09907
3rd Qu.:1.0000  3rd Qu.:0.0000  3rd Qu.:0.00000  3rd Qu.:0.00000
Max.   :1.0000  Max.   :1.0000  Max.   :1.00000  Max.   :1.00000
As shown above, a similar analysis can be carried out by calling R functions from Python. This lets us extract and process the data in Python without giving up the graphical and statistical functionality of R.
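The pandas and R outputs differ mainly in layout: describe() adds count and std, while summary() reports minimum, quartiles, median, mean and maximum. As a rough sketch, the R-style layout can be rebuilt in pandas alone, which is handy for cross-checking the two. The helper name `r_style_summary` and the toy values are invented for illustration. The sketch relies on pandas' default linear interpolation for quantiles, which corresponds to R's default quantile method (type 7), so the quartiles should agree.

```python
import pandas as pd


def r_style_summary(frame):
    """Return Min., 1st Qu., Median, Mean, 3rd Qu., Max. for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    # Each dict value is a Series indexed by column name; transposing
    # puts the statistics on the rows, as describe() does.
    return pd.DataFrame(
        {
            "Min.": numeric.min(),
            "1st Qu.": numeric.quantile(0.25),
            "Median": numeric.median(),
            "Mean": numeric.mean(),
            "3rd Qu.": numeric.quantile(0.75),
            "Max.": numeric.max(),
        }
    ).T


# Hypothetical values, not taken from csdata.txt.
toy = pd.DataFrame({"AGE": [6, 11, 17, 25], "LIQ": [0.0, 0.1, 0.2, 0.3]})
summary = r_style_summary(toy)
print(summary)
```

Applied to the full data set, the rows of this table can be compared directly against the summary() output from rpy2.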
To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.
