Below is a piece of R snippet comparing the data import efficiencies among CSV, SQLITE, and HDF5. Similar to the case in Python posted yesterday, HDF5 shows the highest efficiency.

After posting “Removing Records by Duplicate Values” yesterday, I had an interesting communication thread with my friend Jeffrey Allard tonight regarding how to code this in R, a combination of order() and duplicated() or sqldf(). Afterward, I did a simple efficiency comparison between two methods as below. The comparison result is pretty self-explanatory. In terms

In the practice of risk modeling, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration showing how to develop a generalized boosted regression with a monotonic marginal effect for each predictor. Plot of Variable Importance Plot of Monotonic Marginal Effects

Different from RPy2, PypeR provides another simple way to access R from Python through pipes (http://www.jstatsoft.org/v35/c02/paper). This handy feature enables data analysts to do the data munging with python and the statistical analysis with R by passing objects interactively between two computing systems. Below is a simple demonstration on how to call R within Python

For a statistical analyst, the first step to start a data analysis project is to import the data into the program and then to screen the descriptive statistics of the data. In python, we can easily do so with pandas package. Tonight, I’d like to add some spice to my python learning experience and do