Efficiency Comparison among 4 Methods above
Below is a piece of R snippet comparing the data import efficiencies among CSV, SQLITE, and HDF5. Similar to the case in Python posted yesterday, HDF5 shows the highest efficiency.
After posting “Removing Records by Duplicate Values” yesterday, I had an interesting communication thread with my friend Jeffrey Allard tonight regarding how to code this in R, a combination of order() and duplicated() or sqldf(). Afterward, I did a simple efficiency comparison between two methods as below. The comparison result is pretty self-explanatory. In terms
Removing records from a data table based on duplicate values in one or more columns is a commonly used but important data cleaning technique. Below shows an example about how to accomplish this task by SAS, R, and Python respectively. SAS Example R Example Python Example
In the practice of risk modeling, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration showing how to develop a generalized boosted regression with a monotonic marginal effect for each predictor. Plot of Variable Importance Plot of Monotonic Marginal Effects
SQLite is a light-weight database with zero-configuration. Being fast, reliable, and simple, SQLite is a good choice to store / query large data, e.g. terabytes, and is well supported by both Python and R.
Different from RPy2, PypeR provides another simple way to access R from Python through pipes (http://www.jstatsoft.org/v35/c02/paper). This handy feature enables data analysts to do the data munging with python and the statistical analysis with R by passing objects interactively between two computing systems. Below is a simple demonstration on how to call R within Python
Below is an example showing how to run R code within python, which is an extremely attractive feature for hardcore R programmers.
For a statistical analyst, the first step to start a data analysis project is to import the data into the program and then to screen the descriptive statistics of the data. In python, we can easily do so with pandas package. Tonight, I’d like to add some spice to my python learning experience and do