Blog Archives

Data Import Efficiency – A Case in R

December 23, 2012

Below is an R snippet comparing data import efficiency across CSV, SQLite, and HDF5. As in the Python case posted yesterday, HDF5 shows the highest efficiency.
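To give a feel for how such a comparison can be set up, here is a minimal sketch in Python using only the standard library. It times reading the same table back from a CSV file and from an SQLite database. HDF5 is left out because it needs a third-party library, and the table layout, file names, and row count are assumptions made for illustration, not the benchmark from the post.

```python
import csv
import os
import sqlite3
import tempfile
import time

# Hypothetical test data: 50,000 rows of (integer id, float value).
n_rows = 50_000
data = [(i, i * 0.5) for i in range(n_rows)]

tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")  # hypothetical file name
db_path = os.path.join(tmp_dir, "data.db")    # hypothetical file name

# Write the same data to both storage formats.
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(data)

con = sqlite3.connect(db_path)
with con:  # commits the transaction on success
    con.execute("CREATE TABLE data (id INTEGER, x REAL)")
    con.executemany("INSERT INTO data VALUES (?, ?)", data)
con.close()


def read_csv():
    """Read the CSV file and convert each field back to its numeric type."""
    with open(csv_path, newline="") as f:
        return [(int(a), float(b)) for a, b in csv.reader(f)]


def read_sqlite():
    """Read the whole table from the SQLite file."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute("SELECT id, x FROM data").fetchall()
    finally:
        con.close()


def timed(fn):
    """Return the result of fn() and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - start


csv_rows, csv_secs = timed(read_csv)
db_rows, db_secs = timed(read_sqlite)
print(f"CSV:    {csv_secs:.4f} s for {len(csv_rows)} rows")
print(f"SQLite: {db_secs:.4f} s for {len(db_rows)} rows")
```

A single timed run like this is noisy; repeating each read several times and comparing the best or median time gives a steadier picture.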

Read more »

Removing Records by Duplicate Values in R – An Efficiency Comparison

December 20, 2012

After posting “Removing Records by Duplicate Values” yesterday, I had an interesting exchange with my friend Jeffrey Allard tonight about how to code this in R, using either a combination of order() and duplicated() or sqldf(). Afterward, I ran a simple efficiency comparison of the two methods, shown below. The result is pretty self-explanatory. In terms

Read more »

Removing Records by Duplicate Values

December 20, 2012

Removing records from a data table based on duplicate values in one or more columns is a commonly used and important data cleaning technique. Below is an example of how to accomplish this task in SAS, R, and Python, respectively.
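As a rough illustration of the technique, the sketch below uses plain Python with no third-party packages. It sorts the records first so that "the first record per key" is well defined, then keeps only the first record seen for each key value. The function name `drop_duplicates`, the field names, and the sample rows are hypothetical and are not taken from the post.

```python
from operator import itemgetter


def drop_duplicates(records, key, order_by=None):
    """Keep the first record for each distinct value of `key`.

    If `order_by` is given, records are sorted by that field first,
    so the record kept per key is the one with the smallest value.
    """
    if order_by is not None:
        records = sorted(records, key=itemgetter(order_by))
    seen = set()
    kept = []
    for rec in records:
        value = rec[key]
        if value not in seen:  # first time this key value appears
            seen.add(value)
            kept.append(rec)
    return kept


# Hypothetical sample data with duplicate ids.
rows = [
    {"id": 1, "score": 30},
    {"id": 2, "score": 10},
    {"id": 1, "score": 20},
    {"id": 3, "score": 40},
    {"id": 2, "score": 50},
]

deduped = drop_duplicates(rows, key="id", order_by="score")
for rec in deduped:
    print(rec)
```

Sorting before filtering is a design choice: without it, which duplicate survives depends on the input order.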

Read more »

Generalized Boosted Regression with A Monotonic Marginal Effect for Each Predictor

December 18, 2012

In risk modeling practice, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration of how to develop a generalized boosted regression with a monotonic marginal effect for each predictor. [Figures: Plot of Variable Importance; Plot of Monotonic Marginal Effects]

Read more »

Fractional Logit Model with Python

December 16, 2012

Read more »

Exchange Data between Python and R with SQLite

December 2, 2012

SQLite is a lightweight, zero-configuration database. Being fast, reliable, and simple, SQLite is a good choice for storing and querying large data, e.g. terabytes, and is well supported by both Python and R.
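The Python half of such an exchange might look like the sketch below, which uses the standard-library `sqlite3` module to write a small table to a database file and query it back. The file name `exchange.db`, the table name `tbl`, and the sample rows are assumptions for illustration only.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file that both Python and R would point to.
db_path = os.path.join(tempfile.mkdtemp(), "exchange.db")

# Hypothetical sample rows: (name, value).
rows = [("A", 1.5), ("B", 2.5), ("C", 4.0)]

# Write the table from Python.
con = sqlite3.connect(db_path)
with con:  # commits the transaction on success
    con.execute("CREATE TABLE IF NOT EXISTS tbl (name TEXT, value REAL)")
    con.executemany("INSERT INTO tbl (name, value) VALUES (?, ?)", rows)
con.close()

# Reopen the file and query it, as a second process would.
con = sqlite3.connect(db_path)
result = con.execute("SELECT name, value FROM tbl ORDER BY name").fetchall()
total = con.execute("SELECT SUM(value) FROM tbl").fetchone()[0]
con.close()

print(result)
print(total)
```

On the R side, the same file could then be opened with the DBI and RSQLite packages, for example with `dbConnect(RSQLite::SQLite(), db_path)` followed by `dbReadTable(con, "tbl")`.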

Read more »

Another Way to Access R from Python – PypeR

November 29, 2012

Unlike RPy2, PypeR provides a simple way to access R from Python through pipes (http://www.jstatsoft.org/v35/c02/paper). This handy feature enables data analysts to do the data munging with Python and the statistical analysis with R by passing objects interactively between the two computing systems. Below is a simple demonstration of how to call R within Python

Read more »

Run R Code Within Python On The Fly

November 24, 2012

Below is an example showing how to run R code within Python, an extremely attractive feature for hardcore R programmers.

Read more »

A Light Touch on RPy2

November 23, 2012

For a statistical analyst, the first step in a data analysis project is to import the data into the program and then screen its descriptive statistics. In Python, we can easily do so with the pandas package. Tonight, I’d like to add some spice to my Python learning experience and do

Read more »

Download Stock Price Online with R

October 11, 2012

Read more »