Blog Archives

Row Search in Parallel

September 28, 2014

I’ve always wondered whether row search can be made more efficient by splitting the whole data.frame into chunks and then searching the rows within each chunk in parallel. In the R code below, a comparison is done between the standard row search and the parallel row search with the FOREACH
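The idea of chunked row search can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code from the post: it uses the base `parallel` package (`makeCluster()` and `parLapply()`) instead of foreach to avoid extra dependencies, and the data, the search key `42`, and the worker count are all hypothetical.

```r
library(parallel)

# Hypothetical example data: 100,000 rows with an integer key to search on.
set.seed(2014)
df <- data.frame(id = sample.int(1000, 1e5, replace = TRUE),
                 x  = rnorm(1e5))

# Standard row search on the whole data.frame.
system.time(hits_serial <- df[df$id == 42, ])

# Parallel row search: split the rows into contiguous chunks, one per worker,
# search each chunk separately, then stack the matches back together.
n_workers <- 2  # assumed worker count; adjust to the machine
chunk_id  <- cut(seq_len(nrow(df)), breaks = n_workers, labels = FALSE)
chunks    <- split(df, chunk_id)

cl <- makeCluster(n_workers)
system.time(
  hits_parallel <- do.call(
    rbind,
    parLapply(cl, chunks, function(chunk) chunk[chunk$id == 42, ])
  )
)
stopCluster(cl)

# Chunks are contiguous, so the matched rows come back in the original order.
# Row names may differ after rbind(), so compare the column values instead.
stopifnot(identical(hits_serial$id, hits_parallel$id),
          identical(hits_serial$x,  hits_parallel$x))
```

Whether the parallel version is actually faster depends on the data size and on the cost of shipping chunks to the workers, so the two timings should be compared on realistic data.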

Chain Operations: An Interesting Feature in dplyr Package

July 28, 2014

Efficiency of Importing Large CSV Files in R

February 10, 2014

Julia and SQLite

February 8, 2014

Similar to R and pandas in Python, Julia provides a simple yet efficient interface to SQLite databases. In addition, the sqldf() function in the SQLite package, which is almost identical to the sqldf package in R, is extremely handy for data munging.

Simplex Model in R

February 2, 2014

R code, R output, and SAS code & output for comparison

rPython – R Interface to Python

October 13, 2013

Generate and Retrieve Many Objects with Sequential Names

September 8, 2013

When coding ensemble methods in data mining with R, e.g. bagging, we often need to generate many data and model objects with sequential names. Below is a quick example of how to use the assign() function to generate many prediction objects on the fly and then retrieve these predictions with mget() to do the model averaging.
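A minimal sketch of this pattern might look like the following. It is not the example from the post: the built-in `cars` data, the `lm()` base learner, the `pred_` name prefix, and the count of 10 models are all assumptions chosen for illustration.

```r
# Hypothetical bagging setup: 10 bootstrap linear models on the built-in cars data.
set.seed(2013)
n_models <- 10

for (i in seq_len(n_models)) {
  # Draw a bootstrap sample of the rows and fit a simple model to it.
  boot  <- cars[sample(nrow(cars), replace = TRUE), ]
  model <- lm(dist ~ speed, data = boot)
  # Create pred_1, pred_2, ... in the current environment on the fly.
  assign(paste0("pred_", i), predict(model, newdata = cars))
}

# Retrieve all prediction vectors by name as a list, then average them.
pred_list <- mget(paste0("pred_", seq_len(n_models)))
bagged    <- Reduce(`+`, pred_list) / n_models

head(bagged)
```

Here `mget()` returns a named list, so the averaging step works regardless of how many models were generated. Collecting the predictions in a list from the start would avoid `assign()` altogether, but the sequential-name approach is the one the post describes.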

Prototyping Multinomial Logit with R

August 21, 2013

Recently, I have been working on a new modeling proposal based on competing risks and need to prototype multinomial logit models with R. I’ve tested two R packages that implement multinomial logit models, namely nnet and VGAM. Model outputs with iris data are shown below. However, in my view, the above methods are not flexible

GRNN and PNN

June 23, 2013

From the technical perspective, people usually choose GRNN (general regression neural network) for function approximation with a continuous response variable and PNN (probabilistic neural network) for pattern recognition / classification problems with categorical outcomes. However, from the practical standpoint, it is often not necessary to draw a fine line between GRNN

Prototyping A General Regression Neural Network with SAS

June 22, 2013

The last time I read the paper “A General Regression Neural Network” by Donald Specht was exactly 10 years ago, when I was in graduate school. After reading it again this week, I decided to code it out with SAS macros and make this excellent idea available for the SAS community. The prototype of
