Blog Archives

Grid Search for Free Parameters with Parallel Computing

June 1, 2013
By
Grid Search for Free Parameters with Parallel Computing

In my previous post (http://statcompute.wordpress.com/2013/05/25/test-drive-of-parallel-computing-with-r) on 05/25/2013, I’ve demonstrated the power of parallel computing with various R packages. However, in the real world, it is not straight-forward to utilize these powerful tools in our day-by-day computing tasks without carefully formulate the problem. In the example below, I am going to show how to use the

Read more »

Rmagic, A Handy Interface Bridging Python and R

May 31, 2013
By
Rmagic, A Handy Interface Bridging Python and R

Rmagic (http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html) is the ipython extension that utilizes rpy2 in the back-end and provides a convenient interface accessing R from ipython. Compared with the generic use of rpy2, the rmagic extension allows users to exchange objects between ipython and R in a more flexible way and to run a single R function or a block

Read more »

Import All Text Files in A Folder with Parallel Execution

May 26, 2013
By
Import All Text Files in A Folder with Parallel Execution

Sometimes, we might need to import all files, e.g. *.txt, with the same data layout in a folder without knowing each file name and then combine all pieces together. With the old method, we can use lapply() and do.call() functions to accomplish the task. However, when there are a large number of such files and

Read more »

Test Drive of Parallel Computing with R

May 25, 2013
By
Test Drive of Parallel Computing with R

Today, I did a test run of parallel computing with snow and multicore packages in R and compared the parallelism with the single-thread lapply() function. In the test code below, a data.frame with 20M rows is simulated in a Ubuntu VM with 8-core CPU and 10-G memory. As the baseline, lapply() function is employed to

Read more »

Conversion between Factor and Dummies in R

May 18, 2013
By
Conversion between Factor and Dummies in R

Read more »

A Prototype of Monotonic Binning Algorithm with R

May 4, 2013
By
A Prototype of Monotonic Binning Algorithm with R

I’ve been asked many time if I have a piece of R code implementing the monotonic binning algorithm, similar to the one that I developed with SAS (http://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) and with Python (http://statcompute.wordpress.com/2012/12/08/monotonic-binning-with-python). Today, I finally had time to draft a quick prototype with 20 lines of R code, which is however barely useable without the

Read more »

Disaggregating Annual Losses into Each Quarter

April 23, 2013
By
Disaggregating Annual Losses into Each Quarter

In loss forecasting, it is often necessary to disaggregate annual losses into each quarter. The most simple method to convert low frequency to high frequency time series is interpolation, such as the one implemented in EXPAND procedure of SAS/ETS. In the example below, there is a series of annual loss projections from 2013 through 2016.

Read more »

A Grid Search for The Optimal Setting in Feed-Forward Neural Networks

February 3, 2013
By
A Grid Search for The Optimal Setting in Feed-Forward Neural Networks

The feed-forward neural network is a very powerful classification model in the machine learning content. Since the goodness-of-fit of a neural network is majorly dominated by the model complexity, it is very tempting for a modeler to over-parameterize the neural network by using too many hidden layers or/and hidden units. As pointed out by Brian

Read more »

Another Benchmark for Joining Two Data Frames

January 29, 2013
By
Another Benchmark for Joining Two Data Frames

In my post yesterday comparing efficiency in joining two data frames, I overlooked the computing cost used to convert data.frames to data.tables / ff data objects. Today, I did the test again with the consideration of library loading and data conversion. After the replication of 10 times in rbenchmark package, the joining method with data.table

Read more »

Efficiency in Joining Two Data Frames

January 28, 2013
By
Efficiency in Joining Two Data Frames

In R, there are multiple ways to merge 2 data frames. However, there could be a huge disparity in terms of efficiency. Therefore, it is worthwhile to test the performance among different methods and choose the correct approach in the real-world work. For smaller data frames with 1,000 rows, all six methods shown below seem

Read more »