Articles by statcompute

Why Use Weight of Evidence?

May 4, 2019 | statcompute

I have been asked why I spent so much effort on developing SAS macros and R functions to do monotonic binning for the WoE transformation, given the availability of other cutting-edge data mining algorithms that will automatically generate predictions from whatever predictors are fed into the model. Nonetheless, what really ... [Read more...]

More General Weighted Binning

April 27, 2019 | statcompute

You might be wondering what motivates me to spend countless weekend hours on the MOB package. The answer is plain and simple: it is the users who drive the development work. After I published the wts_bin() function last week showing the impact of two-value weights on the monotonic binning outcome (... [Read more...]

Binning with Weights

April 21, 2019 | statcompute

After working on the MOB package, I received requests from multiple users asking whether I could write a binning function that takes the weighting scheme into consideration. It is a legitimate request from a practical standpoint. For instance, in the development of fraud detection models, we would often sample down non-fraud ... [Read more...]

Batch Deployment of WoE Transformations

April 20, 2019 | statcompute

After wrapping up the function batch_woe() today, which allows users to apply WoE transformations to many independent variables simultaneously, I have completed the development of the major functions in the MOB package that are usable for model development in a production setting. The function batch_... [Read more...]

Batch Processing of Monotonic Binning

April 13, 2019 | statcompute

In my GitHub repository, multiple R functions have been developed to implement the monotonic binning by using either iterative discretization or isotonic regression. With these functions, we can run the monotonic binning for one independent variable at a time. However, in a real-world production environment, ... [Read more...]

Monotonic Binning with GBM

March 31, 2019 | statcompute

In addition to the monotonic binning algorithms introduced in my previous post, two more functions based on Generalized Boosted Regression Models have been added to my GitHub repository, gbm_bin() and gbmcv_bin(). The function gbm_bin() estimates a GBM model without cross validation and ... [Read more...]

Deployment of Binning Outcomes in Production

March 26, 2019 | statcompute

In my previous post, I’ve shown different monotonic binning algorithms that I developed over time. However, these binning functions are all useless without a deployment vehicle in production. During the weekend, I finally had time to draft an R function ... [Read more...]

Bayesian Optimization for Hyper-Parameter

February 24, 2019 | statcompute

In the past several weeks, I spent a tremendous amount of time reading literature about automatic parameter tuning in the context of Machine Learning (ML), most of which can be classified into two major categories, i.e. search and optimization. Searching mechanisms, such as grid search, random search, and Sobol ... [Read more...]

Gradient-Free Optimization for GLMNET Parameters

February 23, 2019 | statcompute

In the post, it was shown how to optimize hyper-parameters, namely alpha and lambda, of glmnet by using the built-in cv.glmnet() function. However, following a similar logic of hyper-parameter optimization shown in the post, we can directly optimize ... [Read more...]

Direct Optimization of Hyper-Parameter

February 10, 2019 | statcompute

In the previous post, it is shown how to identify the optimal hyper-parameter in a General Regression Neural Network by using the Sobol sequence and the uniform random generator respectively through the N-fold cross validation. While the Sobol sequence yields a slightly better performance, outcomes ... [Read more...]

Sobol Sequence vs. Uniform Random in Hyper-Parameter Optimization

February 3, 2019 | statcompute

Tuning hyper-parameters might be the most tedious yet crucial task in various machine learning algorithms, such as neural networks, SVM, or boosting. The configuration of hyper-parameters not only impacts the computational efficiency of a learning algorithm but also determines its prediction accuracy. Thus far, manual tuning and grid searching are still ... [Read more...]

Co-integration and Mean Reverting Portfolio

January 5, 2019 | statcompute

In the previous post, it was shown how to identify two co-integrated stocks for pair trading. In the example below, I will show how to form a mean-reverting portfolio with three or more stocks, e.g. stocks with co-integration, and also how to ... [Read more...]

Statistical Assessments of AUC

December 25, 2018 | statcompute

In scorecard development, the area under the ROC curve, also known as AUC, has been widely used to measure the performance of a risk scorecard. All else being equal, the scorecard with a higher AUC is considered more predictive than the one with a lower AUC. However, little attention has ... [Read more...]

Phillips-Ouliaris Test For Cointegration

December 16, 2018 | statcompute

In a project developing PPNR balance projection models, I tried to use the Phillips-Ouliaris (PO) test to investigate the cointegration between the historical balance and a set of macro-economic variables and noticed that the implementation routines of the PO test in various R packages, e.g. urca and tseries, would give ... [Read more...]

An Utility Function For Monotonic Binning

December 2, 2018 | statcompute

In all monotonic binning algorithms that I posted before, I relied heavily on the smbinning::smbinning.custom() function contributed by Herman Jopia as the utility function generating the binning output, and I therefore feel deeply indebted to his excellent work. However, the availability of the smbinning::smbinning.custom() function shouldn’t become my ... [Read more...]

Improving Binning by Bootstrap Bumping

November 25, 2018 | statcompute

In the post, a more robust version of monotonic binning based on the isotonic regression was introduced. Nonetheless, due to the loss of granularity, the predictability has been somewhat compromised, which is a typical dilemma in data science. On one hand, we don’t ... [Read more...]

More Robust Monotonic Binning Based on Isotonic Regression

November 23, 2018 | statcompute

Since publishing the monotonic binning function based upon the isotonic regression, I’ve received some feedback from peers. A potential concern is that, albeit improving the granularity and predictability, the binning is too fine and might not generalize well to new data. In light ... [Read more...]

Creating List with Iterator

November 22, 2018 | statcompute

In the post, it is shown how to grow a list or a list-like queue based upon a dataframe. In the example, the code snippet relied heavily on the FOR loop to do the assignment item by item, which I can’t help thinking ... [Read more...]
