Blog Archives

Faster Way to Slice Dataframe by Row

May 12, 2019
When we’d like to slice a dataframe by row, we can employ the split() function or the iter() function in the iterators package. By leveraging the power of parallelism, I wrote a utility function, slice(), that slices the dataframe faster. In the example shown below, slice() is 3 times more efficient than split().

Read more »

Granular Weighted Binning by Generalized Boosted Model

May 7, 2019

In the post https://statcompute.wordpress.com/2019/04/27/more-general-weighted-binning, I’ve shown how to do weighted binning with the function wqtl_bin() by iterative partitioning. However, the outcome from wqtl_bin() can sometimes be too coarse. The function wgbm_bin() (https://github.com/statcompute/MonotonicBinning/blob/master/code/wgbm_bin.R) leverages gbm(), which implements the Generalized Boosted Model, and generates more granular weighted binning outcomes. Below is the

Read more »

Why Use Weight of Evidence?

May 4, 2019

I have been asked why I spend so much effort developing SAS macros and R functions to do monotonic binning for the WoE transformation, given the availability of other cutting-edge data mining algorithms that will automatically generate the prediction with whatever predictors are fed into the model. Nonetheless, what really distinguishes a good modeler from

Read more »

More General Weighted Binning

April 27, 2019
You might be wondering what motivates me to spend countless weekend hours on the MOB package. The answer is plain and simple: it is the users who drive the development work. After I published the wts_bin() function last week showing the impact of two-value weights on the monotonic binning outcome (https://statcompute.wordpress.com/2019/04/21/binning-with-weights), a question was asked if

Read more »

Binning with Weights

April 21, 2019
After working on the MOB package, I received requests from multiple users asking whether I could write a binning function that takes a weighting scheme into consideration. It is a legitimate request from a practical standpoint. For instance, in the development of fraud detection models, we often sample down non-fraud cases given an extremely low

Read more »
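
To see why weights matter once non-fraud cases have been sampled down, here is a minimal sketch. It is written in Python purely for illustration and is not the wts_bin() function from the post; the helper weighted_bad_rate(), the bin labels, and the 1-in-10 sampling rate are all hypothetical. Giving each retained non-fraud case a weight of 10 restores the bad rate per bin that the full population would show.

```python
def weighted_bad_rate(bins, flags, weights):
    """Weighted share of bad (flag == 1) cases in each bin."""
    bad, total = {}, {}
    for b, y, w in zip(bins, flags, weights):
        total[b] = total.get(b, 0.0) + w
        bad[b] = bad.get(b, 0.0) + w * y
    return {b: bad[b] / total[b] for b in total}


# Hypothetical development sample: 1 = fraud, 0 = non-fraud.
bins = ["low", "low", "low", "high", "high", "high"]
flags = [1, 0, 0, 1, 1, 0]

# Assumption: non-fraud cases were sampled 1-in-10, so each one stands for 10.
weights = [1.0 if y == 1 else 10.0 for y in flags]

weighted = weighted_bad_rate(bins, flags, weights)
unweighted = weighted_bad_rate(bins, flags, [1.0] * len(flags))

for b in ("low", "high"):
    print(b, round(unweighted[b], 4), round(weighted[b], 4))
```

In this toy sample the unweighted bad rates are 1/3 and 2/3, while the weighted ones drop to 1/21 and 1/6. Any binning rule driven by the bad rate per bin sees very different inputs in the two cases.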

Batch Deployment of WoE Transformations

April 20, 2019
After wrapping up the function batch_woe() today, which allows users to apply WoE transformations to many independent variables simultaneously, I have completed the development of the major functions in the MOB package that are usable for model development in a production setting. The function batch_woe() is basically a wrapper around cal_woe()

Read more »
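
The wrapper idea can be sketched as follows. This is an illustration in Python, not the interface of batch_woe() or cal_woe(); the function names apply_woe() and batch_apply(), the spec layout, the variable names, and every cut point and WoE value are hypothetical. One function maps a single value to the WoE of its bin, and the wrapper loops that function over every variable that has a binning spec.

```python
from bisect import bisect_left


def apply_woe(value, cuts, woes):
    """Map one value to the WoE of the bin it falls in.

    Bins are right-closed: (-inf, cuts[0]], (cuts[0], cuts[1]], ...,
    (cuts[-1], +inf), so len(woes) must equal len(cuts) + 1.
    """
    return woes[bisect_left(cuts, value)]


def batch_apply(rows, specs):
    """Apply the single-variable transformation to every variable in specs."""
    return [
        {name: apply_woe(row[name], s["cuts"], s["woe"]) for name, s in specs.items()}
        for row in rows
    ]


# Hypothetical binning outcomes for two variables.
specs = {
    "ltv": {"cuts": [80, 100], "woe": [-0.5, 0.1, 0.7]},
    "fico": {"cuts": [650], "woe": [0.6, -0.4]},
}
rows = [
    {"ltv": 75, "fico": 700},
    {"ltv": 100, "fico": 650},
    {"ltv": 120, "fico": 600},
]

scored = batch_apply(rows, specs)
for row in scored:
    print(row)
```

The sketch does not handle missing values, which a production version would need to assign to a bin of their own.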

Batch Processing of Monotonic Binning

April 13, 2019
In my GitHub repository (https://github.com/statcompute/MonotonicBinning), multiple R functions have been developed to implement monotonic binning using either iterative discretization or isotonic regression. With these functions, we can run the monotonic binning for one independent variable at a time. However, in a real-world production environment, we often want to apply the binning algorithm

Read more »

Monotonic Binning with GBM

March 31, 2019
In addition to the monotonic binning algorithms introduced in my previous post (https://statcompute.wordpress.com/2019/03/10/a-summary-of-my-home-brew-binning-algorithms-for-scorecard-development), two more functions based on Generalized Boosted Regression Models, gbm_bin() and gbmcv_bin(), have been added to my GitHub repository. The function gbm_bin() estimates a GBM model without cross-validation and tends to generate a more granular binning outcome. The function gbmcv_bin() estimates

Read more »

Deployment of Binning Outcomes in Production

March 26, 2019
In my previous post (https://statcompute.wordpress.com/2019/03/10/a-summary-of-my-home-brew-binning-algorithms-for-scorecard-development), I’ve shown the different monotonic binning algorithms that I developed over time. However, these binning functions are all useless without a deployment vehicle in production. During the weekend, I finally had time to draft an R function (https://github.com/statcompute/MonotonicBinning/blob/master/code/calc_woe.R) that can be used to deploy the binning outcome and to apply the

Read more »

A Summary of My Home-Brew Binning Algorithms for Scorecard Development

March 10, 2019
Thus far, I have published four different monotonic binning algorithms for scorecard development and think it might be the right time for a quick summary. R functions for these binning algorithms are also available on https://github.com/statcompute/MonotonicBinning. The first one was posted back in 2017 (https://statcompute.wordpress.com/2017/01/22/monotonic-binning-with-smbinning-package) based on my SAS macro (https://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) that

Read more »
