# A Prototype of Monotonic Binning Algorithm with R

May 4, 2013


I’ve been asked many times whether I have a piece of R code implementing the monotonic binning algorithm, similar to the ones I developed with SAS (http://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) and with Python (http://statcompute.wordpress.com/2012/12/08/monotonic-binning-with-python). Today I finally had time to draft a quick prototype in about 20 lines of R code. It is barely usable without further polish, but it still surprises me a little how efficient R can be for algorithm prototyping; it is much sleeker than a SAS macro.

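```
library(sas7bdat)  # read.sas7bdat() to load the SAS data set
library(Hmisc)     # cut2() for quantile-based bins

# Monotonic binning: start just below 50 quantile groups of x and keep
# re-cutting x into one fewer group until the bin-level means of x and y
# have a Spearman correlation of +1 or -1, i.e. the rate of y is monotonic in x.
bin <- function(x, y) {
  n <- min(50, length(unique(x)))
  repeat {
    n   <- n - 1
    d1  <- data.frame(x, y, bin = cut2(x, g = n))
    # mean of x and of y within each bin
    d2  <- aggregate(d1[-3], d1[3], mean)
    rho <- cor(d2[-1], method = "spearman")
    # stop when monotonic; n <= 2 guards against an endless loop or an NA test
    if (isTRUE(abs(rho[1, 2]) > 1 - 1e-8) || n <= 2) break
  }
  d2[2] <- NULL
  colnames(d2) <- c("LEVEL", "RATE")
  title <- paste(toupper(substitute(y)), " RATE by ", toupper(substitute(x)), sep = "")
  cat("+-", rep("-", nchar(title)), "-+\n", sep = "")
  cat("| ", title, " |\n", sep = "")
  cat("+-", rep("-", nchar(title)), "-+\n", sep = "")
  print(d2)
  cat("\n")
  invisible(d2)
}

# The load step is not shown in the original; this path is a placeholder.
data <- read.sas7bdat("path/to/your_data.sas7bdat")
attach(data)

# Calls reconstructed from the output below; the lower-case column names are assumed.
bin(bureau_score, bad)
bin(age_oldest_tr, bad)
bin(tot_income, bad)
bin(tot_tr, bad)

detach(data)
```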
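For readers without the SAS data set or the Hmisc package, here is a minimal base-R sketch of the same loop run on simulated data. It is not the code that produced the output below: `monotonic_bin()` and the simulated `score`/`bad` variables are hypothetical, and `quantile()` plus `cut()` stand in for `cut2()`, so the bin edges will differ somewhat.

```r
# Hypothetical base-R sketch of monotonic binning (not the post's exact code).
# quantile() + cut() replace Hmisc::cut2(), so bin edges may differ slightly.
monotonic_bin <- function(x, y, max_bins = 50) {
  stopifnot(length(unique(x)) >= 2)
  n <- min(max_bins, length(unique(x)))
  repeat {
    n <- n - 1
    # quantile cut points; unique() drops duplicate breaks caused by ties in x
    breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n + 1), names = FALSE))
    # left-closed intervals like cut2(); the maximum falls into the last bin
    bins <- droplevels(cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE))
    x_mean <- as.vector(tapply(x, bins, mean))
    y_mean <- as.vector(tapply(y, bins, mean))
    rho <- if (length(x_mean) > 1) cor(x_mean, y_mean, method = "spearman") else NA
    # monotonic when |Spearman rho| is 1; n <= 2 ends the search regardless
    if (isTRUE(abs(rho) > 1 - 1e-8) || n <= 2) break
  }
  data.frame(LEVEL = levels(bins), RATE = y_mean)
}

# Simulated data: a hypothetical bad flag whose probability falls as the score rises
set.seed(2013)
score <- round(runif(5000, 450, 850))
bad <- rbinom(5000, 1, plogis(6 - 0.01 * score))
tab <- monotonic_bin(score, bad)
print(tab)
```

Because the Spearman test looks only at ranks, the loop accepts any strictly monotonic pattern, increasing or decreasing, and does not require the rate to change linearly across bins.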

R output:

```
+--------------------------+
| BAD RATE by BUREAU_SCORE |
+--------------------------+
       LEVEL       RATE
1  [443,618) 0.44639376
2  [618,643) 0.38446602
3  [643,658) 0.31835938
4  [658,673) 0.23819302
5  [673,686) 0.19838057
6  [686,702) 0.17850288
7  [702,715) 0.14168378
8  [715,731) 0.09815951
9  [731,752) 0.07212476
10 [752,776) 0.05487805
11 [776,848] 0.02605210

+---------------------------+
| BAD RATE by AGE_OLDEST_TR |
+---------------------------+
       LEVEL       RATE
1  [  1, 34) 0.33333333
2  [ 34, 62) 0.30560928
3  [ 62, 87) 0.25145068
4  [ 87,113) 0.23346304
5  [113,130) 0.21616162
6  [130,149) 0.20036101
7  [149,168) 0.19361702
8  [168,198) 0.15530303
9  [198,245) 0.11111111
10 [245,308) 0.10700389
11 [308,588] 0.08730159

+------------------------+
| BAD RATE by TOT_INCOME |
+------------------------+
           LEVEL      RATE
1 [   0,   2570) 0.2498715
2 [2570,   4510) 0.2034068
3 [4510,8147167] 0.1602327

+--------------------+
| BAD RATE by TOT_TR |
+--------------------+
    LEVEL      RATE
1 [ 0,12) 0.2672370
2 [12,22) 0.1827676
3 [22,77] 0.1422764
```
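The stopping rule in `bin()` treats a Spearman correlation of ±1 between the bin-level means as evidence of monotonicity. As a quick sanity check, the same test can be applied to the `RATE` column printed above for `BUREAU_SCORE`. This sketch uses the bin order in place of the bin-level mean score, which has the same ranks because the bins are ordered intervals.

```r
# RATE values copied from the BUREAU_SCORE table above
rate <- c(0.44639376, 0.38446602, 0.31835938, 0.23819302, 0.19838057,
          0.17850288, 0.14168378, 0.09815951, 0.07212476, 0.05487805,
          0.02605210)
# bin order has the same ranks as the bin-level mean score
rho <- cor(seq_along(rate), rate, method = "spearman")
print(rho)                  # -1 up to floating-point rounding
print(all(diff(rate) < 0))  # the bad rate falls in every successive bin
```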
