PART – A Rule-Learning Algorithm
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
> require('RWeka')
> require('pROC')
>
> # SEPARATE DATA INTO TRAINING AND TESTING SETS
> df1 <- read.csv('credit_count.csv')
> df2 <- df1[df1$CARDHLDR == 1, 2:12]
> set.seed(2013)
> rows <- sample(1:nrow(df2), nrow(df2) - 1000)
> set1 <- df2[rows, ]
> set2 <- df2[-rows, ]
>
> # BUILD A PART RULE MODEL
> mdl1 <- PART(factor(BAD) ~., data = set1)
> print(mdl1)
PART decision list
------------------

EXP_INC > 0.000774 AND
AGE > 21.833334 AND
INCOME > 2100 AND
MAJORDRG <= 0 AND
OWNRENT > 0 AND
MINORDRG <= 1: 0 (2564.0/103.0)

AGE > 21.25 AND
EXP_INC > 0.000774 AND
INCPER > 17010 AND
INCOME > 1774.583333 AND
MINORDRG <= 0: 0 (2278.0/129.0)

AGE > 20.75 AND
EXP_INC > 0.016071 AND
OWNRENT > 0 AND
SELFEMPL > 0 AND
EXP_INC <= 0.233759 AND
MINORDRG <= 1: 0 (56.0)

AGE > 20.75 AND
EXP_INC > 0.016071 AND
SELFEMPL <= 0 AND
OWNRENT > 0: 0 (1123.0/130.0)

OWNRENT <= 0 AND
AGE > 20.75 AND
ACADMOS <= 20 AND
ADEPCNT <= 2 AND
MINORDRG > 0 AND
ACADMOS <= 14: 0 (175.0/10.0)

OWNRENT <= 0 AND
AGE > 20.75 AND
ADEPCNT <= 0: 0 (1323.0/164.0)

INCOME > 1423 AND
OWNRENT <= 0 AND
MINORDRG <= 1 AND
ADEPCNT > 0 AND
SELFEMPL <= 0 AND
MINORDRG <= 0: 0 (943.0/124.0)

SELFEMPL > 0 AND
MAJORDRG <= 0 AND
ACADMOS > 85: 0 (24.0)

SELFEMPL > 0 AND
MAJORDRG <= 1 AND
MAJORDRG <= 0 AND
MINORDRG <= 0 AND
INCOME > 2708.333333: 0 (17.0)

SELFEMPL > 0 AND
MAJORDRG <= 1 AND
OWNRENT <= 0 AND
MINORDRG <= 0 AND
INCPER <= 8400: 0 (13.0)

SELFEMPL <= 0 AND
OWNRENT > 0 AND
ADEPCNT <= 0 AND
MINORDRG <= 0 AND
MAJORDRG <= 0: 0 (107.0/15.0)

OWNRENT <= 0 AND
MINORDRG > 0 AND
MINORDRG <= 1 AND
MAJORDRG <= 1 AND
MAJORDRG <= 0 AND
SELFEMPL <= 0: 0 (87.0/13.0)

OWNRENT <= 0 AND
SELFEMPL <= 0 AND
MAJORDRG <= 0 AND
MINORDRG <= 1: 0 (373.0/100.0)

MAJORDRG > 0 AND
MINORDRG > 0 AND
MAJORDRG <= 1 AND
MINORDRG <= 1: 0 (29.0)

SELFEMPL <= 0 AND
OWNRENT > 0 AND
MAJORDRG <= 0: 0 (199.0/57.0)

OWNRENT <= 0 AND
SELFEMPL <= 0: 0 (84.0/24.0)

MAJORDRG > 1: 0 (17.0/3.0)

ACADMOS <= 34 AND
MAJORDRG > 0: 0 (10.0)

MAJORDRG <= 0 AND
ADEPCNT <= 2 AND
OWNRENT <= 0: 0 (29.0/7.0)

OWNRENT > 0 AND
SELFEMPL > 0 AND
EXP_INC <= 0.218654 AND
MINORDRG <= 2 AND
MINORDRG <= 1: 0 (8.0/1.0)

OWNRENT > 0 AND
INCOME <= 2041.666667 AND
MAJORDRG > 0 AND
ADEPCNT > 0: 1 (5.0)

OWNRENT > 0 AND
AGE > 33.416668 AND
ACADMOS <= 174 AND
SELFEMPL > 0: 0 (10.0/1.0)

OWNRENT > 0 AND
SELFEMPL <= 0 AND
MINORDRG <= 1 AND
AGE > 33.5 AND
EXP_INC > 0.006737: 0 (6.0)

EXP_INC > 0.001179: 1 (16.0/1.0)

: 0 (3.0)

Number of Rules : 25
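
A note on reading the decision list: in Weka's output the rules are tried from top to bottom, the first rule whose conditions all hold assigns the class after the colon, and the pair in parentheses is (training cases covered / cases misclassified), with the second number omitted when it is zero. The sketch below only restates three of the printed counts to show how a per-rule training accuracy could be derived. The data frame `rules` and its column names are made up for illustration and are not something RWeka returns.

```r
# Counts copied by hand from the 1st, 2nd and 13th printed rules:
# (covered / misclassified) on the training set.
rules <- data.frame(
  rule    = c("rule 1", "rule 2", "rule 13"),
  covered = c(2564, 2278, 373),
  errors  = c(103, 129, 100)
)

# Share of the covered training cases that the rule labels correctly.
rules$accuracy <- 1 - rules$errors / rules$covered

print(rules)
```

Rules further down the list tend to cover fewer cases and to be less pure, which is worth keeping in mind before reading too much into any single late rule.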
> pred1 <- data.frame(prob = predict(mdl1, newdata = set2, type = 'probability')[, 2])
> # ROC FOR TESTING SET
> print(roc1 <- roc(set2$BAD, pred1$prob))
Call:
roc.default(response = set2$BAD, predictor = pred1$prob)
Data: pred1$prob in 905 controls (set2$BAD 0) < 95 cases (set2$BAD 1).
Area under the curve: 0.6794
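
For readers unfamiliar with the AUC figure: it can be read as the probability that a randomly chosen case (BAD = 1) receives a higher predicted score than a randomly chosen control (BAD = 0). Below is a minimal base R sketch of that idea using the rank-sum identity. The helper `auc_by_rank` and the toy vectors are hypothetical and are not part of pROC, which should remain the tool of choice for real work.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
# Ties in the score receive average ranks, i.e. they count as half a win.
auc_by_rank <- function(response, score) {
  r  <- rank(score)
  n1 <- sum(response == 1)  # number of cases
  n0 <- sum(response == 0)  # number of controls
  (sum(r[response == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy data, made up for illustration only.
toy_response <- c(0, 0, 1, 1)
toy_score    <- c(0.1, 0.4, 0.35, 0.8)

toy_auc <- auc_by_rank(toy_response, toy_score)
print(toy_auc)
```

In the toy data, three of the four case/control pairs are ordered correctly, so the value is 0.75.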
>
> # BUILD A LOGISTIC REGRESSION
> mdl2 <- Logistic(factor(BAD) ~., data = set1)
> print(mdl2)
Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
               Class
Variable           0
====================
AGE           0.0112
ACADMOS      -0.0005
ADEPCNT      -0.0747
MAJORDRG     -0.2312
MINORDRG     -0.1991
OWNRENT       0.2244
INCOME        0.0004
SELFEMPL     -0.1206
INCPER             0
EXP_INC       0.4472
Intercept     0.7965

Odds Ratios...
               Class
Variable           0
====================
AGE           1.0113
ACADMOS       0.9995
ADEPCNT        0.928
MAJORDRG      0.7936
MINORDRG      0.8195
OWNRENT       1.2516
INCOME        1.0004
SELFEMPL      0.8864
INCPER             1
EXP_INC       1.5639
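
The two tables printed by Weka are linked: each odds ratio is the exponential of the corresponding coefficient. Since the column is labelled class 0, the coefficients describe the odds of BAD = 0, so a ratio below 1 (for example MAJORDRG) means the variable lowers the odds of a good account. The sketch below re-derives the odds ratios from the printed coefficients; the vector `coefs` is typed in by hand from the output and small differences are down to the four-decimal rounding.

```r
# Coefficients copied by hand from the printed Weka output (class 0).
coefs <- c(AGE = 0.0112, ACADMOS = -0.0005, ADEPCNT = -0.0747,
           MAJORDRG = -0.2312, MINORDRG = -0.1991, OWNRENT = 0.2244,
           INCOME = 0.0004, SELFEMPL = -0.1206, EXP_INC = 0.4472)

# Odds ratio per one-unit increase in each predictor.
odds_ratios <- exp(coefs)

print(round(odds_ratios, 4))
```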
> pred2 <- data.frame(prob = predict(mdl2, newdata = set2, type = 'probability')[, 2])
> # ROC FOR TESTING SET
> print(roc2 <- roc(set2$BAD, pred2$prob))
Call:
roc.default(response = set2$BAD, predictor = pred2$prob)
Data: pred2$prob in 905 controls (set2$BAD 0) < 95 cases (set2$BAD 1).
Area under the curve: 0.6529
>
> # COMPARE TWO ROCS
> roc.test(roc1, roc2)
DeLong's test for two correlated ROC curves
data: roc1 and roc2
Z = 1.0344, p-value = 0.301
alternative hypothesis: true difference in AUC is not equal to 0
sample estimates:
AUC of roc1 AUC of roc2
0.6793894 0.6528875
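
To make the test output easier to interpret: the Z statistic is compared against a standard normal distribution, and the two-sided p-value follows directly from it. The sketch below recomputes the p-value and the AUC difference from the printed numbers, which are typed in by hand rather than extracted from the `roc.test` result object.

```r
# Values copied by hand from the printed roc.test() output.
z        <- 1.0344
auc_part <- 0.6793894
auc_logi <- 0.6528875

# Two-sided p-value under the standard normal distribution.
p_value  <- 2 * pnorm(-abs(z))

# Observed gap between the two AUCs on the testing set.
auc_diff <- auc_part - auc_logi

print(round(p_value, 3))
print(round(auc_diff, 4))
```

With a p-value of about 0.3, the higher AUC of the PART model on these 1,000 testing cases is not statistically distinguishable from that of the logistic regression at the usual 5% level.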