PART – A Rule-Learning Algorithm

January 11, 2013
By

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

> require('RWeka')
> require('pROC')
> 
> # SEPARATE DATA INTO TRAINING AND TESTING SETS
> df1 <- read.csv('credit_count.csv')
> df2 <- df1[df1$CARDHLDR == 1, 2:12]
> set.seed(2013)
> rows <- sample(1:nrow(df2), nrow(df2) - 1000)
> set1 <- df2[rows, ]
> set2 <- df2[-rows, ]
> 
> # BUILD A PART RULE MODEL
> mdl1 <- PART(factor(BAD) ~., data = set1)
> print(mdl1)
PART decision list
------------------

EXP_INC > 0.000774 AND
AGE > 21.833334 AND
INCOME > 2100 AND
MAJORDRG <= 0 AND
OWNRENT > 0 AND
MINORDRG <= 1: 0 (2564.0/103.0)

AGE > 21.25 AND
EXP_INC > 0.000774 AND
INCPER > 17010 AND
INCOME > 1774.583333 AND
MINORDRG <= 0: 0 (2278.0/129.0)

AGE > 20.75 AND
EXP_INC > 0.016071 AND
OWNRENT > 0 AND
SELFEMPL > 0 AND
EXP_INC <= 0.233759 AND
MINORDRG <= 1: 0 (56.0)

AGE > 20.75 AND
EXP_INC > 0.016071 AND
SELFEMPL <= 0 AND
OWNRENT > 0: 0 (1123.0/130.0)

OWNRENT <= 0 AND
AGE > 20.75 AND
ACADMOS <= 20 AND
ADEPCNT <= 2 AND
MINORDRG > 0 AND
ACADMOS <= 14: 0 (175.0/10.0)

OWNRENT <= 0 AND
AGE > 20.75 AND
ADEPCNT <= 0: 0 (1323.0/164.0)

INCOME > 1423 AND
OWNRENT <= 0 AND
MINORDRG <= 1 AND
ADEPCNT > 0 AND
SELFEMPL <= 0 AND
MINORDRG <= 0: 0 (943.0/124.0)

SELFEMPL > 0 AND
MAJORDRG <= 0 AND
ACADMOS > 85: 0 (24.0)

SELFEMPL > 0 AND
MAJORDRG <= 1 AND
MAJORDRG <= 0 AND
MINORDRG <= 0 AND
INCOME > 2708.333333: 0 (17.0)

SELFEMPL > 0 AND
MAJORDRG <= 1 AND
OWNRENT <= 0 AND
MINORDRG <= 0 AND
INCPER <= 8400: 0 (13.0)

SELFEMPL <= 0 AND
OWNRENT > 0 AND
ADEPCNT <= 0 AND
MINORDRG <= 0 AND
MAJORDRG <= 0: 0 (107.0/15.0)

OWNRENT <= 0 AND
MINORDRG > 0 AND
MINORDRG <= 1 AND
MAJORDRG <= 1 AND
MAJORDRG <= 0 AND
SELFEMPL <= 0: 0 (87.0/13.0)

OWNRENT <= 0 AND
SELFEMPL <= 0 AND
MAJORDRG <= 0 AND
MINORDRG <= 1: 0 (373.0/100.0)

MAJORDRG > 0 AND
MINORDRG > 0 AND
MAJORDRG <= 1 AND
MINORDRG <= 1: 0 (29.0)

SELFEMPL <= 0 AND
OWNRENT > 0 AND
MAJORDRG <= 0: 0 (199.0/57.0)

OWNRENT <= 0 AND
SELFEMPL <= 0: 0 (84.0/24.0)

MAJORDRG > 1: 0 (17.0/3.0)

ACADMOS <= 34 AND
MAJORDRG > 0: 0 (10.0)

MAJORDRG <= 0 AND
ADEPCNT <= 2 AND
OWNRENT <= 0: 0 (29.0/7.0)

OWNRENT > 0 AND
SELFEMPL > 0 AND
EXP_INC <= 0.218654 AND
MINORDRG <= 2 AND
MINORDRG <= 1: 0 (8.0/1.0)

OWNRENT > 0 AND
INCOME <= 2041.666667 AND
MAJORDRG > 0 AND
ADEPCNT > 0: 1 (5.0)

OWNRENT > 0 AND
AGE > 33.416668 AND
ACADMOS <= 174 AND
SELFEMPL > 0: 0 (10.0/1.0)

OWNRENT > 0 AND
SELFEMPL <= 0 AND
MINORDRG <= 1 AND
AGE > 33.5 AND
EXP_INC > 0.006737: 0 (6.0)

EXP_INC > 0.001179: 1 (16.0/1.0)

: 0 (3.0)

Number of Rules  : 	25

> pred1 <- data.frame(prob = predict(mdl1, newdata = set2, type = 'probability')[, 2]) 
> # ROC FOR TESTING SET
> print(roc1 <- roc(set2$BAD, pred1$prob))

Call:
roc.default(response = set2$BAD, predictor = pred1$prob)

Data: pred1$prob in 905 controls (set2$BAD 0) < 95 cases (set2$BAD 1).
Area under the curve: 0.6794
> 
> # BUILD A LOGISTIC REGRESSION
> mdl2 <- Logistic(factor(BAD) ~., data = set1)
> print(mdl2)
Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
               Class
Variable           0
====================
AGE           0.0112
ACADMOS      -0.0005
ADEPCNT      -0.0747
MAJORDRG     -0.2312
MINORDRG     -0.1991
OWNRENT       0.2244
INCOME        0.0004
SELFEMPL     -0.1206
INCPER             0
EXP_INC       0.4472
Intercept     0.7965


Odds Ratios...
               Class
Variable           0
====================
AGE           1.0113
ACADMOS       0.9995
ADEPCNT        0.928
MAJORDRG      0.7936
MINORDRG      0.8195
OWNRENT       1.2516
INCOME        1.0004
SELFEMPL      0.8864
INCPER             1
EXP_INC       1.5639

> pred2 <- data.frame(prob = predict(mdl2, newdata = set2, type = 'probability')[, 2])  
> # ROC FOR TESTING SET
> print(roc2 <- roc(set2$BAD, pred2$prob))

Call:
roc.default(response = set2$BAD, predictor = pred2$prob)

Data: pred2$prob in 905 controls (set2$BAD 0) < 95 cases (set2$BAD 1).
Area under the curve: 0.6529
> 
> # COMPARE TWO ROCS
> roc.test(roc1, roc2)

	DeLong's test for two correlated ROC curves

data:  roc1 and roc2 
Z = 1.0344, p-value = 0.301
alternative hypothesis: true difference in AUC is not equal to 0 
sample estimates:
AUC of roc1 AUC of roc2 
  0.6793894   0.6528875 

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)