Automated Dashboard for Credit Modelling with Decision trees and Random forests in R

January 15, 2019
By

(This article was first published on R Programming – DataScience+, and kindly contributed to R-bloggers)

    Categories

    1. Programming

    Tags

    1. Data Visualisation
    2. Flexdashboard
    3. R Programming
    4. RMarkdown

    In this article, you learn how to make Automated Dashboard for Credit Modelling with Decision trees and Random forests in R. First you need to install the `rmarkdown` package into your R library. Assuming that you installed the `rmarkdown`, next you create a new `rmarkdown` script in R.

    After this you type the following code in order to create a dashboard with rmarkdown and flexdashboard:

    ---
    title: "Automated Dashboard for Credit Modelling with Decision trees and Random forests in R"
    author: "Kristian Larsen"
    output: 
      flexdashboard::flex_dashboard:
        orientation: rows
        vertical_layout: scroll
    ---
    
    ```{r setup, include=FALSE}
    # Data management packages
    library(flexdashboard)
    library(dplyr)
    library(caret)
    library(partykit)
    library(randomForest)
    library(Hmisc)
    knitr::opts_chunk$set(cache=TRUE)
    options(scipen = 9999)
    rm(list=ls())
    
    # Read dataset
    loans <- read.csv("http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml7/credit.csv")
    str(loans)
    
    # Data management
    # Change the order/level of checking_balance variable
    loans$checking_balance <- factor(loans$checking_balance,
                                    levels = c(" 200 DM",
                                               "unknown"))
    summary(loans[loans$default == "yes", "checking_balance"])
    # Change the order/level of saving_balance variable
    loans$savings_balance <- factor(loans$savings_balance,
                                    levels = c(" 1000 DM",
                                               "unknown"))
    summary(loans[loans$default == "yes", "savings_balance"])
    # Change the order/level of credit_history variable
    loans$credit_history <- factor(loans$credit_history,
                                    levels = c("critical",
                                               "poor",
                                               "good",
                                               "very good",
                                               "perfect"))
    summary(loans[loans$default == "yes", "credit_history"])
    # Change the order/level of other_credit variable
    loans$other_credit <- factor(loans$other_credit,
                                    levels = c("none",
                                               "store",
                                               "bank"))
    summary(loans[loans$default == "yes", "other_credit"])
    set.seed(300)
    in_loans_train <- sample(nrow(loans), nrow(loans)*0.75)
    loans_train <- loans[in_loans_train, ]
    loans_test <- loans[-in_loans_train, ]
    ```
    
    Row {data-width=350}
    -----------------------------------------------------------------------
    
    ### Chart A - Decision tree Model I
    ```{r}
    loans_model_dt <- ctree(default ~ ., loans_train)
    plot(loans_model_dt)
    ```
    
    
    ### Chart B - Decision tree Model I - simple
    ```{r}
    plot(loans_model_dt, type = "simple")
    
    ```
    
    Row {data-width=650}
    -----------------------------------------------------------------------
    
    ### Chart C - Decision tree Model Model I - formula
    ```{r}
    loans_model_dt
    ```
    
    Row {data-width=650}
    -----------------------------------------------------------------------
    
    ### Chart D - Confusion Matrix for Decision tree Model Model I
    ```{r}
    loans_pred_dt <- predict(loans_model_dt, loans_test)
    dt_conft <- table("prediction" = loans_pred_dt,
                       "actual" = loans_test$default
                       )
    accu_dt <- round((dt_conft[1]+dt_conft[4])/sum(dt_conft[1:4]),4)
    prec_dt <- round(dt_conft[4]/(dt_conft[2]+dt_conft[4]), 4)
    reca_dt <- round(dt_conft[4]/(dt_conft[4]+dt_conft[3]), 4)
    spec_dt <- round(dt_conft[1]/(dt_conft[1]+dt_conft[2]), 4)
    confusionMatrix(loans_pred_dt, loans_test$default, positive = "yes")
    ```
    
    Row {data-width=650}
    -----------------------------------------------------------------------
    
    ### Chart E - Decision tree Model II 
    ```{r}
    loans_model_dt2 <- ctree(default ~ ., loans_train, control = ctree_control(mincriterion = 0.7))
    plot(loans_model_dt2)
    ```
    
    Row {data-width=650}
    -----------------------------------------------------------------------
    
    ### Chart F - Decision tree Model Model II - formula
    ```{r}
    loans_model_dt2
    ```
    
    Row {data-width=650}
    -----------------------------------------------------------------------
    
    ### Chart G - Confusion Matrix for Decision tree Model Model II
    ```{r}
    loans_pred_dt2 <- predict(loans_model_dt2, loans_test)
    confusionMatrix(loans_pred_dt2, loans_test$default, positive = "yes")
    ```
    
    Row {data-width=650}
    -----------------------------------------------------------------------
    
    ### Chart H - Random Forest Model
    ```{r}
    set.seed(300)
    ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3, allowParallel = TRUE)
    loans_rf <- train(default ~ ., data = loans, method = "rf", trControl = ctrl)
    loans_rf$finalModel
    ```
    
    Row {data-width=650}
    -----------------------------------------------------------------------
    
    ### Chart I - Random Forest Model - variable importance
    ```{r}
    varImp(loans_rf)
    ```
    
    Row {data-width=350}
    -----------------------------------------------------------------------
    
    ### Chart J - Random Forest Model - Final model plot I
    ```{r}
    plot(loans_rf$finalModel)
    legend("topright", colnames(loans_rf$finalModel$err.rate),col = 1:6, cex = 0.8, fill = 1:6)
    ```
    
    ### Chart K - Random Forest Model - Final model plot II
    ```{r}
    plot(loans_rf)
    ```
    

    Screenshot:

    The result of the above coding are published with RPubs here.

    References

    1. Using flexdashboard in R

    Related Post

    1. Automated Dashboard for Classification Neural Network in R
    2. How to Achieve Parallel Processing in Python Programming
    3. Recommender System for Christmas in Python
    4. Automated Dashboard visualizations with Time series visualizations in R
    5. Automated Dashboard visualizations with distribution in R

    To leave a comment for the author, please follow the link and comment on their blog: R Programming – DataScience+.

    R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



    If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

    Comments are closed.

    Search R-bloggers

    Sponsors

    Never miss an update!
    Subscribe to R-bloggers to receive
    e-mails with the latest R posts.
    (You will not see this message again.)

    Click here to close (This popup will not appear again)