Bayesian Model Based Optimization in R


Model-based optimization (MBO) is a smart approach to tuning the hyperparameters of machine learning algorithms with less CPU time and manual effort than standard grid search approaches. The core idea behind MBO is to directly evaluate fewer points within a hyperparameter space and instead use a “surrogate model”, which estimates what the result of your objective function would be at new locations by interpolating (non-linearly) between the observed results from a small sample of initial evaluations. Many methods can be used to construct the surrogate model. This post will focus on implementing the Bayesian method of Gaussian process (GP) smoothing (aka “kriging”), which is borrowed from – and particularly well suited to – spatial applications.

Background

I remember when I started using machine learning methods how time-consuming and – even worse – manual it could be to perform a hyperparameter search. The whole benefit of machine learning is that the algorithm should optimize the model-learning task for us, right? The problem, of course, becomes one of compute resources. Suppose we only have simple brute-force grid search at our disposal. With just one hyperparameter to tune, this approach is practical – we may only need to test 5 candidate values. But as the number of hyperparameters (the “dimension”) grows, the number of candidate hyperparameterizations grows exponentially. Suppose instead we have 5 hyperparameters to tune – again using just five points per dimension would now result in \(5^5 = 3,125\) model evaluations to test all the possible combinations. Sometimes 5 points is realistic – for example, for a discrete parameter like the maximum tree depth in a random forest. But for something continuous it usually is not, so I am really understating how quickly a grid will blow up, making brute-force approaches impractical.
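To make the blow-up concrete, here is a quick illustration – the parameter values below are arbitrary placeholders, not recommendations:

## Hypothetical 5-point grids for 5 xgboost-style hyperparameters
grid <- expand.grid(
  eta              = c(0.01, 0.05, 0.1, 0.2, 0.3),
  max_depth        = c(2, 4, 6, 8, 10),
  min_child_weight = c(1, 10, 100, 500, 1000),
  subsample        = c(0.2, 0.4, 0.6, 0.8, 1.0),
  colsample_bytree = c(0.2, 0.4, 0.6, 0.8, 1.0)
)
nrow(grid)  ## 5^5 = 3,125 candidate models, before any cross-validation folds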

Pretty quickly one goes from the brute-force approach to more involved strategies for grid search. That could mean starting with coarse grids and zooming in on promising areas with higher-resolution grids in subsequent searches; it could mean iterating between two or three different subsets of the hyperparameters which tend to “move together” – like the learning rate and number of rounds in a GBM. These strategies become highly manual, and frankly it becomes a real effort to keep track of the different runs and results. We don't want to have to think this much and risk making a mistake when tuning algorithms!

Model-Based Optimization

MBO differs from grid search in a couple of ways. First, we search the entire continuous range of a hyperparameter, not a discretized set of points within that range. Second, and more importantly, it is a probabilistic method which uses information from early evaluations to improve the selection of the subsequent tests that will be run. In this regard it is similar to the low-res/high-res grid search strategy, but automated. As good Bayesians, we like methods that incorporate prior information to improve later decisions.

As mentioned above, the method for selecting later test points based on the information from the early tests is Gaussian process smoothing, or kriging. One popular application of Gaussian processes is geo-spatial smoothing and regression. We are basically doing the same thing here, except that instead of geographic (lat-long) space, our space is defined by the ranges of a set of hyperparameters. We refer to this as the hyperparameter space, and MBO is going to help us search it for the point which yields the optimal result of a machine learning algorithm.

So let's take a look at how Bayes helps us tune machine learning algorithms with some code.

Demonstration

Environment

The main package we need is mlrMBO, which provides the mbo() method for optimizing an arbitrary function sequentially. We also need several others for various helpers – smoof to define the objective function which will be optimized; ParamHelpers to define the parameter space in which we will perform the Bayesian search for a global optimum; and DiceKriging to provide the Gaussian process interpolation capability (known in the spatial statistics world as “kriging”).

We will use the xgboost flavor of GBM as our machine learning methodology to be tuned, but you could adapt what I'm demonstrating here to any algorithm with multiple hyperparameters (or even a single one, if the run-time of a single evaluation were high enough to warrant it). mlrMBO is completely agnostic to your choice of methodology, but the flip side is that this means a bit of coding setup is required on the data scientist's part (good thing we like coding, and don't like manual work).

library(CASdatasets)
library(dplyr)
library(tibble)
library(magrittr)
library(ggplot2)
library(scatterplot3d)
library(kableExtra)
library(tidyr)
library(mlrMBO)
library(ParamHelpers)
library(DiceKriging)
library(smoof)
library(xgboost)

Data

I'll use my go-to insurance ratemaking data for demonstration purposes – the French motor datasets from CASdatasets.

data("freMPL1")
data("freMPL2")
data("freMPL3")
fre_df <- rbind(freMPL1, freMPL2, freMPL3 %>% select(-DeducType))
rm(freMPL1, freMPL2, freMPL3)

Let's take a look at our target variable, ClaimAmount.

gridExtra::grid.arrange(
  fre_df %>%
    filter(ClaimAmount > 0) %>%
    ggplot(aes(x = ClaimAmount)) +
    geom_density() +
    ggtitle("Observed Loss Distribution"),

  fre_df %>%
    filter(ClaimAmount > 0, ClaimAmount < 1.5e4) %>%
    ggplot(aes(x = ClaimAmount)) +
    geom_density() +
    ggtitle("Observed Severity Distribution"),
  nrow = 1
)

[Figure: density plots of the observed loss distribution and the observed severity distribution (claims under 15,000)]

We have something like a compound distribution – a probability mass at 0, and some long-tailed distribution of loss dollars for observations with incurred claims. But let's also look beyond the smoothed graphical view.

min(fre_df$ClaimAmount)

## [1] -3407.7

sum(fre_df$ClaimAmount < 0)

## [1] 690

We also appear to have some claims < 0 – perhaps recoveries (vehicle salvage) exceeded payments. For the sake of focusing on the MBO, we will adjust these records by flooring values at 0. I'll also convert some factor columns to numeric types which make more sense for modeling.

fre_df %<>%
  mutate(ClaimAmount = case_when(ClaimAmount < 0 ~ 0, TRUE ~ ClaimAmount)) %>%
  mutate(VehMaxSpeed_num = sub(".*-", "", VehMaxSpeed) %>% substr(., 1, 3)%>% as.numeric,
         VehAge_num = sub(".*-", "", VehAge) %>% sub('\\+', '', .) %>% as.numeric, # keep the upper bound of the age band
         VehPrice_num = as.integer(VehPrice)) %>% # The factor levels appear to be ordered so I will use this
  group_by(SocioCateg) %>% # high cardinality, will encode as a proportion of total
  mutate(SocioCateg_prop =  (sum(n()) / 4) / nrow(.) * 1e5) %>% 
  ungroup()

## matrices, no intercept needed and don't forget to exclude post-dictors
fre_mat <- model.matrix(ClaimAmount ~ . -1 -ClaimInd -Exposure -RecordBeg 
                        -RecordEnd - VehMaxSpeed -VehPrice -VehAge -SocioCateg,
                        data = fre_df)
## xgb.DMatrix, faster sparse matrix
fre_dm <- xgb.DMatrix(data = fre_mat, 
                      label = fre_df$ClaimAmount, 
                      base_margin = log(fre_df$Exposure)) ## base-margin == offset
                                                          ## we use log earned exposure because the xgboost Tweedie
                                                          ## implementation includes a log-link for the variance power

Objective function for optimizing

To avoid confusion, there are two objective functions we could refer to. Statistically, our objective function – aka our loss function – is the negative log-likelihood for an assumed Tweedie-distributed random variable. The xgboost algorithm will minimize this objective (equivalent to maximizing the likelihood) for a given set of hyperparameters on each run. Our other objective function is the R function defined below – it calls xgb.cv(), runs the learning procedure with cross-validation, stops when the out-of-fold likelihood no longer improves, and returns the best evaluation of the tweedie-nloglik metric on the out-of-fold samples.
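For reference, the per-observation quantity behind that metric is – to my understanding of xgboost's reg:tweedie implementation, and up to additive terms that do not depend on the prediction –

\[ -\ell(y, \mu) \;=\; -\,y\,\frac{\mu^{1-\rho}}{1-\rho} \;+\; \frac{\mu^{2-\rho}}{2-\rho} + \text{const}, \qquad 1 < \rho < 2, \]

where \(\mu\) is the predicted mean on the response scale (including the exposure offset) and \(\rho\) is the Tweedie variance power.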

Note that the function below also includes the defined hyperparameter space – the set of tuning parameters and the range of values each may take. There are 7 xgboost tuning parameters in the set, and I've also added the Tweedie variance “power” parameter as an eighth. This parameter takes a value in the interval (1, 2) for a Poisson-gamma compound distribution, but I first narrowed it down to a smaller range based on a quick profile of the loss distribution using tweedie::tweedie.profile() (sketched below).
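I omitted that profiling step from this post, but it looked roughly like the sketch below – treat the argument values (and the intercept-only formula) as illustrative rather than the exact call I ran:

## Rough sketch of profiling the Tweedie variance power (illustrative only)
library(tweedie)
p_profile <- tweedie.profile(
  ClaimAmount ~ 1,                         ## intercept-only model is enough for a quick profile
  data       = fre_df,
  p.vec      = seq(1.5, 1.95, by = 0.05),  ## candidate variance powers between 1 and 2
  link.power = 0,                          ## log link
  do.plot    = TRUE
)
p_profile$p.max  ## variance power with the highest profile likelihood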

# Adapted for Tweedie likelihood from this very good post at https://www.simoncoulombe.com/2019/01/bayesian/
# objective function: we want to minimize the neg log-likelihood by tuning hyperparameters
obj.fun <- makeSingleObjectiveFunction(
  name = "xgb_cv_bayes",
  fn =   function(x){
    set.seed(42)
    cv <- xgb.cv(params = list(
      booster          = "gbtree",
      eta              = x["eta"],
      max_depth        = x["max_depth"],
      min_child_weight = x["min_child_weight"],
      gamma            = x["gamma"],
      subsample        = x["subsample"],
      colsample_bytree = x["colsample_bytree"],
      max_delta_step   = x["max_delta_step"],
      tweedie_variance_power = x["tweedie_variance_power"],
      objective        = 'reg:tweedie', 
      eval_metric     = paste0("tweedie-nloglik@", x["tweedie_variance_power"])),
      data = dm, ## must be set in the global environment
      nrounds = 7000, ## Set this large and rely on early stopping
      nthread = 26, ## Adjust based on your machine
      nfold =  5,
      prediction = FALSE,
      showsd = TRUE,
      early_stopping_rounds = 25, ## If evaluation metric does not improve on out-of-fold sample for 25 rounds, stop
      verbose = 1,
      print_every_n = 500)

    cv$evaluation_log %>% pull(4) %>% min  ## column 4 is the eval metric here, tweedie negative log-likelihood
  },
  par.set = makeParamSet(
    makeNumericParam("eta",                    lower = 0.005, upper = 0.01),
    makeNumericParam("gamma",                  lower = 1,     upper = 5),
    makeIntegerParam("max_depth",              lower= 2,      upper = 10),
    makeIntegerParam("min_child_weight",       lower= 300,    upper = 2000),
    makeNumericParam("subsample",              lower = 0.20,  upper = .8),
    makeNumericParam("colsample_bytree",       lower = 0.20,  upper = .8),
    makeNumericParam("max_delta_step",         lower = 0,     upper = 5),
    makeNumericParam("tweedie_variance_power", lower = 1.75,   upper = 1.85)
  ),
  minimize = TRUE ## negative log likelihood
)

A function which runs the optimization

The core piece here is the call to mbo(). This accepts an initial design – i.e. a set of locations chosen to be “space-filling” within our hyperparameter space (we do not want random generation, which could leave areas of the space with no points nearby) – created using ParamHelpers::generateDesign(). The makeMBOControl() method creates an object which simply tells mbo() how many optimization steps to run after the initial design is tested – these are the runs whose locations are proposed probabilistically through Gaussian process smoothing, aka kriging. Finally, I create a plot of the optimization path and return the objects in a list for later use.

The covariance structure used in the Gaussian process is what makes GPs “Bayesian”: together with the observed values at nearby points, it defines the prior information, encoding how much smoothness we expect in the surface. We use a Matérn 3/2 kernel – a moderately smooth covariance often used in geospatial applications and well suited to our own spatial task. It is equivalent to the product of an exponential and a polynomial of degree 1. This is the mbo() default for a purely numeric hyperparameter space – if your hyperparameters include some which are non-numeric (for example, a “method” hyperparameter with a set of methods to choose from), then instead of kriging a random forest is used to estimate the value of the objective function between points, and the optimizing proposals are chosen from that. This would no longer be a strictly “Bayesian” approach, though I think it would still be Bayesian in spirit.
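For intuition, the Matérn 3/2 correlation between two points a (scaled) distance \(d\) apart is \((1 + \sqrt{3}\,d/\theta)\exp(-\sqrt{3}\,d/\theta)\), where \(\theta\) is a length-scale that DiceKriging estimates for us. A minimal sketch of that kernel in R, purely for illustration:

## Matern 3/2 correlation as a function of distance and length-scale theta
matern32 <- function(d, theta = 1) {
  a <- sqrt(3) * d / theta
  (1 + a) * exp(-a)
}

## correlation decays smoothly with distance in the (scaled) hyperparameter space
curve(matern32(x, theta = 1), from = 0, to = 3,
      xlab = "distance", ylab = "correlation")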

The Gaussian process models our objective function's output as a function of the hyperparameter values, using the initial design samples. For this reason, it is referred to (especially in the deep learning community) as a surrogate model – it serves as a cheap surrogate for running another evaluation of our objective function at some new point. For any point not evaluated directly, the estimated/interpolated surface provides an expectation. This benefits us because points that are likely to perform poorly (based on the surrogate model's estimate) will be discarded, and we will only move on with directly evaluating points in promising regions of the hyperparameter space.
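To make the surrogate idea concrete, here is a toy sketch on a made-up one-dimensional function (not our xgboost objective): fit a kriging model to a handful of evaluated points, then ask it for a cheap mean and standard error anywhere in between – exactly the information mbo() uses when deciding where to evaluate next.

## Pretend these are 6 evaluated hyperparameter values and their objective results
toy_X <- data.frame(x = c(0.05, 0.2, 0.35, 0.55, 0.75, 0.95))
toy_y <- sin(12 * toy_X$x) + toy_X$x  ## stand-in for an expensive objective

## GP surrogate with the same Matern 3/2 covariance mbo() uses by default
toy_fit <- km(design = toy_X, response = toy_y, covtype = "matern3_2",
              control = list(trace = FALSE))

## Cheap predictions (mean + uncertainty) at unevaluated points
toy_grid <- data.frame(x = seq(0, 1, length.out = 100))
toy_pred <- predict(toy_fit, newdata = toy_grid, type = "UK")
head(cbind(toy_grid, mean = toy_pred$mean, se = toy_pred$sd))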

Creating a wrapper function is optional – but to perform multiple runs in an analysis, most of the code here would need to be repeated. To be concise, I write it once so it can be called for subsequent runs (perhaps on other datasets, or if you get back a boundary solution you did not anticipate).

do_bayes <- function(n_design = NULL, opt_steps = NULL, of = obj.fun, seed = 42) {
  set.seed(seed)

  des <- generateDesign(n=n_design,
                        par.set = getParamSet(of),
                        fun = lhs::randomLHS)

  control <- makeMBOControl() %>%
    setMBOControlTermination(., iters = opt_steps)

  ## kriging with a Matern 3/2 covariance function is the default surrogate model for numerical domains,
  ## but if you wanted to override this you could modify the makeLearner() call below to define your own
  ## GP surrogate model with more or less smoothness, or use an entirely different method
  run <- mbo(fun = of,
             design = des,
             learner = makeLearner("regr.km", predict.type = "se", covtype = "matern3_2", control = list(trace = FALSE)),
             control = control, 
             show.info = TRUE)

  opt_plot <- run$opt.path$env$path %>%
    mutate(Round = row_number()) %>%
    mutate(type = case_when(Round <= n_design ~ "Design",
                            TRUE ~ "mlrMBO optimization")) %>%
    ggplot(aes(x= Round, y= y, color= type)) + 
    geom_point() +
    labs(title = "mlrMBO optimization") +
    ylab("-log(likelihood)")

  print(run$x)

  return(list(run = run, plot = opt_plot))
}

Number of evaluations

Normally for this problem I would perform more evaluations, in both the initial and optimizing phases. A reasonable rule of thumb is around 5-7 times the number of parameters being tuned for the initial design, and about half of that for the number of optimization steps. You need some points in the space to have something to interpolate between!
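A small sketch of that sizing arithmetic for this post's eight-parameter search (the multipliers are just the rule of thumb, nothing mlrMBO requires):

## Rule-of-thumb sizing for the initial design and the optimization steps
n_params  <- 8                      ## hyperparameters being tuned here
n_design  <- 5 * n_params           ## 5-7 evaluations per parameter -> 40+
opt_steps <- ceiling(n_design / 2)  ## roughly half as many MBO proposals -> 20
c(n_design = n_design, opt_steps = opt_steps)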

Here's my initial design of 15 points.

des <- generateDesign(n=15,
                      par.set = getParamSet(obj.fun),
                      fun = lhs::randomLHS)

kable(des, format = "html", digits = 4) %>% 
  kable_styling(font_size = 10) %>%
  kable_material_dark()
eta gamma max_depth min_child_weight subsample colsample_bytree max_delta_step tweedie_variance_power
0.0064 1.7337 5 1259 0.3795 0.3166 4.0727 1.8245
0.0074 3.2956 6 1073 0.4897 0.2119 1.0432 1.7833
0.0062 1.4628 9 1451 0.7799 0.5837 3.0967 1.8219
0.0059 4.5578 3 703 0.2135 0.3206 1.6347 1.8346
0.0081 4.0202 8 445 0.4407 0.5495 2.2957 1.7879
0.0067 3.5629 3 330 0.4207 0.4391 0.9399 1.7998
0.0092 4.4374 7 1937 0.6411 0.7922 3.9315 1.7704
0.0099 1.8265 7 1595 0.2960 0.3961 2.5070 1.8399
0.0055 2.4464 9 805 0.5314 0.2457 3.6427 1.7624
0.0096 2.2831 10 1662 0.6873 0.6075 0.2518 1.8457
0.0079 2.8737 4 894 0.2790 0.4954 0.4517 1.7965
0.0087 2.8600 6 599 0.5717 0.6537 4.9145 1.7557
0.0053 1.0082 2 1863 0.6021 0.7436 2.7048 1.7686
0.0071 4.9817 2 1380 0.3599 0.4507 4.3572 1.8156
0.0086 3.6847 9 1124 0.7373 0.6935 1.9759 1.8043

And here is a view of how well those points fill out 3 of the 8 dimensions.

scatterplot3d(des$eta, des$gamma, des$min_child_weight,
              type = "h", color = "blue", pch = 16)

[Figure: 3-D scatterplot of eta, gamma and min_child_weight for the 15-point design]

We can see large areas with no nearby points – if the global optimum lies there, we may still end up with proposals in that area that lead us to find it, but it sure would be helpful to gather some information there and guarantee it. Here's a better design with 42 points – just over 5 per hyperparameter.

des <- generateDesign(n=42,
                      par.set = getParamSet(obj.fun),
                      fun = lhs::randomLHS)
scatterplot3d(des$eta, des$gamma, des$min_child_weight,
              type = "h", color = "blue", pch = 16)

[Figure: 3-D scatterplot of eta, gamma and min_child_weight for the 42-point design]

This would take longer to run, but we will rely less heavily on interpolation over long distances during the optimizing phase because we have more information observed through experiments. Choosing your design is about the trade-off between desired accuracy and computational expense. So use as many points in the initial design as you can afford time for (aiming for at least 5-7 per parameter), and maybe half as many for the number of subsequent optimization steps.

Do Bayes!

Now that we are all set up, let's run the procedure using our do_bayes() function above and evaluate the result. As discussed above, I recommend sizing your random design and optimization steps according to the size of your hyperparameter space, using 5-7 points per hyperparameter as a rule of thumb. You can also figure out roughly how much time a single evaluation takes (which will depend on the hyperparameter values, so this should be an estimate of the mean time), as well as how much time you can budget, and then choose the values that work for you. Here I use 25 total runs – 15 initial evaluations, and 10 optimization steps.

(Note: The verbose output for each evaluation is shown below for your interest)

dm <- fre_dm
runs <- do_bayes(n_design = 15, of = obj.fun, opt_steps = 10, seed = 42)

## Computing y column(s) for design. Not provided.

## [1]  [email protected]:342.947351+9.068011   [email protected]:342.950897+36.281204 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:54.102369+1.254521    [email protected]:54.103011+5.019251 
## [1001]   [email protected]:17.471831+0.173606    [email protected]:17.486700+0.699454 
## [1501]   [email protected]:15.143893+0.056641    [email protected]:15.371196+0.251584 
## [2001]   [email protected]:14.794773+0.047399    [email protected]:15.161293+0.229903 
## [2501]   [email protected]:14.583392+0.051122    [email protected]:15.069248+0.245231 
## [3001]   [email protected]:14.419826+0.051263    [email protected]:14.996046+0.258229 
## [3501]   [email protected]:14.281899+0.049542    [email protected]:14.944954+0.278042 
## [4001]   [email protected]:14.162422+0.046876    [email protected]:14.902480+0.299742 
## [4501]   [email protected]:14.056329+0.045482    [email protected]:14.861813+0.318562 
## Stopping. Best iteration:
## [4597]   [email protected]:14.037272+0.045602    [email protected]:14.852832+0.320272
## 
## [1]  [email protected]:341.751507+9.223267   [email protected]:341.757257+36.900142 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:20.067214+0.271193    [email protected]:20.130995+1.160850 
## [1001]   [email protected]:15.076158+0.069745    [email protected]:15.599540+0.313749 
## [1501]   [email protected]:14.517137+0.066595    [email protected]:15.309150+0.339692 
## [2001]   [email protected]:14.163563+0.061668    [email protected]:15.147976+0.382249 
## [2501]   [email protected]:13.890958+0.068184    [email protected]:15.034968+0.404933 
## [3001]   [email protected]:13.663806+0.063876    [email protected]:14.950204+0.428143 
## [3501]   [email protected]:13.467250+0.063427    [email protected]:14.885284+0.460394 
## [4001]   [email protected]:13.293906+0.060834    [email protected]:14.837956+0.493384 
## Stopping. Best iteration:
## [4190]   [email protected]:13.231073+0.059948    [email protected]:14.818203+0.503775
## 
## [1]  [email protected]:342.471167+9.032795    [email protected]:342.480517+36.144871 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:28.357122+0.477149 [email protected]:28.895080+2.147141 
## [1001]   [email protected]:14.938225+0.058305 [email protected]:15.482690+0.376248 
## [1501]   [email protected]:14.236916+0.048579 [email protected]:14.920337+0.307540 
## [2001]   [email protected]:13.941143+0.047951 [email protected]:14.796813+0.353247 
## [2501]   [email protected]:13.719723+0.047280 [email protected]:14.722402+0.396149 
## [3001]   [email protected]:13.536447+0.045626 [email protected]:14.666037+0.429260 
## Stopping. Best iteration:
## [3171]   [email protected]:13.480152+0.046018 [email protected]:14.652766+0.442854
## 
## [1]  [email protected]:340.990906+9.109456   [email protected]:340.997974+36.447143 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:17.171271+0.149287    [email protected]:17.612757+0.749392 
## [1001]   [email protected]:14.128140+0.059918    [email protected]:14.910039+0.330671 
## [1501]   [email protected]:13.571417+0.056644    [email protected]:14.694386+0.417752 
## Stopping. Best iteration:
## [1847]   [email protected]:13.292618+0.053193    [email protected]:14.617792+0.481956
## 
## [1]  [email protected]:340.924103+9.220978   [email protected]:340.936652+36.904872 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:15.810031+0.105204    [email protected]:16.556760+0.587635 
## [1001]   [email protected]:13.707083+0.061060    [email protected]:14.973095+0.458299 
## [1501]   [email protected]:13.042243+0.062065    [email protected]:14.767002+0.587105 
## Stopping. Best iteration:
## [1888]   [email protected]:12.664030+0.058594    [email protected]:14.693107+0.666969
## 
## [1]  [email protected]:341.573523+9.099887   [email protected]:341.579096+36.409106 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:21.698555+0.287551    [email protected]:21.802563+1.318389 
## [1001]   [email protected]:15.316608+0.057473    [email protected]:15.507609+0.267001 
## [1501]   [email protected]:15.002261+0.055740    [email protected]:15.310712+0.234783 
## [2001]   [email protected]:14.813123+0.056066    [email protected]:15.225331+0.251187 
## [2501]   [email protected]:14.666221+0.055383    [email protected]:15.168456+0.269639 
## [3001]   [email protected]:14.541463+0.053780    [email protected]:15.124595+0.291749 
## [3501]   [email protected]:14.435767+0.053397    [email protected]:15.090134+0.312649 
## [4001]   [email protected]:14.342162+0.053199    [email protected]:15.060593+0.328036 
## [4501]   [email protected]:14.255980+0.053683    [email protected]:15.030732+0.343309 
## Stopping. Best iteration:
## [4585]   [email protected]:14.242503+0.053874    [email protected]:15.024390+0.345340
## 
## [1]  [email protected]:341.854852+9.086251   [email protected]:341.863275+36.356180 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:25.013295+0.397345    [email protected]:25.676643+1.808151 
## [1001]   [email protected]:14.504071+0.055626    [email protected]:15.187462+0.380966 
## [1501]   [email protected]:13.875887+0.049737    [email protected]:14.765052+0.371576 
## [2001]   [email protected]:13.542809+0.049598    [email protected]:14.638334+0.423159 
## Stopping. Best iteration:
## [2270]   [email protected]:13.398557+0.049952    [email protected]:14.601097+0.448479
## 
## [1]  [email protected]:341.348248+9.195115   [email protected]:341.357587+36.789846 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:18.174807+0.195105    [email protected]:19.060744+1.022248 
## [1001]   [email protected]:13.397231+0.063749    [email protected]:14.732965+0.451939 
## Stopping. Best iteration:
## [1460]   [email protected]:12.673491+0.062628    [email protected]:14.487809+0.572331
## 
## [1]  [email protected]:341.976398+8.988454   [email protected]:341.984406+35.983697 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:18.400557+0.165181    [email protected]:18.735286+0.850735 
## [1001]   [email protected]:14.588454+0.047146    [email protected]:15.120865+0.271691 
## [1501]   [email protected]:14.106823+0.046601    [email protected]:14.891420+0.316660 
## Stopping. Best iteration:
## [1936]   [email protected]:13.822906+0.048588    [email protected]:14.787369+0.361366
## 
## [1]  [email protected]:343.904297+9.327030   [email protected]:343.909997+37.314570 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:153.005731+4.071955   [email protected]:153.008331+16.291001 
## [1001]   [email protected]:70.474121+1.777711    [email protected]:70.475147+7.112157 
## [1501]   [email protected]:35.485048+0.776015    [email protected]:35.485494+3.104626 
## [2001]   [email protected]:21.549418+0.338670    [email protected]:21.549704+1.354912 
## [2501]   [email protected]:17.189798+0.147854    [email protected]:17.210399+0.612728 
## [3001]   [email protected]:16.418288+0.094189    [email protected]:16.555264+0.433128 
## [3501]   [email protected]:16.109250+0.081212    [email protected]:16.344176+0.387444 
## [4001]   [email protected]:15.925080+0.078790    [email protected]:16.233377+0.369088 
## [4501]   [email protected]:15.793141+0.077723    [email protected]:16.158620+0.370770 
## [5001]   [email protected]:15.686279+0.075960    [email protected]:16.110775+0.376137 
## [5501]   [email protected]:15.597672+0.076042    [email protected]:16.076505+0.388222 
## Stopping. Best iteration:
## [5707]   [email protected]:15.563175+0.074569    [email protected]:16.059571+0.391720
## 
## [1]  [email protected]:342.328162+9.312946    [email protected]:342.334070+37.259847 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:21.034394+0.261867 [email protected]:21.110715+1.265011 
## [1001]   [email protected]:16.515162+0.077981 [email protected]:16.665819+0.360601 
## [1501]   [email protected]:16.270615+0.075134 [email protected]:16.505939+0.319116 
## [2001]   [email protected]:16.125761+0.075163 [email protected]:16.432678+0.320781 
## [2501]   [email protected]:16.014767+0.073620 [email protected]:16.385424+0.331610 
## [3001]   [email protected]:15.920065+0.071297 [email protected]:16.346298+0.346168 
## Stopping. Best iteration:
## [3228]   [email protected]:15.881553+0.070426 [email protected]:16.329813+0.351423
## 
## [1]  [email protected]:341.820270+9.143934   [email protected]:341.825208+36.586124 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:26.608221+0.443646    [email protected]:26.664685+1.977213 
## [1001]   [email protected]:15.824172+0.068891    [email protected]:15.918101+0.371526 
## [1501]   [email protected]:15.499650+0.061893    [email protected]:15.647616+0.269262 
## [2001]   [email protected]:15.374731+0.061671    [email protected]:15.573182+0.258539 
## [2501]   [email protected]:15.289750+0.061845    [email protected]:15.534944+0.261417 
## Stopping. Best iteration:
## [2579]   [email protected]:15.278433+0.061600    [email protected]:15.529791+0.262305
## 
## [1]  [email protected]:343.394391+8.997226   [email protected]:343.399872+36.001197 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:35.638326+0.676807    [email protected]:35.855040+2.870666 
## [1001]   [email protected]:15.927161+0.074464    [email protected]:16.216283+0.425658 
## [1501]   [email protected]:14.851506+0.047610    [email protected]:15.276635+0.237773 
## [2001]   [email protected]:14.524470+0.041982    [email protected]:15.112846+0.258227 
## [2501]   [email protected]:14.292380+0.045034    [email protected]:15.027395+0.291822 
## [3001]   [email protected]:14.103252+0.044065    [email protected]:14.970938+0.334465 
## Stopping. Best iteration:
## [3384]   [email protected]:13.977265+0.046385    [email protected]:14.943196+0.365468
## 
## [1]  [email protected]:341.213971+9.169863   [email protected]:341.220727+36.687101 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:17.835510+0.165372    [email protected]:18.048213+0.847644 
## [1001]   [email protected]:14.868611+0.068062    [email protected]:15.391133+0.315593 
## [1501]   [email protected]:14.301947+0.058922    [email protected]:15.099987+0.329528 
## [2001]   [email protected]:13.923899+0.058205    [email protected]:14.931884+0.371135 
## [2501]   [email protected]:13.617700+0.062158    [email protected]:14.828744+0.421567 
## [3001]   [email protected]:13.364167+0.063944    [email protected]:14.745604+0.455195 
## Stopping. Best iteration:
## [3159]   [email protected]:13.292696+0.064719    [email protected]:14.719766+0.463486
## 
## [1]  [email protected]:343.292597+9.363249   [email protected]:343.301770+37.470216 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:33.808357+0.652361    [email protected]:34.678754+3.054606 
## [1001]   [email protected]:16.164603+0.097345    [email protected]:17.212687+0.750414 
## [1501]   [email protected]:14.603756+0.074138    [email protected]:15.953423+0.574503 
## [2001]   [email protected]:13.968725+0.071299    [email protected]:15.654203+0.618075 
## [2501]   [email protected]:13.520048+0.068583    [email protected]:15.482204+0.654811 
## Stopping. Best iteration:
## [2566]   [email protected]:13.469965+0.068246    [email protected]:15.476034+0.661855

## [mbo] 0: eta=0.00745; gamma=3.51; max_depth=4; min_child_weight=1387; subsample=0.629; colsample_bytree=0.376; max_delta_step=0.64; tweedie_variance_power=1.83 : y = 14.9 : 365.7 secs : initdesign

## [mbo] 0: eta=0.00941; gamma=1.73; max_depth=6; min_child_weight=1729; subsample=0.665; colsample_bytree=0.277; max_delta_step=0.95; tweedie_variance_power=1.78 : y = 14.8 : 401.0 secs : initdesign

## [mbo] 0: eta=0.00594; gamma=2.63; max_depth=9; min_child_weight=1844; subsample=0.406; colsample_bytree=0.723; max_delta_step=4.15; tweedie_variance_power=1.84 : y = 14.7 : 504.6 secs : initdesign

## [mbo] 0: eta=0.00858; gamma=4.15; max_depth=6; min_child_weight=974; subsample=0.772; colsample_bytree=0.686; max_delta_step=3.91; tweedie_variance_power=1.81 : y = 14.6 : 199.2 secs : initdesign

## [mbo] 0: eta=0.00993; gamma=4.57; max_depth=8; min_child_weight=1068; subsample=0.688; colsample_bytree=0.441; max_delta_step=2.05; tweedie_variance_power=1.77 : y = 14.7 : 239.7 secs : initdesign

## [mbo] 0: eta=0.00698; gamma=1.95; max_depth=3; min_child_weight=1116; subsample=0.581; colsample_bytree=0.647; max_delta_step=4.48; tweedie_variance_power=1.81 : y = 15 : 366.1 secs : initdesign

## [mbo] 0: eta=0.00637; gamma=3.72; max_depth=10; min_child_weight=1934; subsample=0.457; colsample_bytree=0.779; max_delta_step=2.48; tweedie_variance_power=1.82 : y = 14.6 : 415.2 secs : initdesign

## [mbo] 0: eta=0.00794; gamma=2.22; max_depth=9; min_child_weight=840; subsample=0.757; colsample_bytree=0.515; max_delta_step=3.11; tweedie_variance_power=1.79 : y = 14.5 : 206.0 secs : initdesign

## [mbo] 0: eta=0.00825; gamma=4.93; max_depth=7; min_child_weight=616; subsample=0.325; colsample_bytree=0.292; max_delta_step=3.4; tweedie_variance_power=1.84 : y = 14.8 : 213.4 secs : initdesign

## [mbo] 0: eta=0.00871; gamma=2.37; max_depth=3; min_child_weight=1553; subsample=0.221; colsample_bytree=0.324; max_delta_step=0.248; tweedie_variance_power=1.77 : y = 16.1 : 359.8 secs : initdesign

## [mbo] 0: eta=0.0073; gamma=4.34; max_depth=2; min_child_weight=1484; subsample=0.526; colsample_bytree=0.618; max_delta_step=1.99; tweedie_variance_power=1.76 : y = 16.3 : 233.8 secs : initdesign

## [mbo] 0: eta=0.00608; gamma=1.07; max_depth=2; min_child_weight=1280; subsample=0.293; colsample_bytree=0.402; max_delta_step=2.82; tweedie_variance_power=1.8 : y = 15.5 : 196.2 secs : initdesign

## [mbo] 0: eta=0.00519; gamma=3.23; max_depth=5; min_child_weight=514; subsample=0.5; colsample_bytree=0.527; max_delta_step=1.51; tweedie_variance_power=1.85 : y = 14.9 : 380.0 secs : initdesign

## [mbo] 0: eta=0.0091; gamma=1.49; max_depth=6; min_child_weight=321; subsample=0.362; colsample_bytree=0.21; max_delta_step=1.12; tweedie_variance_power=1.79 : y = 14.7 : 300.6 secs : initdesign

## [mbo] 0: eta=0.00543; gamma=2.87; max_depth=10; min_child_weight=731; subsample=0.243; colsample_bytree=0.595; max_delta_step=4.85; tweedie_variance_power=1.75 : y = 15.5 : 350.0 secs : initdesign

## [1]  [email protected]:340.715326+9.076589   [email protected]:340.723718+36.318898 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:15.918866+0.102574    [email protected]:16.461088+0.571469 
## [1001]   [email protected]:13.785034+0.049437    [email protected]:14.672956+0.330847 
## [1501]   [email protected]:13.252387+0.053019    [email protected]:14.476279+0.413990 
## [2001]   [email protected]:12.879603+0.050442    [email protected]:14.391401+0.501251 
## Stopping. Best iteration:
## [2023]   [email protected]:12.864722+0.050294    [email protected]:14.387596+0.505142

## [mbo] 1: eta=0.00954; gamma=1.9; max_depth=8; min_child_weight=1225; subsample=0.731; colsample_bytree=0.349; max_delta_step=2.2; tweedie_variance_power=1.81 : y = 14.4 : 249.3 secs : infill_cb

## [1]  [email protected]:343.092389+9.184359   [email protected]:343.097986+36.745005 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:113.222479+2.921018   [email protected]:113.224371+11.686362 
## [1001]   [email protected]:41.464624+0.928879    [email protected]:41.465188+3.716334 
## [1501]   [email protected]:20.439788+0.295234    [email protected]:20.440313+1.181873 
## [2001]   [email protected]:15.711758+0.113962    [email protected]:15.898547+0.431068 
## [2501]   [email protected]:14.325745+0.070814    [email protected]:15.013139+0.345294 
## [3001]   [email protected]:13.676998+0.056270    [email protected]:14.686207+0.365084 
## [3501]   [email protected]:13.235437+0.057295    [email protected]:14.506423+0.408045 
## [4001]   [email protected]:12.878366+0.056469    [email protected]:14.409738+0.476704 
## Stopping. Best iteration:
## [4439]   [email protected]:12.602206+0.057227    [email protected]:14.353508+0.526717

## [mbo] 2: eta=0.00911; gamma=1.77; max_depth=8; min_child_weight=868; subsample=0.791; colsample_bytree=0.549; max_delta_step=0.314; tweedie_variance_power=1.8 : y = 14.4 : 422.2 secs : infill_cb

## [1]  [email protected]:341.616339+9.151591   [email protected]:341.621362+36.614048 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:21.260748+0.323534    [email protected]:21.276269+1.295378 
## [1001]   [email protected]:14.082018+0.051754    [email protected]:14.875722+0.333678 
## [1501]   [email protected]:13.233551+0.051804    [email protected]:14.484654+0.410128 
## [2001]   [email protected]:12.712292+0.049849    [email protected]:14.331855+0.503634 
## Stopping. Best iteration:
## [2290]   [email protected]:12.472694+0.053111    [email protected]:14.298177+0.575406

## [mbo] 3: eta=0.00942; gamma=3.15; max_depth=9; min_child_weight=843; subsample=0.783; colsample_bytree=0.271; max_delta_step=0.888; tweedie_variance_power=1.8 : y = 14.3 : 252.1 secs : infill_cb

## [1]  [email protected]:340.514740+9.107514    [email protected]:340.523969+36.446491 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:14.909694+0.100825 [email protected]:16.029717+0.614204 
## [1001]   [email protected]:12.276371+0.058075 [email protected]:14.299033+0.636373 
## Stopping. Best iteration:
## [1016]   [email protected]:12.243908+0.059515 [email protected]:14.297354+0.646123

## [mbo] 4: eta=0.00999; gamma=2.8; max_depth=9; min_child_weight=342; subsample=0.683; colsample_bytree=0.49; max_delta_step=1.58; tweedie_variance_power=1.8 : y = 14.3 : 155.0 secs : infill_cb

## [1]  [email protected]:340.693542+9.123636   [email protected]:340.698126+36.502615 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:15.513652+0.107246    [email protected]:16.291323+0.599968 
## [1001]   [email protected]:13.127623+0.057521    [email protected]:14.478948+0.468251 
## [1501]   [email protected]:12.450628+0.057462    [email protected]:14.320771+0.610130 
## Stopping. Best iteration:
## [1521]   [email protected]:12.427706+0.057687    [email protected]:14.317007+0.616377

## [mbo] 5: eta=0.01; gamma=2.63; max_depth=10; min_child_weight=1081; subsample=0.698; colsample_bytree=0.493; max_delta_step=1.18; tweedie_variance_power=1.8 : y = 14.3 : 205.2 secs : infill_cb

## [1]  [email protected]:340.537445+9.134006   [email protected]:340.552142+36.558473 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:15.423422+0.099615    [email protected]:16.142575+0.549508 
## [1001]   [email protected]:13.201343+0.056114    [email protected]:14.503582+0.419961 
## [1501]   [email protected]:12.487963+0.057690    [email protected]:14.305727+0.556182 
## Stopping. Best iteration:
## [1697]   [email protected]:12.274374+0.058514    [email protected]:14.272447+0.606904

## [mbo] 6: eta=0.00999; gamma=1.42; max_depth=9; min_child_weight=584; subsample=0.769; colsample_bytree=0.272; max_delta_step=2.29; tweedie_variance_power=1.8 : y = 14.3 : 217.2 secs : infill_cb

## [1]  [email protected]:341.503680+9.166254   [email protected]:341.508606+36.672458 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:19.509591+0.261787    [email protected]:19.639546+1.096469 
## [1001]   [email protected]:13.144418+0.052379    [email protected]:14.496626+0.423258 
## [1501]   [email protected]:12.073966+0.057633    [email protected]:14.226620+0.590800 
## Stopping. Best iteration:
## [1514]   [email protected]:12.054614+0.057727    [email protected]:14.224718+0.595632

## [mbo] 7: eta=0.00991; gamma=1.58; max_depth=10; min_child_weight=453; subsample=0.77; colsample_bytree=0.352; max_delta_step=0.902; tweedie_variance_power=1.79 : y = 14.2 : 182.2 secs : infill_cb

## [1]  [email protected]:342.887091+9.178540   [email protected]:342.892096+36.721835 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:85.321825+2.153553    [email protected]:85.322955+8.616076 
## [1001]   [email protected]:26.996887+0.505121    [email protected]:26.997190+2.021150 
## [1501]   [email protected]:16.226591+0.120683    [email protected]:16.298045+0.500625 
## [2001]   [email protected]:14.522975+0.055452    [email protected]:15.077755+0.326529 
## [2501]   [email protected]:13.705078+0.062120    [email protected]:14.643362+0.337972 
## [3001]   [email protected]:13.174421+0.062708    [email protected]:14.423376+0.381127 
## [3501]   [email protected]:12.755801+0.063702    [email protected]:14.289940+0.439437 
## Stopping. Best iteration:
## [3829]   [email protected]:12.516629+0.066593    [email protected]:14.231650+0.485118

## [mbo] 8: eta=0.00908; gamma=1.13; max_depth=10; min_child_weight=301; subsample=0.709; colsample_bytree=0.2; max_delta_step=0.399; tweedie_variance_power=1.8 : y = 14.2 : 391.0 secs : infill_cb

## [1]  [email protected]:340.954138+9.115118   [email protected]:340.965655+36.477642 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:16.280642+0.145262    [email protected]:17.426506+0.818395 
## [1001]   [email protected]:12.257854+0.055505    [email protected]:14.196385+0.579480 
## Stopping. Best iteration:
## [1075]   [email protected]:12.083582+0.060371    [email protected]:14.174686+0.630563

## [mbo] 9: eta=0.00868; gamma=1.42; max_depth=10; min_child_weight=302; subsample=0.799; colsample_bytree=0.376; max_delta_step=1.39; tweedie_variance_power=1.8 : y = 14.2 : 165.5 secs : infill_cb

## [1]  [email protected]:341.102509+9.068973    [email protected]:341.113751+36.288728 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:16.397836+0.150262 [email protected]:17.661101+0.874430 
## [1001]   [email protected]:12.316689+0.061103 [email protected]:14.303786+0.625352 
## Stopping. Best iteration:
## [1019]   [email protected]:12.277460+0.060180 [email protected]:14.299007+0.638777

## [mbo] 10: eta=0.00858; gamma=1.26; max_depth=10; min_child_weight=467; subsample=0.798; colsample_bytree=0.641; max_delta_step=1.38; tweedie_variance_power=1.82 : y = 14.3 : 170.7 secs : infill_cb

## $eta
## [1] 0.008678109
## 
## $gamma
## [1] 1.423488
## 
## $max_depth
## [1] 10
## 
## $min_child_weight
## [1] 302
## 
## $subsample
## [1] 0.7986374
## 
## $colsample_bytree
## [1] 0.3759013
## 
## $max_delta_step
## [1] 1.391834
## 
## $tweedie_variance_power
## [1] 1.803532

Diagnostics and evaluating result

With results in hand, we want to check some diagnostics, starting with the objective function evaluations for all of the runs.

runs$plot

[Figure: best objective evaluation by round, initial design vs. mlrMBO optimization proposals]

The plot above (see our do_bayes() function for how this info was extracted) shows the best test evaluation result for each run – the initial design is colored red and the optimization runs are in blue. For the blue runs, the hyperparameter values that produced those evaluations were the ones proposed through kriging. We can see that none of the random evaluations gave a top result, but together they did provide solid information about where in the hyperparameter space we should focus our search in order to optimize the algorithm. Every subsequent proposal was better than all of the random ones.

The “default” viz below comes from the plot() S3 method for MBOSingleObjResult objects and shows a few useful things, although the formatting could be improved. Most importantly, the top-left panel shows the “scaled” values of each set of hyperparameters, for each run. Use this to confirm that your recommended solution does not include any hyperparameter at the boundary of the values tested – if it does, then expand the range of that parameter in your objective function and re-run. In my example below, you can see that the optimal solution (the green line) includes a value for max_depth at the maximum of 10, and a min_child_weight at or near the minimum (300) of the range I had allowed. Unless I were intentionally using these bounds to limit model complexity and improve generalization, I should try expanding the ranges of these hyperparameters and running again.
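You can also check for boundary (or near-boundary) values programmatically. Here is a small sketch using ParamHelpers' getLower()/getUpper() accessors on the objects we already have; the 1% tolerance is an arbitrary choice:

## Flag parameters whose recommended value sits at (or within 1% of) a search bound
ps  <- getParamSet(obj.fun)
chk <- data.frame(parameter = names(runs$run$x),
                  lower     = getLower(ps),
                  upper     = getUpper(ps),
                  value     = unlist(runs$run$x))
tol <- 0.01 * (chk$upper - chk$lower)
chk$near_boundary <- chk$value <= chk$lower + tol | chk$value >= chk$upper - tol
chk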

class(runs$run) %>% print

## [1] "MBOSingleObjResult" "MBOResult"

plot(runs$run)

[Figure: default plot() diagnostics for the MBOSingleObjResult object]

If you print the result object you can confirm the recommended solution included these boundary values:

print(runs$run)

## Recommended parameters:
## eta=0.00868; gamma=1.42; max_depth=10; min_child_weight=302; subsample=0.799; colsample_bytree=0.376; max_delta_step=1.39; tweedie_variance_power=1.8
## Objective: y = 14.175
## 
## Optimization path
## 15 + 10 entries in total, displaying last 10 (or less):
##            eta    gamma max_depth min_child_weight subsample colsample_bytree max_delta_step tweedie_variance_power
## 16 0.009543908 1.904103         8             1225 0.7305367        0.3488211      2.1958369               1.812051
## 17 0.009114593 1.773073         8              868 0.7911231        0.5494522      0.3138641               1.800874
## 18 0.009421640 3.150876         9              843 0.7830561        0.2705329      0.8878380               1.798907
## 19 0.009990520 2.801814         9              342 0.6832292        0.4898784      1.5775174               1.802298
## 20 0.009996788 2.626826        10             1081 0.6979587        0.4934878      1.1786900               1.799649
## 21 0.009986297 1.419945         9              584 0.7690373        0.2716777      2.2940200               1.795434
## 22 0.009913078 1.579795        10              453 0.7699498        0.3517838      0.9017585               1.794166
## 23 0.009082225 1.130765        10              301 0.7085142        0.2004467      0.3985858               1.800927
## 24 0.008678109 1.423488        10              302 0.7986374        0.3759013      1.3918335               1.803532
## 25 0.008579770 1.258053        10              467 0.7982686        0.6407762      1.3812183               1.816803
##           y dob eol error.message exec.time       cb error.model train.time prop.type propose.time         se     mean
## 16 14.38760   1  NA          <NA>   249.294 14.08443        <NA>      0.417 infill_cb        1.041 0.26275330 14.34718
## 17 14.35351   2  NA          <NA>   422.219 14.20226        <NA>      0.150 infill_cb        1.771 0.21492201 14.41718
## 18 14.29818   3  NA          <NA>   252.086 14.20628        <NA>      0.123 infill_cb        1.537 0.19145391 14.39773
## 19 14.29735   4  NA          <NA>   155.010 14.19001        <NA>      0.174 infill_cb        1.474 0.15554575 14.34556
## 20 14.31701   5  NA          <NA>   205.205 14.19583        <NA>      0.333 infill_cb        1.299 0.15011713 14.34595
## 21 14.27245   6  NA          <NA>   217.224 14.20543        <NA>      0.122 infill_cb        1.356 0.12818859 14.33362
## 22 14.22472   7  NA          <NA>   182.158 14.19185        <NA>      0.101 infill_cb        1.510 0.10787970 14.29973
## 23 14.23165   8  NA          <NA>   390.982 14.16640        <NA>      0.180 infill_cb        1.438 0.12182032 14.28822
## 24 14.17469   9  NA          <NA>   165.500 14.16323        <NA>      0.113 infill_cb        1.142 0.08824657 14.25147
## 25 14.29901  10  NA          <NA>   170.738 14.12875        <NA>      0.097 infill_cb        1.179 0.14723428 14.27598
##    lambda
## 16      1
## 17      1
## 18      1
## 19      1
## 20      1
## 21      1
## 22      1
## 23      1
## 24      1
## 25      1

Using the result

Assuming we are happy with the result, we should then have what we need to proceed with model training. However, because we relied on early stopping inside xgb.cv() rather than tuning nrounds directly, the optimal number of rounds was not captured in our MBO result. So we need to run one more evaluation the old-fashioned way, calling xgb.cv() directly with the best hyperparameters we found.

best.params <- runs$run$x
print(best.params)

## $eta
## [1] 0.008678109
## 
## $gamma
## [1] 1.423488
## 
## $max_depth
## [1] 10
## 
## $min_child_weight
## [1] 302
## 
## $subsample
## [1] 0.7986374
## 
## $colsample_bytree
## [1] 0.3759013
## 
## $max_delta_step
## [1] 1.391834
## 
## $tweedie_variance_power
## [1] 1.803532

We add the model parameters which were fixed during optimization to this list:

best.params$booster <- "gbtree"
best.params$objective <- "reg:tweedie"

Now we cross-validate the number of rounds to use, fixing our best hyperparameters:

optimal.cv <- xgb.cv(params = best.params,
                     data = fre_dm,
                     nrounds = 6000,
                     nthread = 26,
                     nfold = 5,
                     prediction = FALSE,
                     showsd = TRUE,
                     early_stopping_rounds = 25,
                     verbose = 1,
                     print_every_n = 500)

## [1]  [email protected]:340.950989+7.692406   [email protected]:340.929736+30.743659 
## Multiple eval metrics are present. Will use [email protected] for early stopping.
## Will train until [email protected] hasn't improved in 25 rounds.
## 
## [501]    [email protected]:16.281491+0.131605    [email protected]:17.436350+0.607008 
## [1001]   [email protected]:12.254035+0.041286    [email protected]:14.250821+0.548260 
## Stopping. Best iteration:
## [1060]   [email protected]:12.114213+0.035136    [email protected]:14.242658+0.593849

Obtain the best number of rounds…

best.params$nrounds <- optimal.cv$best_ntreelimit
best.params[[11]] %>% print

## [1] 1060

…and finally, train the final learner:

final.model <- xgboost(params = best.params[-11], ## do not include nrounds here
                       data = fre_dm,
                       nrounds = best.params$nrounds,
                       verbose = 1,
                       print_every_n = 500)

## [1]  [email protected]:340.952393 
## [501]    [email protected]:16.252325 
## [1001]   [email protected]:12.189503 
## [1060]   [email protected]:12.041411

xgb.importance(model = final.model) %>% xgb.plot.importance()

[Figure: xgboost variable importance for the final model]

Conclusion

Bayesian optimization is a smart approach for tuning more complex learning algorithms with many hyperparameters when compute resources are slowing down the analysis. It is commonly used in deep learning, but can also be useful when working with machine learning algorithms like GBMs (shown here), random forests, support vector machines – really anything that would take you too much time to tune with a naive grid search. Even if you are working with a relatively simple algorithm – say a lasso regression, which involves a single hyperparameter \(\lambda\) to control the shrinkage/penalty – you may have only a small amount of compute available. If so, it could still make sense to use MBO and cut down the number of evaluations needed to find the optimum.
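As a tiny illustration of that last point, here is a hedged sketch of a one-dimensional search – tuning the lasso penalty \(\lambda\) for a glmnet model with the same mlrMBO machinery. glmnet is not used elsewhere in this post, and the Gaussian lasso with a simple holdout MSE is purely for brevity, so treat the details as illustrative:

## One-dimensional MBO: tune the lasso penalty on the design matrix built earlier
library(glmnet)

set.seed(42)
idx   <- sample(nrow(fre_mat), size = floor(0.8 * nrow(fre_mat)))  ## simple holdout split
y_all <- fre_df$ClaimAmount

lasso.obj <- makeSingleObjectiveFunction(
  name = "lasso_lambda",
  fn = function(x) {
    lam  <- exp(x["log_lambda"])
    fit  <- glmnet(fre_mat[idx, ], y_all[idx], alpha = 1, lambda = lam)
    pred <- predict(fit, newx = fre_mat[-idx, ], s = lam)
    mean((y_all[-idx] - pred)^2)  ## validation MSE to minimize
  },
  par.set = makeParamSet(
    makeNumericParam("log_lambda", lower = log(1e-4), upper = log(10))
  ),
  minimize = TRUE
)

ctrl      <- makeMBOControl() %>% setMBOControlTermination(iters = 5)
des_1d    <- generateDesign(n = 7, par.set = getParamSet(lasso.obj), fun = lhs::randomLHS)
lasso.run <- mbo(lasso.obj, design = des_1d, control = ctrl, show.info = TRUE)
exp(lasso.run$x$log_lambda)  ## tuned penalty on the original scale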
