Could AutoML win in the ‘Sliced’ Data Science Competition? The answer may shock you!
In this post I’ll be taking a break from my normal explorations in the medical device domain to talk about Sliced. Sliced is a 2-hour data science competition streamed on Twitch and hosted by Meg Risdal and Nick Wan. Four competitors tackle a prediction problem in real time using whatever coding language or tools they prefer, grabbing bonus points along the way for Data Visualization and/or stumbling onto Golden Features (hint: always calculate the air density when training on weather data). Viewers can simply kick back and watch the contestants ply their trade, or they can actively participate by submitting their own solutions and seeing how they stack up on the competition leaderboard!
Here are my observations after watching a few episodes:
* Participants do not typically implement more than 2 different model types, preferring to spend their time on Feature Engineering and tuning the hyperparameters of their preferred model
* Gradient boosting (XGBoost, CatBoost, etc.) is the dominant technique for tabular data
To clarify the first point above – the tuning is not totally manual; grid search functions are typically employed to identify the best hyperparameters from a superset of options. But the time pressure of the competition means that players can’t set up massive grids that lock up compute resources for too long. So it’s generally an iterative process over small grids that are expanded and contracted as needed, based on intermediate results from predicting on a test set.
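To make that loop concrete, here is a minimal sketch (my own illustration, not something a contestant actually ran) of a small-grid tuning pass using tidymodels with an xgboost spec. The commented-out resampling step assumes the dataset tibble that gets loaded later in this post.

library(tidymodels)

# A boosted tree spec with two tunable hyperparameters
xgb_spec <- boost_tree(
  trees = 500,
  tree_depth = tune(),
  learn_rate = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

# Start small: a 3 x 3 grid; widen or narrow the ranges after each pass
small_grid <- grid_regular(
  tree_depth(range = c(3L, 8L)),
  learn_rate(range = c(-2, -1)),  # dials expresses this on the log10 scale
  levels = 3
)

xgb_wf <- workflow() %>%
  add_formula(churned ~ .) %>%
  add_model(xgb_spec)

# res <- tune_grid(
#   xgb_wf,
#   resamples = vfold_cv(dataset, v = 5),
#   grid = small_grid,
#   metrics = metric_set(mn_log_loss)
# )
# show_best(res, "mn_log_loss")  # inspect, adjust the grid, and repeat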
All this led me to wonder: given the somewhat manual process of hyperparameter optimization and the restricted number of model types… how would AutoML fare in Sliced? The rest of this post will attempt to answer that question, at least for an arbitrary Sliced episode.
- Setup
- Understanding the Ensemble
- Final Thoughts
- Session Info
- Appendix – Full Details of the Ensemble Model
Setup
For this exercise we’ll use the dataset and metric from Episode 7, in which we are asked to predict whether or not a bank customer churned. The scoring metric is LogLoss. I’ll be using the free version of the h2o.ai framework and taking the following approach to feature engineering and variable selection:
- All variables will be used (churn explained by everything) and there will be no feature engineering except imputing means for missing values, converting nominal predictors to dummy variables, and removing the ID column. This should give a fair look at how h2o does given the bare minimum of attention to pre-processing and no attention to model selection or hyperparameter ranges.
Let’s get to it.
Load libraries
library(tidymodels)
library(tidyverse)
library(h2o)
library(here)
library(gt)
Load dataset
Read the data in as a csv and rename the attrition column
dataset <- read_csv(here("ep7/train.csv")) %>%
  mutate(churned = case_when(
    attrition_flag == 1 ~ "yes",
    TRUE ~ "no"
  ) %>% as_factor()) %>%
  select(-attrition_flag)

holdout <- read_csv(here("ep7/test.csv"))

dataset %>%
  gt_preview() %>%
  cols_align("center")
 | id | customer_age | gender | education_level | income_category | total_relationship_count | months_inactive_12_mon | credit_limit | total_revolving_bal | total_amt_chng_q4_q1 | total_trans_amt | total_trans_ct | total_ct_chng_q4_q1 | avg_utilization_ratio | churned |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 8805 | 27 | F | Post-Graduate | Less than $40K | 3 | 2 | 1438.3 | 990 | 0.715 | 3855 | 73 | 1.147 | 0.688 | no |
2 | 4231 | 42 | F | College | Less than $40K | 6 | 4 | 3050.0 | 1824 | 0.771 | 1973 | 50 | 1.381 | 0.598 | no |
3 | 5263 | 47 | F | Unknown | Less than $40K | 3 | 3 | 1561.0 | 0 | 0.502 | 1947 | 28 | 0.556 | 0.000 | yes |
4 | 2072 | 44 | M | Uneducated | $80K – $120K | 1 | 3 | 25428.0 | 1528 | 0.725 | 13360 | 97 | 0.796 | 0.060 | no |
5 | 7412 | 54 | M | Graduate | $60K – $80K | 3 | 3 | 2947.0 | 2216 | 0.760 | 1744 | 53 | 0.606 | 0.752 | no |
6..7087 | |||||||||||||||
7088 | 7932 | 57 | F | High School | Less than $40K | 5 | 3 | 3191.0 | 2517 | 0.719 | 1501 | 35 | 0.591 | 0.789 | no |
Set Up Basic Recipe
This recipe starts with the model formula, imputes the mean for numeric predictors, and converts nominal variables to dummy variables. The id column is also removed. Note that churned is described by ., which means “all other variables”.
basic_rec <- recipe(churned ~ ., data = dataset) %>%
  step_impute_mean(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_rm(id)
Prep and Bake the Dataset
The prep and bake functions apply the recipe to the supplied dataset and return a tibble that can be passed to h2o.
baked_dataset_tbl <- basic_rec %>%
  prep() %>%
  bake(dataset)

baked_dataset_tbl %>%
  gt_preview() %>%
  cols_align("center")
 | customer_age | total_relationship_count | months_inactive_12_mon | credit_limit | total_revolving_bal | total_amt_chng_q4_q1 | total_trans_amt | total_trans_ct | total_ct_chng_q4_q1 | avg_utilization_ratio | churned | gender_M | education_level_Doctorate | education_level_Graduate | education_level_High.School | education_level_Post.Graduate | education_level_Uneducated | education_level_Unknown | income_category_X.40K....60K | income_category_X.60K....80K | income_category_X.80K....120K | income_category_Less.than..40K | income_category_Unknown |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 27 | 3 | 2 | 1438.3 | 990 | 0.715 | 3855 | 73 | 1.147 | 0.688 | no | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | 42 | 6 | 4 | 3050.0 | 1824 | 0.771 | 1973 | 50 | 1.381 | 0.598 | no | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
3 | 47 | 3 | 3 | 1561.0 | 0 | 0.502 | 1947 | 28 | 0.556 | 0.000 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
4 | 44 | 1 | 3 | 25428.0 | 1528 | 0.725 | 13360 | 97 | 0.796 | 0.060 | no | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
5 | 54 | 3 | 3 | 2947.0 | 2216 | 0.760 | 1744 | 53 | 0.606 | 0.752 | no | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
6..7087 | |||||||||||||||||||||||
7088 | 57 | 5 | 3 | 3191.0 | 2517 | 0.719 | 1501 | 35 | 0.591 | 0.789 | no | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Working in h2o
Convert Dataset to h2o
h2o must first be initialized and then the data can be coerced to h2o type. Using the h2o.describe() function shows a nice summary of the dataset and verifies that it was imported correctly.
h2o.init()

## Connection successful!
##
## R is connected to the H2O cluster:
##     H2O cluster uptime:           8 hours 4 minutes
##     H2O cluster timezone:         America/Los_Angeles
##     H2O data parsing timezone:    UTC
##     H2O cluster version:          3.32.1.3
##     H2O cluster version age:      2 months and 5 days
##     H2O cluster name:             H2O_started_from_R_kingr17_afp345
##     H2O cluster total nodes:      1
##     H2O cluster total memory:     0.85 GB
##     H2O cluster total cores:      4
##     H2O cluster allowed cores:    4
##     H2O cluster healthy:          TRUE
##     H2O Connection ip:            localhost
##     H2O Connection port:          54321
##     H2O Connection proxy:         NA
##     H2O Internal Security:        FALSE
##     H2O API Extensions:           Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4
##     R Version:                    R version 4.0.3 (2020-10-10)

train_h2_tbl <- as.h2o(baked_dataset_tbl)

## |======================================================================| 100%

h2o.describe(train_h2_tbl) %>%
  gt() %>%
  cols_align("center")
Label | Type | Missing | Zeros | PosInf | NegInf | Min | Max | Mean | Sigma | Cardinality |
---|---|---|---|---|---|---|---|---|---|---|
customer_age | int | 0 | 0 | 0 | 0 | 26.0 | 73.000 | 4.630742e+01 | 7.9929340 | NA |
total_relationship_count | int | 0 | 0 | 0 | 0 | 1.0 | 6.000 | 3.818426e+00 | 1.5516740 | NA |
months_inactive_12_mon | int | 0 | 22 | 0 | 0 | 0.0 | 6.000 | 2.331546e+00 | 1.0103784 | NA |
credit_limit | real | 0 | 0 | 0 | 0 | 1438.3 | 34516.000 | 8.703875e+03 | 9152.3072697 | NA |
total_revolving_bal | int | 0 | 1705 | 0 | 0 | 0.0 | 2517.000 | 1.169674e+03 | 815.4734902 | NA |
total_amt_chng_q4_q1 | real | 0 | 5 | 0 | 0 | 0.0 | 3.355 | 7.601713e-01 | 0.2213876 | NA |
total_trans_amt | int | 0 | 0 | 0 | 0 | 563.0 | 18484.000 | 4.360548e+03 | 3339.1008390 | NA |
total_trans_ct | int | 0 | 0 | 0 | 0 | 10.0 | 139.000 | 6.465378e+01 | 23.3431055 | NA |
total_ct_chng_q4_q1 | real | 0 | 7 | 0 | 0 | 0.0 | 3.500 | 7.126414e-01 | 0.2363882 | NA |
avg_utilization_ratio | real | 0 | 1705 | 0 | 0 | 0.0 | 0.999 | 2.753678e-01 | 0.2755652 | NA |
churned | enum | 0 | 5956 | 0 | 0 | 0.0 | 1.000 | 1.597065e-01 | 0.3663595 | 2 |
gender_M | int | 0 | 3714 | 0 | 0 | 0.0 | 1.000 | 4.760158e-01 | 0.4994597 | NA |
education_level_Doctorate | int | 0 | 6762 | 0 | 0 | 0.0 | 1.000 | 4.599323e-02 | 0.2094852 | NA |
education_level_Graduate | int | 0 | 4876 | 0 | 0 | 0.0 | 1.000 | 3.120767e-01 | 0.4633737 | NA |
education_level_High.School | int | 0 | 5710 | 0 | 0 | 0.0 | 1.000 | 1.944131e-01 | 0.3957761 | NA |
education_level_Post.Graduate | int | 0 | 6745 | 0 | 0 | 0.0 | 1.000 | 4.839165e-02 | 0.2146075 | NA |
education_level_Uneducated | int | 0 | 6050 | 0 | 0 | 0.0 | 1.000 | 1.464447e-01 | 0.3535764 | NA |
education_level_Unknown | int | 0 | 6005 | 0 | 0 | 0.0 | 1.000 | 1.527935e-01 | 0.3598137 | NA |
income_category_X.40K....60K | int | 0 | 5865 | 0 | 0 | 0.0 | 1.000 | 1.725451e-01 | 0.3778802 | NA |
income_category_X.60K....80K | int | 0 | 6064 | 0 | 0 | 0.0 | 1.000 | 1.444695e-01 | 0.3515900 | NA |
income_category_X.80K....120K | int | 0 | 6007 | 0 | 0 | 0.0 | 1.000 | 1.525113e-01 | 0.3595411 | NA |
income_category_Less.than..40K | int | 0 | 4621 | 0 | 0 | 0.0 | 1.000 | 3.480530e-01 | 0.4763865 | NA |
income_category_Unknown | int | 0 | 6316 | 0 | 0 | 0.0 | 1.000 | 1.089165e-01 | 0.3115564 | NA |
Specify the Response and Predictors
In h2o we must identify the response column and the predictor columns, which we do here. Unfortunately, I don’t believe tidyselect helpers work here, so we use base R instead.
y <- "churned"
x <- setdiff(names(train_h2_tbl), y)
AutoML Search and Optimization
Now we start the AutoML session. You can specify the stopping rule as either a total number of models or a total duration in seconds. Since we’re simulating a timed Sliced competition, we’ll use max time. The longest I observed competitors training for was about 20 minutes, so we’ll use that here and grab a coffee while it chugs. Notice that in this API we are not specifying any particular type of model or any hyperparameter ranges to optimize over.
# aml <- h2o.automl(
#   y = y,
#   x = x,
#   training_frame = train_h2_tbl,
#   project_name = "sliced_ep7_refactored_25bjuly2021",
#   max_runtime_secs = 1200,
#   seed = 07252021
# )
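For reference, if we wanted to cap the search by model count rather than wall-clock time, the call would swap max_runtime_secs for h2o’s max_models argument. This is a hypothetical alternative that was not used for this run:

# aml <- h2o.automl(
#   y = y,
#   x = x,
#   training_frame = train_h2_tbl,
#   max_models = 20,  # stop after 20 models instead of after 20 minutes
#   seed = 07252021
# )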
Leaderboard of Best Model
Not to be confused with the competition leaderboard, h2o will produce a “leaderboard” of models that it evaluated and ranked as part of its session. Here we access and observe the leaderboard and its best models.
# leaderboard_tbl <- aml@leaderboard %>% as_tibble()
# a <- leaderboard_tbl %>% gt_preview()
# a
 | model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
---|---|---|---|---|---|---|---|
1 | StackedEnsemble_AllModels_AutoML_20210725_011103 | 0.9918194 | 0.08285245 | 0.9636402 | 0.06007334 | 0.1543060 | 0.02381035 |
2 | GBM_2_AutoML_20210725_011103 | 0.9914487 | 0.08636589 | 0.9626320 | 0.05620012 | 0.1569434 | 0.02463124 |
3 | StackedEnsemble_BestOfFamily_AutoML_20210725_011103 | 0.9914280 | 0.08451327 | 0.9625496 | 0.05620012 | 0.1556844 | 0.02423764 |
4 | GBM_grid__1_AutoML_20210725_011103_model_2 | 0.9913165 | 0.09009359 | 0.9616467 | 0.06286946 | 0.1600684 | 0.02562190 |
5 | GBM_4_AutoML_20210725_011103 | 0.9912436 | 0.09035254 | 0.9602742 | 0.07244469 | 0.1599096 | 0.02557107 |
6..37 | |||||||
38 | GLM_1_AutoML_20210725_011103 | 0.9152440 | 0.24794654 | 0.7267223 | 0.18132441 | 0.2727786 | 0.07440815 |
Extract Top Model
As expected, the top slot is an ensemble (in my limited experience it usually is). This is the one we’ll use to predict churn on the holdout set and submit to Kaggle for our competition results. The ensemble model is extracted and stored as follows:
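A minimal way to do that extraction (one approach, assuming the aml object from the AutoML run above) is to grab the @leader slot, which holds the top model on the leaderboard:

# Store the leaderboard's best model for predicting on the holdout set
top_model <- aml@leader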
To prepare the holdout data predictions, we apply the basic recipe and convert to h2o, just as before with the training data.
Pre-Process the Holdout Set
We want the basic recipe applied to the holdout set so the model sees the same type of predictor variables when it goes to make predictions.
holdout_h2o_tbl <- basic_rec %>%
  prep() %>%
  bake(holdout) %>%
  as.h2o()

## |======================================================================| 100%
Make Predictions
Predictions are made using h2o.predict().
top_model_basic_preds <- h2o.predict(top_model, newdata = holdout_h2o_tbl) %>%
  as_tibble() %>%
  bind_cols(holdout) %>%
  select(id, attrition_flag = yes)

## |======================================================================| 100%

top_model_basic_preds %>%
  gt_preview()
 | id | attrition_flag |
---|---|---|
1 | 3005 | 0.0020805123 |
2 | 143 | 0.8462792238 |
3 | 5508 | 0.0003131364 |
4 | 6474 | 0.0001977184 |
5 | 9784 | 0.0017571210 |
6..3038 | ||
3039 | 9822 | 0.9895064264 |
Nice! A list of predictions for each id in the holdout set. Let’s write it to file and submit.
Export Results
top_model_basic_preds %>% write_csv(here("run1_basic_ep7.csv"))
Scoring the Submission
To submit the file for assessment, simply upload the csv to the Kaggle interface by browsing and selecting or by dragging. In a few seconds our LogLoss on the holdout set is revealed. The one we care about is the private score.
A private score of 0.06921! This is honestly way better than I expected for such minimal processing. This submission would have slotted me into 3rd place out of 31 entries, just behind eventual Episode 7 winner Ethan Douglas. And remember, I didn’t need to specify any modeling engine or range of hyperparameters to tune!
Understanding the Ensemble
Just so we aren’t totally naive, let’s dig into this model a bit and see what h2o built. For an ensemble, we want to interrogate the “metalearner”, which can be thought of as the model that combines the component models. These are S4 objects, which require the @ operator to dig into the different slots/items (I just learned this about 90 seconds ago).
Extract the Ensemble Metalearner Model
metalearner_model <- h2o.getModel(top_model@model$metalearner$name)

metalearner_model@model$model_summary %>%
  as_tibble() %>%
  gt_preview() %>%
  cols_align("center")
 | family | link | regularization | number_of_predictors_total | number_of_active_predictors | number_of_iterations | training_frame |
---|---|---|---|---|---|---|---|
1 | binomial | logit | Elastic Net (alpha = 0.5, lambda = 0.00311 ) | 36 | 12 | 7 | levelone_training_StackedEnsemble_AllModels_AutoML_20210725_011103 |
Looks like we have 36 different models combined in a GLM using Elastic Net regularization. The component models are either GBM or Deep Learning, and each appears to have been trained over a different hyperparameter grid. I’ll put the full model output in the Appendix. Here is a list of the models that compose the ensemble and their relative importance to it; the top-performing models take a greater weight.
Importance of Each Contributing Model in the Ensemble
h2o.varimp(metalearner_model) %>%
  as_tibble() %>%
  gt_preview() %>%
  cols_align("center")
 | variable | relative_importance | scaled_importance | percentage |
---|---|---|---|---|
1 | GBM_2_AutoML_20210725_011103 | 0.7014266 | 1.0000000 | 0.16199734 |
2 | GBM_grid__1_AutoML_20210725_011103_model_18 | 0.6565727 | 0.9360534 | 0.15163815 |
3 | GBM_grid__1_AutoML_20210725_011103_model_9 | 0.6336205 | 0.9033312 | 0.14633725 |
4 | GBM_grid__1_AutoML_20210725_011103_model_19 | 0.4374311 | 0.6236307 | 0.10102651 |
5 | GBM_grid__1_AutoML_20210725_011103_model_3 | 0.4047249 | 0.5770026 | 0.09347288 |
6..35 | ||||
36 | GLM_1_AutoML_20210725_011103 | 0.0000000 | 0.0000000 | 0.00000000 |
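As a quick check on the mix of algorithm families feeding the ensemble, we could also tally the component models by the prefix of their names. This is a small sketch of my own, not part of the original analysis; it reuses the metalearner’s variable-importance table:

# Count the ensemble's component models by algorithm prefix (GBM, DeepLearning, GLM, ...)
h2o.varimp(metalearner_model) %>%
  as_tibble() %>%
  mutate(algo = str_extract(variable, "^[A-Za-z]+")) %>%
  count(algo, sort = TRUE)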
We could also visualize the data above for scaled importance of each model within the ensemble:
h2o.varimp_plot(metalearner_model)
Importance of Features in the Original Dataset
Now let’s dig into the best individual model a bit to understand the parameters and feature importance of the original dataset. The top individual model is extracted and the variable importance can be displayed just like we did for the ensemble components.
top_individual_model <- h2o.getModel(metalearner_model@model$names[1])

metalearner_model@model$names[1]
## [1] "GBM_2_AutoML_20210725_011103"

h2o.varimp(top_individual_model) %>%
  as_tibble() %>%
  gt() %>%
  cols_align("center")
variable | relative_importance | scaled_importance | percentage |
---|---|---|---|
total_trans_ct | 972.9781494 | 1.0000000000 | 0.2535628117 |
total_trans_amt | 773.8638306 | 0.7953558166 | 0.2016726572 |
total_ct_chng_q4_q1 | 567.8966675 | 0.5836684697 | 0.1479966183 |
total_revolving_bal | 500.5438232 | 0.5144450814 | 0.1304441413 |
total_relationship_count | 341.5622864 | 0.3510482600 | 0.0890127838 |
total_amt_chng_q4_q1 | 175.8822327 | 0.1807668885 | 0.0458357605 |
customer_age | 160.4104614 | 0.1648654305 | 0.0418037421 |
avg_utilization_ratio | 132.2019196 | 0.1358734722 | 0.0344524597 |
credit_limit | 72.8332520 | 0.0748559996 | 0.0189806977 |
months_inactive_12_mon | 70.7272568 | 0.0726915161 | 0.0184318652 |
gender_M | 37.5323524 | 0.0385747126 | 0.0097811126 |
income_category_X.80K....120K | 6.2846680 | 0.0064592077 | 0.0016378149 |
income_category_X.40K....60K | 6.1469312 | 0.0063176456 | 0.0016019200 |
education_level_High.School | 3.8736489 | 0.0039812291 | 0.0010094916 |
education_level_Graduate | 3.1741629 | 0.0032623167 | 0.0008272022 |
education_level_Unknown | 2.5144935 | 0.0025843268 | 0.0006552892 |
income_category_Less.than..40K | 1.8054181 | 0.0018555588 | 0.0004705007 |
education_level_Uneducated | 1.7907501 | 0.0018404834 | 0.0004666781 |
income_category_Unknown | 1.4599973 | 0.0015005448 | 0.0003804824 |
income_category_X.60K....80K | 1.4241761 | 0.0014637288 | 0.0003711472 |
education_level_Post.Graduate | 1.4217629 | 0.0014612486 | 0.0003705183 |
education_level_Doctorate | 0.8990833 | 0.0009240529 | 0.0002343054 |
h2o.varimp_plot(top_individual_model)
If we were interested in the hyperparameters of this individual GBM model, we could look at them like this:
top_individual_model@parameters ## $model_id ## [1] "GBM_2_AutoML_20210725_011103" ## ## $training_frame ## [1] "automl_training_baked_dataset_tbl_sid_914f_1" ## ## $nfolds ## [1] 5 ## ## $keep_cross_validation_models ## [1] FALSE ## ## $keep_cross_validation_predictions ## [1] TRUE ## ## $score_tree_interval ## [1] 5 ## ## $fold_assignment ## [1] "Modulo" ## ## $ntrees ## [1] 101 ## ## $max_depth ## [1] 7 ## ## $stopping_metric ## [1] "logloss" ## ## $stopping_tolerance ## [1] 0.01187786 ## ## $seed ## [1] 7252024 ## ## $distribution ## [1] "bernoulli" ## ## $sample_rate ## [1] 0.8 ## ## $col_sample_rate ## [1] 0.8 ## ## $col_sample_rate_per_tree ## [1] 0.8 ## ## $histogram_type ## [1] "UniformAdaptive" ## ## $categorical_encoding ## [1] "Enum" ## ## $x ## [1] "customer_age" "total_relationship_count" ## [3] "months_inactive_12_mon" "credit_limit" ## [5] "total_revolving_bal" "total_amt_chng_q4_q1" ## [7] "total_trans_amt" "total_trans_ct" ## [9] "total_ct_chng_q4_q1" "avg_utilization_ratio" ## [11] "gender_M" "education_level_Doctorate" ## [13] "education_level_Graduate" "education_level_High.School" ## [15] "education_level_Post.Graduate" "education_level_Uneducated" ## [17] "education_level_Unknown" "income_category_X.40K....60K" ## [19] "income_category_X.60K....80K" "income_category_X.80K....120K" ## [21] "income_category_Less.than..40K" "income_category_Unknown" ## ## $y ## [1] "churned"
Final Thoughts
My takeaway is that h2o AutoML is really quite powerful for purely predictive tasks and is even feasible for a timed competition like Sliced. I look forward to the upcoming playoffs and will be interested to try h2o on some new datasets to see if we just got lucky here or if it’s really that good.
TLDR
AutoML wouldn’t have won this competition (at least with the minimal feature engineering I did), but it sure got way closer than I expected!
Thank you for your attention!
Session Info
sessionInfo() ## R version 4.0.3 (2020-10-10) ## Platform: x86_64-w64-mingw32/x64 (64-bit) ## Running under: Windows 10 x64 (build 18363) ## ## Matrix products: default ## ## locale: ## [1] LC_COLLATE=English_United States.1252 ## [2] LC_CTYPE=English_United States.1252 ## [3] LC_MONETARY=English_United States.1252 ## [4] LC_NUMERIC=C ## [5] LC_TIME=English_United States.1252 ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] gt_0.2.2 here_1.0.0 h2o_3.32.1.3 forcats_0.5.0 ## [5] stringr_1.4.0 readr_1.4.0 tidyverse_1.3.0 yardstick_0.0.8 ## [9] workflowsets_0.0.2 workflows_0.2.3 tune_0.1.5 tidyr_1.1.3 ## [13] tibble_3.1.2 rsample_0.1.0 recipes_0.1.16 purrr_0.3.4 ## [17] parsnip_0.1.6.9000 modeldata_0.1.0 infer_0.5.4 ggplot2_3.3.5 ## [21] dplyr_1.0.7 dials_0.0.9 scales_1.1.1 broom_0.7.8 ## [25] tidymodels_0.1.3 ## ## loaded via a namespace (and not attached): ## [1] colorspace_2.0-0 ellipsis_0.3.2 class_7.3-17 rprojroot_2.0.2 ## [5] fs_1.5.0 rstudioapi_0.13 listenv_0.8.0 furrr_0.2.1 ## [9] bit64_4.0.5 prodlim_2019.11.13 fansi_0.5.0 lubridate_1.7.9.2 ## [13] xml2_1.3.2 codetools_0.2-18 splines_4.0.3 knitr_1.30 ## [17] jsonlite_1.7.1 pROC_1.16.2 dbplyr_2.0.0 compiler_4.0.3 ## [21] httr_1.4.2 backports_1.2.0 assertthat_0.2.1 Matrix_1.2-18 ## [25] cli_3.0.1 htmltools_0.5.0 tools_4.0.3 gtable_0.3.0 ## [29] glue_1.4.2 Rcpp_1.0.5 cellranger_1.1.0 DiceDesign_1.8-1 ## [33] vctrs_0.3.8 blogdown_0.15 iterators_1.0.13 timeDate_3043.102 ## [37] gower_0.2.2 xfun_0.19 globals_0.14.0 rvest_0.3.6 ## [41] lifecycle_1.0.0 future_1.20.1 MASS_7.3-53 ipred_0.9-9 ## [45] hms_0.5.3 parallel_4.0.3 yaml_2.2.1 sass_0.3.1 ## [49] rpart_4.1-15 stringi_1.5.3 foreach_1.5.1 checkmate_2.0.0 ## [53] lhs_1.1.1 hardhat_0.1.6 lava_1.6.8.1 rlang_0.4.11 ## [57] pkgconfig_2.0.3 bitops_1.0-6 evaluate_0.14 lattice_0.20-41 ## [61] bit_4.0.4 tidyselect_1.1.1 parallelly_1.21.0 plyr_1.8.6 ## [65] magrittr_2.0.1 bookdown_0.21 R6_2.5.0 generics_0.1.0 ## [69] DBI_1.1.0 pillar_1.6.1 haven_2.3.1 withr_2.3.0 ## [73] survival_3.2-7 RCurl_1.98-1.2 nnet_7.3-14 modelr_0.1.8 ## [77] crayon_1.4.1 utf8_1.2.1 rmarkdown_2.5 grid_4.0.3 ## [81] readxl_1.3.1 data.table_1.14.0 reprex_0.3.0 digest_0.6.27 ## [85] GPfit_1.0-8 munsell_0.5.0
Appendix - Full Details of the Ensemble Model
str(metalearner_model@model) ## List of 35 ## $ names : chr [1:37] "GBM_2_AutoML_20210725_011103" "GBM_grid__1_AutoML_20210725_011103_model_2" "GBM_4_AutoML_20210725_011103" "GBM_grid__1_AutoML_20210725_011103_model_11" ... ## $ original_names : NULL ## $ column_types : chr [1:37] "Numeric" "Numeric" "Numeric" "Numeric" ... ## $ domains :List of 37 ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : NULL ## ..$ : chr [1:2] "no" "yes" ## $ cross_validation_models : NULL ## $ cross_validation_predictions : NULL ## $ cross_validation_holdout_predictions_frame_id : NULL ## $ cross_validation_fold_assignment_frame_id : NULL ## $ model_summary :Classes 'H2OTable' and 'data.frame': 1 obs. of 7 variables: ## ..$ family : chr "binomial" ## ..$ link : chr "logit" ## ..$ regularization : chr "Elastic Net (alpha = 0.5, lambda = 0.00311 )" ## ..$ number_of_predictors_total : int 36 ## ..$ number_of_active_predictors: int 12 ## ..$ number_of_iterations : int 7 ## ..$ training_frame : chr "levelone_training_StackedEnsemble_AllModels_AutoML_20210725_011103" ## ..- attr(*, "header")= chr "GLM Model" ## ..- attr(*, "formats")= chr [1:7] "%s" "%s" "%s" "%d" ... ## ..- attr(*, "description")= chr "summary" ## $ scoring_history :Classes 'H2OTable' and 'data.frame': 2 obs. of 17 variables: ## ..$ timestamp : chr [1:2] "2021-07-25 01:26:33" "2021-07-25 01:26:33" ## ..$ duration : chr [1:2] " 0.000 sec" " 0.172 sec" ## ..$ iterations : int [1:2] 5 7 ## ..$ negative_log_likelihood : num [1:2] 571 571 ## ..$ objective : num [1:2] 0.0822 0.0837 ## ..$ alpha : num [1:2] 0.5 1 ## ..$ lambda : num [1:2] 0.00311 0.00311 ## ..$ deviance_train : num [1:2] 0.161 0.161 ## ..$ deviance_xval : num [1:2] 0.166 0.166 ## ..$ deviance_se : num [1:2] 0.075 0.0751 ## ..$ training_rmse : num [1:2] 0.152 0.152 ## ..$ training_logloss : num [1:2] 0.0806 0.0805 ## ..$ training_r2 : num [1:2] 0.828 0.828 ## ..$ training_auc : num [1:2] 0.992 0.992 ## ..$ training_pr_auc : num [1:2] 0.966 0.965 ## ..$ training_lift : num [1:2] 6.26 6.26 ## ..$ training_classification_error: num [1:2] 0.0295 0.0298 ## ..- attr(*, "header")= chr "Scoring History" ## ..- attr(*, "formats")= chr [1:17] "%s" "%s" "%d" "%.5f" ... ## ..- attr(*, "description")= chr "" ## $ cv_scoring_history :List of 5 ## ..$ :Classes 'H2OTable' and 'data.frame': 2 obs. of 23 variables: ## .. ..$ timestamp : chr [1:2] "2021-07-25 01:26:31" "2021-07-25 01:26:31" ## .. ..$ duration : chr [1:2] " 0.000 sec" " 0.205 sec" ## .. ..$ iterations : int [1:2] 5 7 ## .. ..$ negative_log_likelihood : num [1:2] 473 473 ## .. ..$ objective : num [1:2] 0.0684 0.0698 ## .. ..$ alpha : num [1:2] 0.5 1 ## .. ..$ lambda : num [1:2] 0.00311 0.00311 ## .. ..$ deviance_train : num [1:2] 0.166 0.166 ## .. ..$ deviance_test : num [1:2] 0.144 0.144 ## .. ..$ training_rmse : num [1:2] 0.154 0.154 ## .. ..$ training_logloss : num [1:2] 0.083 0.083 ## .. ..$ training_r2 : num [1:2] 0.826 0.826 ## .. ..$ training_auc : num [1:2] 0.992 0.992 ## .. ..$ training_pr_auc : num [1:2] 0.964 0.964 ## .. ..$ training_lift : num [1:2] 6.16 6.16 ## .. 
..$ training_classification_error : num [1:2] 0.0297 0.0295 ## .. ..$ validation_rmse : num [1:2] 0.147 0.147 ## .. ..$ validation_logloss : num [1:2] 0.072 0.0721 ## .. ..$ validation_r2 : num [1:2] 0.83 0.829 ## .. ..$ validation_auc : num [1:2] 0.994 0.994 ## .. ..$ validation_pr_auc : num [1:2] 0.97 0.97 ## .. ..$ validation_lift : num [1:2] 6.71 6.71 ## .. ..$ validation_classification_error: num [1:2] 0.0288 0.0295 ## .. ..- attr(*, "header")= chr "Scoring History" ## .. ..- attr(*, "formats")= chr [1:23] "%s" "%s" "%d" "%.5f" ... ## .. ..- attr(*, "description")= chr "" ## ..$ :Classes 'H2OTable' and 'data.frame': 2 obs. of 23 variables: ## .. ..$ timestamp : chr [1:2] "2021-07-25 01:26:31" "2021-07-25 01:26:32" ## .. ..$ duration : chr [1:2] " 0.000 sec" " 0.192 sec" ## .. ..$ iterations : int [1:2] 5 7 ## .. ..$ negative_log_likelihood : num [1:2] 448 448 ## .. ..$ objective : num [1:2] 0.065 0.0664 ## .. ..$ alpha : num [1:2] 0.5 1 ## .. ..$ lambda : num [1:2] 0.00311 0.00311 ## .. ..$ deviance_train : num [1:2] 0.157 0.157 ## .. ..$ deviance_test : num [1:2] 0.179 0.179 ## .. ..$ training_rmse : num [1:2] 0.151 0.151 ## .. ..$ training_logloss : num [1:2] 0.0785 0.0785 ## .. ..$ training_r2 : num [1:2] 0.831 0.831 ## .. ..$ training_auc : num [1:2] 0.993 0.993 ## .. ..$ training_pr_auc : num [1:2] 0.968 0.968 ## .. ..$ training_lift : num [1:2] 6.21 6.21 ## .. ..$ training_classification_error : num [1:2] 0.0293 0.0293 ## .. ..$ validation_rmse : num [1:2] 0.157 0.157 ## .. ..$ validation_logloss : num [1:2] 0.0894 0.0894 ## .. ..$ validation_r2 : num [1:2] 0.811 0.812 ## .. ..$ validation_auc : num [1:2] 0.99 0.99 ## .. ..$ validation_pr_auc : num [1:2] 0.954 0.954 ## .. ..$ validation_lift : num [1:2] 6.5 6.5 ## .. ..$ validation_classification_error: num [1:2] 0.0305 0.0305 ## .. ..- attr(*, "header")= chr "Scoring History" ## .. ..- attr(*, "formats")= chr [1:23] "%s" "%s" "%d" "%.5f" ... ## .. ..- attr(*, "description")= chr "" ## ..$ :Classes 'H2OTable' and 'data.frame': 2 obs. of 23 variables: ## .. ..$ timestamp : chr [1:2] "2021-07-25 01:26:32" "2021-07-25 01:26:32" ## .. ..$ duration : chr [1:2] " 0.000 sec" " 0.208 sec" ## .. ..$ iterations : int [1:2] 5 7 ## .. ..$ negative_log_likelihood : num [1:2] 440 440 ## .. ..$ objective : num [1:2] 0.0638 0.0653 ## .. ..$ alpha : num [1:2] 0.5 1 ## .. ..$ lambda : num [1:2] 0.00311 0.00311 ## .. ..$ deviance_train : num [1:2] 0.155 0.155 ## .. ..$ deviance_test : num [1:2] 0.19 0.191 ## .. ..$ training_rmse : num [1:2] 0.149 0.149 ## .. ..$ training_logloss : num [1:2] 0.0774 0.0773 ## .. ..$ training_r2 : num [1:2] 0.832 0.832 ## .. ..$ training_auc : num [1:2] 0.993 0.993 ## .. ..$ training_pr_auc : num [1:2] 0.967 0.967 ## .. ..$ training_lift : num [1:2] 6.37 6.37 ## .. ..$ training_classification_error : num [1:2] 0.0294 0.0294 ## .. ..$ validation_rmse : num [1:2] 0.166 0.166 ## .. ..$ validation_logloss : num [1:2] 0.0948 0.0953 ## .. ..$ validation_r2 : num [1:2] 0.806 0.805 ## .. ..$ validation_auc : num [1:2] 0.99 0.99 ## .. ..$ validation_pr_auc : num [1:2] 0.959 0.959 ## .. ..$ validation_lift : num [1:2] 5.86 5.86 ## .. ..$ validation_classification_error: num [1:2] 0.0314 0.0321 ## .. ..- attr(*, "header")= chr "Scoring History" ## .. ..- attr(*, "formats")= chr [1:23] "%s" "%s" "%d" "%.5f" ... ## .. ..- attr(*, "description")= chr "" ## ..$ :Classes 'H2OTable' and 'data.frame': 2 obs. of 23 variables: ## .. ..$ timestamp : chr [1:2] "2021-07-25 01:26:32" "2021-07-25 01:26:32" ## .. 
..$ duration : chr [1:2] " 0.000 sec" " 0.173 sec" ## .. ..$ iterations : int [1:2] 5 7 ## .. ..$ negative_log_likelihood : num [1:2] 470 470 ## .. ..$ objective : num [1:2] 0.0679 0.0694 ## .. ..$ alpha : num [1:2] 0.5 1 ## .. ..$ lambda : num [1:2] 0.00311 0.00311 ## .. ..$ deviance_train : num [1:2] 0.168 0.169 ## .. ..$ deviance_test : num [1:2] 0.137 0.136 ## .. ..$ training_rmse : num [1:2] 0.156 0.156 ## .. ..$ training_logloss : num [1:2] 0.0842 0.0842 ## .. ..$ training_r2 : num [1:2] 0.82 0.82 ## .. ..$ training_auc : num [1:2] 0.992 0.992 ## .. ..$ training_pr_auc : num [1:2] 0.963 0.963 ## .. ..$ training_lift : num [1:2] 6.17 6.17 ## .. ..$ training_classification_error : num [1:2] 0.0317 0.0314 ## .. ..$ validation_rmse : num [1:2] 0.136 0.136 ## .. ..$ validation_logloss : num [1:2] 0.0683 0.0682 ## .. ..$ validation_r2 : num [1:2] 0.856 0.856 ## .. ..$ validation_auc : num [1:2] 0.994 0.994 ## .. ..$ validation_pr_auc : num [1:2] 0.972 0.972 ## .. ..$ validation_lift : num [1:2] 6.61 6.61 ## .. ..$ validation_classification_error: num [1:2] 0.0206 0.0206 ## .. ..- attr(*, "header")= chr "Scoring History" ## .. ..- attr(*, "formats")= chr [1:23] "%s" "%s" "%d" "%.5f" ... ## .. ..- attr(*, "description")= chr "" ## ..$ :Classes 'H2OTable' and 'data.frame': 2 obs. of 23 variables: ## .. ..$ timestamp : chr [1:2] "2021-07-25 01:26:32" "2021-07-25 01:26:33" ## .. ..$ duration : chr [1:2] " 0.000 sec" " 0.179 sec" ## .. ..$ iterations : int [1:2] 5 7 ## .. ..$ negative_log_likelihood : num [1:2] 445 445 ## .. ..$ objective : num [1:2] 0.0645 0.0659 ## .. ..$ alpha : num [1:2] 0.5 1 ## .. ..$ lambda : num [1:2] 0.00311 0.00311 ## .. ..$ deviance_train : num [1:2] 0.157 0.157 ## .. ..$ deviance_test : num [1:2] 0.181 0.181 ## .. ..$ training_rmse : num [1:2] 0.149 0.149 ## .. ..$ training_logloss : num [1:2] 0.0784 0.0784 ## .. ..$ training_r2 : num [1:2] 0.832 0.832 ## .. ..$ training_auc : num [1:2] 0.992 0.992 ## .. ..$ training_pr_auc : num [1:2] 0.966 0.966 ## .. ..$ training_lift : num [1:2] 6.41 6.41 ## .. ..$ training_classification_error : num [1:2] 0.0275 0.0275 ## .. ..$ validation_rmse : num [1:2] 0.165 0.165 ## .. ..$ validation_logloss : num [1:2] 0.0905 0.0903 ## .. ..$ validation_r2 : num [1:2] 0.811 0.811 ## .. ..$ validation_auc : num [1:2] 0.991 0.992 ## .. ..$ validation_pr_auc : num [1:2] 0.964 0.964 ## .. ..$ validation_lift : num [1:2] 5.74 5.74 ## .. ..$ validation_classification_error: num [1:2] 0.0361 0.0369 ## .. ..- attr(*, "header")= chr "Scoring History" ## .. ..- attr(*, "formats")= chr [1:23] "%s" "%s" "%d" "%.5f" ... ## .. ..- attr(*, "description")= chr "" ## $ reproducibility_information_table :List of 3 ## ..$ :Classes 'H2OTable' and 'data.frame': 1 obs. of 26 variables: ## .. ..$ node : int 0 ## .. ..$ h2o : chr "127.0.0.1:54321" ## .. ..$ healthy : chr "true" ## .. ..$ last_ping : chr "1627201591256" ## .. ..$ num_cpus : int 4 ## .. ..$ sys_load : num 0.564 ## .. ..$ mem_value_size : num 70732923 ## .. ..$ free_mem : num 7.76e+08 ## .. ..$ pojo_mem : num 1.91e+08 ## .. ..$ swap_mem : num 0 ## .. ..$ free_disc : num 3.6e+10 ## .. ..$ max_disc : num 2.55e+11 ## .. ..$ pid : int 7916 ## .. ..$ num_keys : int 9133 ## .. ..$ tcps_active : chr "" ## .. ..$ open_fds : int -1 ## .. ..$ rpcs_active : chr "" ## .. ..$ nthreads : int 4 ## .. ..$ is_leader : chr "true" ## .. ..$ total_mem : num 6.27e+08 ## .. ..$ max_mem : num 1.04e+09 ## .. ..$ java_version : chr "Java 1.8.0_77 (from Oracle Corporation)" ## .. 
..$ jvm_launch_parameters: chr "[-Xmx1g, -ea]" ## .. ..$ os_version : chr "Windows 10 10.0 (x86)" ## .. ..$ machine_physical_mem : num 1.7e+10 ## .. ..$ machine_locale : chr "en_US" ## .. ..- attr(*, "header")= chr "Node Information" ## .. ..- attr(*, "formats")= chr [1:26] "%d" "%s" "%s" "%s" ... ## .. ..- attr(*, "description")= chr "" ## ..$ :Classes 'H2OTable' and 'data.frame': 1 obs. of 13 variables: ## .. ..$ h2o_cluster_uptime : num 1394453 ## .. ..$ h2o_cluster_timezone : chr "America/Los_Angeles" ## .. ..$ h2o_data_parsing_timezone: chr "UTC" ## .. ..$ h2o_cluster_version : chr "3.32.1.3" ## .. ..$ h2o_cluster_version_age : chr "2 months and 5 days" ## .. ..$ h2o_cluster_name : chr "H2O_started_from_R_kingr17_afp345" ## .. ..$ h2o_cluster_total_nodes : int 1 ## .. ..$ h2o_cluster_free_memory : num 7.76e+08 ## .. ..$ h2o_cluster_total_cores : int 4 ## .. ..$ h2o_cluster_allowed_cores: int 4 ## .. ..$ h2o_cluster_status : chr "locked, healthly" ## .. ..$ h2o_internal_security : chr "false" ## .. ..$ h2o_api_extensions : chr "Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4" ## .. ..- attr(*, "header")= chr "Cluster Configuration" ## .. ..- attr(*, "formats")= chr [1:13] "%d" "%s" "%s" "%s" ... ## .. ..- attr(*, "description")= chr "" ## ..$ :Classes 'H2OTable' and 'data.frame': 2 obs. of 3 variables: ## .. ..$ input_frame: chr [1:2] "training_frame" "validation_frame" ## .. ..$ checksum : num [1:2] 3.75e+18 -1.00 ## .. ..$ espc : chr [1:2] "[0, 1772, 3544, 5316, 7088]" "-1" ## .. ..- attr(*, "header")= chr "Input Frames Information" ## .. ..- attr(*, "formats")= chr [1:3] "%s" "%d" "%d" ## .. ..- attr(*, "description")= chr "" ## $ training_metrics :Formal class 'H2OBinomialMetrics' [package "h2o"] with 5 slots ## .. ..@ algorithm: chr "glm" ## .. ..@ on_train : logi TRUE ## .. ..@ on_valid : logi FALSE ## .. ..@ on_xval : logi FALSE ## .. ..@ metrics :List of 30 ## .. .. ..$ __meta :List of 3 ## .. .. .. ..$ schema_version: int 3 ## .. .. .. ..$ schema_name : chr "ModelMetricsBinomialGLMV3" ## .. .. .. ..$ schema_type : chr "ModelMetricsBinomialGLM" ## .. .. ..$ model :List of 4 ## .. .. .. ..$ __meta:List of 3 ## .. .. .. .. ..$ schema_version: int 3 ## .. .. .. .. ..$ schema_name : chr "ModelKeyV3" ## .. .. .. .. ..$ schema_type : chr "Key<Model>" ## .. .. .. ..$ name : chr "metalearner_AUTO_StackedEnsemble_AllModels_AutoML_20210725_011103" ## .. .. .. ..$ type : chr "Key<Model>" ## .. .. .. ..$ URL : chr "/3/Models/metalearner_AUTO_StackedEnsemble_AllModels_AutoML_20210725_011103" ## .. .. ..$ model_checksum : chr "1672530778380092240" ## .. .. ..$ frame :List of 4 ## .. .. .. ..$ __meta:List of 3 ## .. .. .. .. ..$ schema_version: int 3 ## .. .. .. .. ..$ schema_name : chr "FrameKeyV3" ## .. .. .. .. ..$ schema_type : chr "Key<Frame>" ## .. .. .. ..$ name : chr "levelone_training_StackedEnsemble_AllModels_AutoML_20210725_011103" ## .. .. .. ..$ type : chr "Key<Frame>" ## .. .. .. ..$ URL : chr "...