[This article was first published on **R on FeelML**, and kindly contributed to R-bloggers.]


How to train and tune machine learning algorithms in a unified way? With the `mlr` R package 😄

I am currently keen on automated machine learning, especially hyperparameter optimization, so recently I have mainly focused on frameworks for training models. In this post, I will show how to train ML algorithms and tune their hyperparameters with a random search. I will cover only the basics, but the `mlr` package has many more sophisticated features; I strongly encourage you to visit the mlr webpage and explore all the tutorials.

# Data set

We will use the *BreastCancer* data set from the `mlbench` package and perform binary classification. The aim of the model is to predict whether a cancer is benign or malignant (variable `Class`). We remove the first column, which contains the patient id, as it is redundant for modeling. To read more about the data set, see the documentation (`?BreastCancer`).

```
library("mlbench")
data("BreastCancer")
bc <- na.omit(BreastCancer[ ,-1])
head(bc)
```

```
## Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei
## 1 5 1 1 1 2 1
## 2 5 4 4 5 7 10
## 3 3 1 1 1 2 2
## 4 6 8 8 1 3 4
## 5 4 1 1 3 2 1
## 6 8 10 10 8 7 10
## Bl.cromatin Normal.nucleoli Mitoses Class
## 1 3 1 1 benign
## 2 3 2 1 benign
## 3 3 1 1 benign
## 4 3 7 1 benign
## 5 3 1 1 benign
## 6 9 7 1 malignant
```
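Before modeling, it is worth checking the class balance. A quick sketch; the counts below correspond to the data after dropping rows with missing values, and match the task summary shown later:

```
table(bc$Class)
```

```
##    benign malignant
##       444       239
```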

# Installation

First of all, make sure that you have installed `mlr`. It is on CRAN, so you can simply use the `install.packages()` function.

`install.packages("mlr")`

After installation, load `mlr` and set a seed to make the results reproducible.

```
library(mlr)
set.seed(1)
```

# Modeling

## Fitting a model

First, you need to define a task. A task is a definition of a machine learning problem.

We will use the `makeClassifTask()` function because our problem is classification. For regression it would be `makeRegrTask()`, and for clustering `makeClusterTask()`.

In `makeClassifTask()`, the parameter `id` defines the name of the task, `data` is the data the model will be trained on, and `target` indicates the target variable.

```
classif_task = makeClassifTask(id = "bc", data = bc, target = "Class")
classif_task
```

```
## Supervised task: bc
## Type: classif
## Target: Class
## Observations: 683
## Features:
## numerics factors ordered functionals
## 0 4 5 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
## Classes: 2
## benign malignant
## 444 239
## Positive class: benign
```
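The task object can be queried for its components with mlr's accessor functions. A small sketch (both functions are part of `mlr`):

```
# Inspect the task: feature names and number of observations
getTaskFeatureNames(classif_task)
getTaskSize(classif_task)
```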

The second step is defining a learner. Please note that we do not train a model yet; we only create an object that describes our algorithm.

`classif_lrn = makeLearner("classif.randomForest", par.vals = list(ntree = 200))`

In the example above we have created an object that defines a classification random forest with 200 trees. To see the hyperparameters, we can simply use the function `getParamSet()`. We obtain the names of the hyperparameters, their ranges, and their default values.

`getParamSet(classif_lrn)`

```
## Type len Def Constr Req Tunable Trafo
## ntree integer - 500 1 to Inf - TRUE -
## mtry integer - - 1 to Inf - TRUE -
## replace logical - TRUE - - TRUE -
## classwt numericvector - 0 to Inf - TRUE -
## cutoff numericvector - 0 to 1 - TRUE -
## strata untyped - - - - FALSE -
## sampsize integervector - 1 to Inf - TRUE -
## nodesize integer - 1 1 to Inf - TRUE -
## maxnodes integer - - 1 to Inf - TRUE -
## importance logical - FALSE - - TRUE -
## localImp logical - FALSE - - TRUE -
## proximity logical - FALSE - - FALSE -
## oob.prox logical - - - Y FALSE -
## norm.votes logical - TRUE - - FALSE -
## do.trace logical - FALSE - - FALSE -
## keep.forest logical - TRUE - - FALSE -
## keep.inbag logical - FALSE - - FALSE -
```

Now, we are ready to fit a model. We can simply use the function `train()`.

`model = train(classif_lrn, classif_task)`
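With the fitted model we can already make predictions, for instance on the training task itself. A minimal sketch using mlr's `predict()` and `performance()` (evaluating on the training data is optimistic and shown for illustration only):

```
# Predict on the same task and compute accuracy
pred = predict(model, task = classif_task)
performance(pred, measures = list(acc))
```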

## Tuning a model

To tune hyperparameters, we need to specify a search space. For integer parameters we define the space with the function `makeIntegerParam()`. All of this is pinned together with the function `makeParamSet()`.

```
params = makeParamSet(
  makeIntegerParam("mtry", lower = 1, upper = 100),
  makeIntegerParam("ntree", lower = 1L, upper = 500L)
)
```
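The same pattern works for other parameter types: `makeNumericParam()` and `makeDiscreteParam()` (from the ParamHelpers package that mlr builds on) describe numeric and discrete hyperparameters. A sketch with a hypothetical numeric parameter name, not needed for this post:

```
# Hypothetical search space mixing parameter types
params2 = makeParamSet(
  makeNumericParam("some_frac", lower = 0.1, upper = 0.9),  # hypothetical parameter
  makeDiscreteParam("replace", values = c(TRUE, FALSE))
)
```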

Now, we use the function `makeTuneControlRandom()` to create an object that defines a random search. The parameter `maxit` defines the number of iterations. The function `makeResampleDesc()` creates an object for a resampling strategy, in this case cross-validation. Finally, we can combine all of the previous pieces with the function `tuneParams()` and tune the random forest.

```
ctrl = makeTuneControlRandom(maxit = 10L)
rdesc = makeResampleDesc("CV", iters = 3L)
res = tuneParams(classif_lrn, task = classif_task,
                 resampling = rdesc,
                 par.set = params,
                 control = ctrl,
                 measures = list(acc),
                 show.info = FALSE)
res
```

```
## Tune result:
## Op. pars: mtry=49; ntree=464
## acc.test.mean=0.9707409
```

As a result of tuning, we have obtained the hyperparameters `mtry=49` and `ntree=464`.
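The tuned values can then be plugged back into the learner and used to refit the model. A sketch: `res$x` holds the optimal parameter values found by `tuneParams()`, and `setHyperPars()` applies them.

```
# Update the learner with the tuned hyperparameters and retrain
tuned_lrn = setHyperPars(classif_lrn, par.vals = res$x)
tuned_model = train(tuned_lrn, classif_task)
```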

# More

If you would like to learn more about mlr, you can visit the mlr webpage.

What is more, a new version of this package is coming up. I highly recommend following the news about mlr3.
