# mlr: Machine Learning in R – basics

This article was first published on **R on FeelML** and kindly contributed to R-bloggers.

How to train and tune machine learning algorithms in a unified way? With the `mlr` R package.

I am currently keen on automated machine learning, especially hyperparameter optimization, so lately I have focused mainly on frameworks for training models. In this post, I will show how to train ML algorithms and tune their hyperparameters with random search. I will cover only the basics, but the `mlr` package has more sophisticated features; I strongly encourage you to visit the mlr webpage and explore all the tutorials.

# Data set

We will use the *BreastCancer* data set from the `mlbench` package and perform binary classification. The aim of the model is to predict whether a tumor is benign or malignant (variable `Class`). We remove the first column, which contains the patient id, because it is not useful for modeling, and we drop rows with missing values using `na.omit()`. To read more about the data set, see the documentation (`?BreastCancer`).

    library("mlbench")
    data("BreastCancer")
    bc <- na.omit(BreastCancer[, -1])
    head(bc)
    ##   Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei
    ## 1            5         1          1             1            2           1
    ## 2            5         4          4             5            7          10
    ## 3            3         1          1             1            2           2
    ## 4            6         8          8             1            3           4
    ## 5            4         1          1             3            2           1
    ## 6            8        10         10             8            7          10
    ##   Bl.cromatin Normal.nucleoli Mitoses     Class
    ## 1           3               1       1    benign
    ## 2           3               2       1    benign
    ## 3           3               1       1    benign
    ## 4           3               7       1    benign
    ## 5           3               1       1    benign
    ## 6           9               7       1 malignant
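
Before modeling, it can help to check what the cleaning step did and how the columns are stored. The sketch below is not part of the original post; the variable names `n_dropped`, `class_counts`, and `col_types` are chosen here for illustration. It counts the rows removed by `na.omit()`, the class balance, and the column types.

```r
library(mlbench)

data("BreastCancer")
bc <- na.omit(BreastCancer[, -1])

# Rows removed because they contained at least one missing value
n_dropped <- nrow(BreastCancer) - nrow(bc)
print(n_dropped)

# Class balance of the target variable
class_counts <- table(bc$Class)
print(class_counts)

# First class of every column: "ordered" for ordered factors, "factor" otherwise
col_types <- vapply(bc, function(x) class(x)[1], character(1))
print(table(col_types))
```

All predictors are stored as factors or ordered factors rather than numerics, which is why a task built on this data reports no numeric features.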

# Installation

First of all, make sure that you have installed `mlr`. It is on CRAN, so you can simply use the `install.packages()` function.

    install.packages("mlr")

After installation, load `mlr` and set a seed to make the results reproducible.

    library(mlr)
    set.seed(1)

# Modeling

## Fitting a model

First, you need to define a task. A task is a definition of a machine learning problem. We will use the `makeClassifTask()` function because our problem is classification. For regression, it would be `makeRegrTask()`, and for clustering `makeClusterTask()`.

In `makeClassifTask()`, the parameter `id` defines the name of the task, `data` is the data the model will be trained on, and `target` indicates the target variable.

    classif_task = makeClassifTask(id = "bc", data = bc, target = "Class")
    classif_task
    ## Supervised task: bc
    ## Type: classif
    ## Target: Class
    ## Observations: 683
    ## Features:
    ##    numerics     factors     ordered functionals
    ##           0           4           5           0
    ## Missings: FALSE
    ## Has weights: FALSE
    ## Has blocking: FALSE
    ## Has coordinates: FALSE
    ## Classes: 2
    ##    benign malignant
    ##       444       239
    ## Positive class: benign
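
The printed summary can also be queried programmatically. The following sketch, which is not from the original post, uses mlr's task accessor functions and shows one way to pick the positive class explicitly through the `positive` argument. Treating `malignant` as the positive class is an assumption made here for illustration, and `malignant_task` is a hypothetical name.

```r
library(mlr)
library(mlbench)

data("BreastCancer")
bc <- na.omit(BreastCancer[, -1])
classif_task <- makeClassifTask(id = "bc", data = bc, target = "Class")

# Basic facts about the task
print(getTaskSize(classif_task))         # number of observations
print(getTaskNFeats(classif_task))       # number of features
print(getTaskTargetNames(classif_task))  # name of the target column
print(getTaskFeatureNames(classif_task)) # names of the feature columns

# By default the first factor level becomes the positive class
print(getTaskDesc(classif_task)$positive)

# Assumption for illustration: treat "malignant" as the positive class instead
malignant_task <- makeClassifTask(id = "bc", data = bc, target = "Class",
                                  positive = "malignant")
print(getTaskDesc(malignant_task)$positive)
```

The positive class matters only for measures that distinguish the two classes, such as the true positive rate; accuracy is unaffected by this choice.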

The second step is defining a model. Please note that we are not training the model yet; we only create an object that describes our algorithm.

    classif_lrn = makeLearner("classif.randomForest", par.vals = list(ntree = 200))

In the example above, we have created an object that defines a classification random forest with 200 trees. To see the hyperparameters, we can simply use the function `getParamSet()`. We obtain the names of the hyperparameters, their types, constraints, and default values.

    getParamSet(classif_lrn)
    ##                      Type  len   Def   Constr Req Tunable Trafo
    ## ntree             integer    -   500 1 to Inf   -    TRUE     -
    ## mtry              integer    -     - 1 to Inf   -    TRUE     -
    ## replace           logical    -  TRUE        -   -    TRUE     -
    ## classwt     numericvector <NA>     - 0 to Inf   -    TRUE     -
    ## cutoff      numericvector <NA>     -   0 to 1   -    TRUE     -
    ## strata            untyped    -     -        -   -   FALSE     -
    ## sampsize    integervector <NA>     - 1 to Inf   -    TRUE     -
    ## nodesize          integer    -     1 1 to Inf   -    TRUE     -
    ## maxnodes          integer    -     - 1 to Inf   -    TRUE     -
    ## importance        logical    - FALSE        -   -    TRUE     -
    ## localImp          logical    - FALSE        -   -    TRUE     -
    ## proximity         logical    - FALSE        -   -   FALSE     -
    ## oob.prox          logical    -     -        -   Y   FALSE     -
    ## norm.votes        logical    -  TRUE        -   -   FALSE     -
    ## do.trace          logical    - FALSE        -   -   FALSE     -
    ## keep.forest       logical    -  TRUE        -   -   FALSE     -
    ## keep.inbag        logical    - FALSE        -   -   FALSE     -
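
Hyperparameters can also be read back or changed after the learner has been created. Here is a minimal sketch using `getHyperPars()` and `setHyperPars()` from mlr; the values `ntree = 300` and `mtry = 3` and the name `lrn_300` are arbitrary choices made for illustration, not settings from the original post.

```r
library(mlr)

classif_lrn <- makeLearner("classif.randomForest", par.vals = list(ntree = 200))

# Hyperparameters that were set explicitly on the learner
print(getHyperPars(classif_lrn))

# setHyperPars() returns a modified copy; the original learner is unchanged
lrn_300 <- setHyperPars(classif_lrn, ntree = 300, mtry = 3)
print(getHyperPars(lrn_300))
print(getHyperPars(classif_lrn))
```

Because `setHyperPars()` returns a new object, several variants of one learner can be kept side by side and compared.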

Now we are ready to fit a model. We can simply use the function `train()`.

    model = train(classif_lrn, classif_task)
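
A fitted model is usually checked on data it has not seen. The sketch below is an addition to the original post: it assumes a 70/30 split (the ratio and the names `train_idx` and `test_idx` are illustrative), trains on the first part through the `subset` argument of `train()`, and evaluates on the rest with `predict()` and `performance()`.

```r
library(mlr)
library(mlbench)
set.seed(1)

data("BreastCancer")
bc <- na.omit(BreastCancer[, -1])
classif_task <- makeClassifTask(id = "bc", data = bc, target = "Class")
classif_lrn <- makeLearner("classif.randomForest", par.vals = list(ntree = 200))

# Hypothetical 70/30 split of the row indices
n <- getTaskSize(classif_task)
train_idx <- sample(n, size = round(0.7 * n))
test_idx <- setdiff(seq_len(n), train_idx)

# Fit on the training rows only, then predict the held-out rows
model <- train(classif_lrn, classif_task, subset = train_idx)
pred <- predict(model, task = classif_task, subset = test_idx)

# Accuracy and mean misclassification error on the held-out rows
perf <- performance(pred, measures = list(acc, mmce))
print(perf)

# Cross-tabulation of true and predicted classes
print(calculateConfusionMatrix(pred))
```

A single split gives a noisy estimate; cross-validation averages over several such splits.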

## Tuning a model

To tune hyperparameters, we need to specify a search space. To define the space for integer parameters, we use the function `makeIntegerParam()`. All of this is put together with the function `makeParamSet()`.

    params = makeParamSet(
      makeIntegerParam("mtry", lower = 1, upper = 100),
      makeIntegerParam("ntree", lower = 1L, upper = 500L)
    )
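
The bounds of a search space can be derived from the task instead of being hard-coded. As a sketch under that assumption (it is not the space used in this post, and `n_feats` is a name introduced here), the upper bound of `mtry` below is the number of features, since a random forest cannot sample more candidate variables per split than the data contains. The accessors `getParamIds()`, `getLower()`, and `getUpper()` come from the `ParamHelpers` package, on which mlr depends.

```r
library(mlr)
library(ParamHelpers)  # loaded explicitly here for the accessor functions
library(mlbench)

data("BreastCancer")
bc <- na.omit(BreastCancer[, -1])
classif_task <- makeClassifTask(id = "bc", data = bc, target = "Class")

# Assumption: cap mtry at the number of features in the task
n_feats <- getTaskNFeats(classif_task)

params <- makeParamSet(
  makeIntegerParam("mtry", lower = 1L, upper = n_feats),
  makeIntegerParam("ntree", lower = 1L, upper = 500L)
)

# Inspect the resulting search space
print(getParamIds(params))
print(getLower(params))
print(getUpper(params))
```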

Now we use the function `makeTuneControlRandom()` to create an object that defines random search. The parameter `maxit` defines the number of iterations. The function `makeResampleDesc()` creates an object for a resampling strategy, in this case 3-fold cross-validation. Finally, we combine all of the previous pieces with the function `tuneParams()` and tune the random forest.

    ctrl = makeTuneControlRandom(maxit = 10L)
    rdesc = makeResampleDesc("CV", iters = 3L)
    res = tuneParams(classif_lrn, task = classif_task, resampling = rdesc,
      par.set = params, control = ctrl, measures = list(acc), show.info = FALSE)
    res
    ## Tune result:
    ## Op. pars: mtry=49; ntree=464
    ## acc.test.mean=0.9707409

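As a result of tuning, we have obtained the hyperparameters `mtry=49` and `ntree=464`, with a mean cross-validated accuracy of about 0.971. Note that the data set has only 9 features, so `randomForest` resets any `mtry` above 9 to the valid range (with a warning); an upper bound of 9 would avoid spending iterations on equivalent settings.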
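
The same building blocks also work with an exhaustive grid in place of random search, and the best setting can be fed back into the learner. The sketch below is an illustration rather than the tuning run from this post: the grid values are hand-picked assumptions, and the names `grid_params`, `grid_ctrl`, `grid_res`, and `tuned_lrn` are introduced here. It uses `makeDiscreteParam()` with `makeTuneControlGrid()` and then refits with `setHyperPars()`.

```r
library(mlr)
library(mlbench)
set.seed(1)

data("BreastCancer")
bc <- na.omit(BreastCancer[, -1])
classif_task <- makeClassifTask(id = "bc", data = bc, target = "Class")
classif_lrn <- makeLearner("classif.randomForest")

# Hypothetical grid: 3 x 2 = 6 candidate settings, chosen for illustration
grid_params <- makeParamSet(
  makeDiscreteParam("mtry", values = c(2L, 3L, 5L)),
  makeDiscreteParam("ntree", values = c(100L, 200L))
)
grid_ctrl <- makeTuneControlGrid()
rdesc <- makeResampleDesc("CV", iters = 3L)

# Evaluate every grid point with 3-fold cross-validation
grid_res <- tuneParams(classif_lrn, task = classif_task, resampling = rdesc,
                       par.set = grid_params, control = grid_ctrl,
                       measures = list(acc), show.info = FALSE)

print(grid_res$x)  # best hyperparameter setting found
print(grid_res$y)  # its mean cross-validated accuracy

# Plug the best setting into the learner and refit on the full task
tuned_lrn <- setHyperPars(classif_lrn, par.vals = grid_res$x)
tuned_model <- train(tuned_lrn, classif_task)
```

Grid search evaluates every combination, so its cost grows quickly with the number of parameters; random search keeps the budget fixed through `maxit`.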

# More

If you would like to learn more about mlr, you can visit the mlr webpage.

What is more, a new version of this package is coming up. I highly recommend reading about mlr3.
