# Exploring and Understanding Hyperparameter Tuning


Learners use hyperparameters to achieve better performance on particular datasets. When we use a machine learning package to choose the best hyperparameters, the relationship between changing the hyperparameter and performance might not be obvious. mlr provides several new implementations to better understand what happens when we tune hyperparameters and to help us optimize our choice of hyperparameters.

# Background

Let’s say you have a dataset, and you’re getting ready to flex your machine learning muscles. Maybe you want to do classification, or regression, or clustering. You get your dataset together and pick a few learners to evaluate.

The majority of learners that you might use for any of these tasks have hyperparameters that the user must tune. Hyperparameters can take on many possible values, so it’s typically left to the user to specify them. If you’re using a popular machine learning library like scikit-learn, the library will take care of this for you via cross-validation, automatically searching for well-performing values of your hyperparameters. You then take these best-performing hyperparameters and use those values for your learner. Essentially, the optimization of hyperparameters is treated as a black box.

In mlr, we want to open up that black box, so that you can make better decisions. Using the built-in functionality, we can answer questions like:

- How does varying the value of a hyperparameter change the performance of the machine learning algorithm?
- On a related note: what’s an ideal range to search for optimal hyperparameters?
- Did the optimization algorithm converge prematurely?
- What’s the relative importance of each hyperparameter?

Some of the users who might benefit from “opening” the black box of hyperparameter optimization:

- researchers that want to better understand learners in practice
- engineers that want to maximize performance or minimize run time
- teachers that want to demonstrate what happens when tuning hyperparameters

We’ll use the Pima Indians diabetes dataset in this blog post, where we want to predict whether or not someone has diabetes, so we’ll perform classification, but the methods we discuss also work for regression and clustering.

Perhaps we decide we want to try kernlab’s svm for our classification task. Knowing that svm has several hyperparameters to tune, we can ask mlr to list the hyperparameters to refresh our memory:
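A sketch of what this might look like, assuming the `PimaIndiansDiabetes` data from the mlbench package and mlr’s `classif.ksvm` wrapper around kernlab’s svm:

```r
library(mlr)
library(mlbench)

# Pima Indians diabetes data (assumed to come from the mlbench package)
data(PimaIndiansDiabetes)
pima.task <- makeClassifTask(data = PimaIndiansDiabetes, target = "diabetes")

# list kernlab svm's hyperparameters, their types, and their defaults
getParamSet("classif.ksvm")
```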

Noting that we have default values for each of the hyperparameters, we could simply accept the defaults and evaluate our `mmce` performance using 3-fold cross-validation:
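Something along these lines would do it (a minimal sketch, reusing `pima.task` from above):

```r
# evaluate classif.ksvm with its default hyperparameters via 3-fold CV
lrn <- makeLearner("classif.ksvm")
rdesc <- makeResampleDesc("CV", iters = 3L)
resample(lrn, pima.task, rdesc, measures = list(mmce))
```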

While this result may seem decent, we have a nagging doubt: what if we chose hyperparameter values different from the defaults? Would we get better results? Maybe we believe that the default of `kernel = "rbfdot"` will work well based on our prior knowledge of the dataset, but we want to try altering our regularization to get better performance. For kernlab’s svm, regularization is represented by the `C` hyperparameter. Calling `getParamSet` again to refresh our memory, we see that `C` defaults to 1.

Let’s tell mlr to randomly pick `C` values between `2^-5` and `2^5`, evaluating `mmce` using 3-fold cross-validation:
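A sketch of that search (the budget of 100 random evaluations is an assumption, not a prescribed value):

```r
# search C on a log2 scale: values drawn in [-5, 5] are transformed to 2^x
ps <- makeParamSet(
  makeNumericParam("C", lower = -5, upper = 5, trafo = function(x) 2^x)
)
ctrl <- makeTuneControlRandom(maxit = 100L)
res <- tuneParams(lrn, task = pima.task, resampling = rdesc,
                  par.set = ps, control = ctrl, measures = list(mmce))
res
```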

mlr gives us the best-performing value for `C`, and we can see that we’ve improved our results vs. just accepting the default value for `C`. This functionality is available in other machine learning packages, like scikit-learn’s random search, but it essentially treats our choice of `C` as a black box: we give a search strategy and just accept the optimal value. What if we wanted to get a sense of the relationship between `C` and `mmce`? Maybe the relationship is linear in a certain range and we can exploit this to get even better performance! mlr provides two methods to help answer this question: `generateHyperParsEffectData` to generate the resulting data and `plotHyperParsEffect`, which provides many built-in options for plotting that data.

Let’s investigate the results from before, where we tuned `C`:
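For example (a sketch, building on the tuning result `res` from above):

```r
# pull the per-iteration hyperparameter/performance data out of the tune result
data <- generateHyperParsEffectData(res)

# scatterplot of C vs. mean mmce across the CV folds
plotHyperParsEffect(data, x = "C", y = "mmce.test.mean", plot.type = "scatter")
```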

From the scatterplot, it appears our optimal performance is somewhere in the region between `2^-2.5` and `2^-1.75`. This could give us a region to explore further if we wanted to try to get even better performance!

We could also evaluate how “long” it takes us to find that optimal value:
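A sketch, plotting the performance against the tuning iteration:

```r
# best mmce found so far vs. iteration number
plotHyperParsEffect(data, x = "iteration", y = "mmce.test.mean",
                    plot.type = "line")
```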

By default, the plot only shows the global optimum, so we can see that we found the “best” performance in fewer than 25 iterations!

But wait, I hear you saying. I also want to tune `sigma`, the inverse kernel width of the radial basis kernel function. So now we have two hyperparameters that we want to tune simultaneously: `C` and `sigma`.

We can use `plotHyperParsEffect` to easily create a heatmap with both hyperparameters.
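A sketch of tuning both hyperparameters and plotting the raw results (again, the random-search budget is an assumed value):

```r
# search both C and sigma on a log2 scale
ps2 <- makeParamSet(
  makeNumericParam("C", lower = -5, upper = 5, trafo = function(x) 2^x),
  makeNumericParam("sigma", lower = -5, upper = 5, trafo = function(x) 2^x)
)
res2 <- tuneParams(lrn, task = pima.task, resampling = rdesc,
                   par.set = ps2, control = ctrl, measures = list(mmce))

data2 <- generateHyperParsEffectData(res2)
plotHyperParsEffect(data2, x = "C", y = "sigma", z = "mmce.test.mean",
                    plot.type = "heatmap")
```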

We get tons of functionality for free here. For example, mlr will automatically interpolate the grid to get an estimate for values we didn’t even test! All we need to do is pass a regression learner to the `interpolate` argument:
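For instance, `regr.earth` (a MARS learner from the earth package) is one reasonable choice; any mlr regression learner should work:

```r
plotHyperParsEffect(data2, x = "C", y = "sigma", z = "mmce.test.mean",
                    plot.type = "heatmap", interpolate = "regr.earth")
```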

If we use the `show.experiments` argument, we can see which points were actually tested and which were interpolated:
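Continuing the sketch from above:

```r
plotHyperParsEffect(data2, x = "C", y = "sigma", z = "mmce.test.mean",
                    plot.type = "heatmap", interpolate = "regr.earth",
                    show.experiments = TRUE)
```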

`plotHyperParsEffect` returns a `ggplot2` object, so we can always customize it to better fit our needs downstream:
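For example (illustrative only; the particular fill scale and theme are arbitrary choices):

```r
library(ggplot2)

plt <- plotHyperParsEffect(data2, x = "C", y = "sigma", z = "mmce.test.mean",
                           plot.type = "heatmap", interpolate = "regr.earth",
                           show.experiments = TRUE)

# it's a regular ggplot object, so we can layer on our own scales, labels, theme, ...
plt + scale_fill_gradient(low = "steelblue", high = "firebrick") + theme_bw()
```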

Now we can get a good sense of where the separation happens for each of the hyperparameters: in this particular example, we want lower values for `sigma` and values around 1 for `C`.

This was just a taste of mlr’s hyperparameter tuning visualization capabilities. For the full tutorial, check out the mlr tutorial.

Some features coming soon:

- “Prettier” plot defaults
- Support for more than 2 hyperparameters
- Direct support for hyperparameter “importance”

Thanks to the generous sponsorship from GSoC, and many thanks to my mentors Bernd Bischl and Lars Kotthoff!
