Use mlrMBO to optimize via command line
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Many people who apply Bayesian optimization want to optimize an algorithm that is not implemented in R but runs on the command line as a shell script or an executable.
We recently published mlrMBO on CRAN. Like any other package it normally operates inside of R, but with this post I want to demonstrate how mlrMBO can be used to optimize an external application. At the same time I will highlight some issues you are likely to run into.
First of all we need a bash script that we want to optimize.
This tutorial will only run on Unix systems (Linux, OS X etc.) but should also be informative for Windows users.
The following code will write a tiny bash script that uses bc to calculate $\sin(x_1-1) + (x_1^2 + x_2^2)$ and write the result “hidden” in a sentence (The result is 12.34!) in a result.txt text file.
The bash script
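A minimal sketch of what such a script could look like, written to disk from R. The file name fun.sh and the shell function name fun are assumptions for illustration; only bc, the formula and result.txt come from the description above.

```r
# The bash script as one string; single quotes keep R from touching "$".
# "fun.sh" is an assumed file name.
lines = '#!/bin/bash
fun ()
{
  x1=$1
  x2=$2
  # s() is the sine of bc -l; the parentheses protect negative inputs
  command="s(($x1)-1) + (($x1)^2 + ($x2)^2)"
  result=$(bc -l <<< "$command")
}
echo "Start calculation."
fun "$1" "$2"
echo "The result is $result!" > "result.txt"
echo "Finish calculation."
'
writeLines(lines, "fun.sh")
# make the script executable
Sys.chmod("fun.sh", mode = "0755")
```

The two echo calls around the calculation stand in for the console chatter a real application would produce.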
Running the script from R
Now we need an R function that starts the script, reads the result from the text file and returns it.
This function uses stringi and regular expressions to match the result within the sentence.
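One way such a function could look is sketched below. It assumes the script is saved as ./fun.sh (a hypothetical name) and that the parameters arrive as a named list with the entries x1 and x2. The regular expression is my own choice; it allows for a leading minus sign and for the missing leading zero that bc prints for values like .5.

```r
library(stringi)

# Calls the script with a named list x = list(x1 = ..., x2 = ...).
# "./fun.sh" is an assumed name for the bash script.
runScript = function(x) {
  command = sprintf("./fun.sh %f %f", x[["x1"]], x[["x2"]])
  error.code = system(command)
  if (error.code != 0) {
    stop("Script returned error code ", error.code)
  }
  result = readLines("result.txt")
  # optional sign, optional leading digits, optional decimal point,
  # then digits; the lookahead anchors the match to the closing "!"
  result = stri_match_first_regex(result, pattern = "-?\\d*\\.?\\d+(?=!)")
  as.numeric(result[1, 1])
}
```

Note that sprintf() with %f passes the parameters with six decimal places, which is enough for this example but may be too coarse for other applications.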
Depending on the output, different strategies to read the result make sense.
XML files can usually be accessed with XML::xmlParse, XML::getNodeSet, XML::xmlAttrs etc. using XPath queries.
Sometimes the good old read.table() is also sufficient.
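If, for example, the output file contains nothing but R assignments to value1 and value2, one per line, you can easily evaluate it with source() inside a fresh environment and convert that environment to a list, which will give you the entries $value1 and $value2.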
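The source() approach can be sketched as follows. The file name values.txt and the two numbers are made up for illustration.

```r
# Hypothetical output file that consists only of R assignments
writeLines(c("value1 = 23.45", "value2 = 13.82"), "values.txt")

# Evaluate the file in its own environment so nothing leaks into
# the global workspace, then turn the environment into a list
env = new.env()
source("values.txt", local = env)
res = as.list(env)
res$value1
res$value2
```

Keep in mind that source() executes whatever the file contains, so this is only advisable for output you trust.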
Define bounds, wrap function.
To evaluate the function from within mlrMBO it has to be wrapped in a smoof function. The smoof function also contains information about the bounds and scales of the domain of the objective function, defined as a parameter set (ParamSet).
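A sketch of such a wrapper is shown below. The bounds of -3 and 3 are assumptions chosen for illustration, and runScript is replaced by a pure-R stand-in that computes the same formula, so that the snippet runs without the shell script.

```r
library(smoof)
library(ParamHelpers)  # provides makeParamSet() and makeNumericParam()

# Pure-R stand-in for runScript so this sketch runs on its own
runScript = function(x) {
  sin(x[["x1"]] - 1) + (x[["x1"]]^2 + x[["x2"]]^2)
}

fun = makeSingleObjectiveFunction(
  name = "example",
  fn = runScript,
  par.set = makeParamSet(
    makeNumericParam("x1", lower = -3, upper = 3),  # assumed bounds
    makeNumericParam("x2", lower = -3, upper = 3)   # assumed bounds
  ),
  # FALSE: fn receives a named list, matching x[["x1"]] above
  has.simple.signature = FALSE
)

# evaluate once by hand
fun(list(x1 = 1, x2 = 0))
```

With has.simple.signature = FALSE the parameters are passed by name, which keeps the mapping to the command line arguments explicit.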
If you run this locally, you will see that the console output generated by our shell script directly appears in the R console. This can be helpful but also annoying.
Redirecting output
If a lot of output is generated during a single call of system(), it might even crash R.
To avoid that, I suggest redirecting the output into a file.
This way no output is lost and the R console does not get flooded.
We can achieve that simply by replacing the command in the function runScript from above with the following code:
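As a sketch, the redirected command could be built like this. The helper buildCommand, the script name ./fun.sh and the naming scheme of the log files are assumptions for illustration.

```r
# Hypothetical helper that builds the redirected command for a
# named list x = list(x1 = ..., x2 = ...); "./fun.sh" is an assumed name
buildCommand = function(x) {
  # one log file per evaluation, e.g. output_1490030005_1.1_2.4.txt
  output.file = sprintf("output_%i_%.1f_%.1f.txt",
    as.integer(Sys.time()), x[["x1"]], x[["x2"]])
  # "> file" sends stdout to the file, "2>&1" sends stderr along with it;
  # use "> /dev/null" instead to discard the output entirely
  sprintf("./fun.sh %f %f > %s 2>&1", x[["x1"]], x[["x2"]], output.file)
}

buildCommand(list(x1 = 1.1, x2 = 2.4))
```

The resulting string is what would be passed to system() inside runScript.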
Start the Optimization
Now everything is set so we can proceed with the usual MBO setup:
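A minimal sketch of such a setup follows. To keep it runnable on its own, the objective is a pure-R stand-in for the wrapped shell script, and the bounds, the number of iterations and the seed are assumptions. The default surrogate model of mlrMBO for numeric parameters is kriging, which needs the DiceKriging package (and possibly rgenoud) to be installed.

```r
library(mlrMBO)  # also attaches mlr, ParamHelpers and smoof

# Pure-R stand-in objective with assumed bounds
fun = makeSingleObjectiveFunction(
  name = "example",
  fn = function(x) sin(x[["x1"]] - 1) + (x[["x1"]]^2 + x[["x2"]]^2),
  par.set = makeParamSet(
    makeNumericParam("x1", lower = -3, upper = 3),
    makeNumericParam("x2", lower = -3, upper = 3)
  ),
  has.simple.signature = FALSE
)

ctrl = makeMBOControl()
ctrl = setMBOControlInfill(ctrl, crit = crit.ei)    # expected improvement
ctrl = setMBOControlTermination(ctrl, iters = 10L)  # 10 sequential steps
# silence the output of the mlr learner
configureMlr(show.info = FALSE, show.learner.output = FALSE,
  on.learner.warning = "quiet")

set.seed(1)
run = mbo(fun = fun, control = ctrl)
run$x  # best parameters found
run$y  # best objective value found
```

Since no initial design is given, mbo() generates one itself before the sequential steps start.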
Execute the R script from a shell
Also, you might not want to be bothered with starting R and running this script manually, so what I would recommend is saving all of the above as an R script plus some lines that write the output to a JSON file like this:
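Writing the result to JSON could be sketched with jsonlite as below. The list run is a stand-in for the result of mbo() with made-up values, and the file name mbo_res.json is an assumption.

```r
library(jsonlite)

# Stand-in for the mbo() result; the values are made up for illustration
run = list(x = list(x1 = -0.3, x2 = 0.01), y = -0.87)

# keep only the best point and its objective value;
# auto_unbox writes scalars as plain numbers instead of length-1 arrays
write_json(run[c("x", "y")], "mbo_res.json", auto_unbox = TRUE)
```

The resulting file can then be picked up by whatever tool started the optimization.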
Let’s assume we saved all of that above as an R-script under the name runMBO.R (actually it is available as a gist).
Then you can simply run it from the command line:
As an extra, the script in the gist also contains a simple handler for command line arguments. In this case you can define the number of optimization iterations and the maximum allowed time in seconds for the optimization. You can also define the seed to make runs reproducible:
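The handler in the gist is not reproduced here; the following is only a sketch of how a simple handler of this kind might work. The function parseArgs, the name=value syntax, the argument names iters, time and seed, and the defaults are all assumptions.

```r
# Hypothetical parser for arguments of the form name=value, e.g.
#   Rscript runMBO.R iters=20 time=10 seed=3
parseArgs = function(args, defaults = list(iters = 10L, time = Inf, seed = 1L)) {
  res = defaults
  for (arg in args) {
    kv = strsplit(arg, "=", fixed = TRUE)[[1]]
    # reject anything that is not exactly one known name and one value
    if (length(kv) != 2 || !(kv[1] %in% names(defaults))) {
      stop("Cannot parse argument: ", arg)
    }
    res[[kv[1]]] = as.numeric(kv[2])
  }
  res
}

# only the arguments after the script name
opts = parseArgs(commandArgs(trailingOnly = TRUE))
set.seed(opts$seed)
```

The parsed values could then be handed to setMBOControlTermination() as iters and time.budget.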
If you want to build a more advanced command line interface you might want to have a look at docopt.
Clean up
To clean up all the files generated by this script you can run:
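A sketch of such a clean up is given below. Only result.txt is named in the text; fun.sh, mbo_res.json and the pattern of the log files are assumptions that have to be adapted to the names actually used.

```r
# result.txt is named in the text; the other file names are assumptions
unlink(c("result.txt", "fun.sh", "mbo_res.json"))

# per-evaluation log files, assuming names like output_<time>_<x1>_<x2>.txt
output.files = list.files(pattern = "^output_[0-9]+_[0-9_.-]+\\.txt$")
unlink(output.files)
```

unlink() silently skips files that do not exist, so the snippet can be run repeatedly.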