OpenML stands for Open Machine Learning and is an
online platform that supports collaborative machine learning.
It is an Open Science project that allows its users to share data, code
and machine learning experiments.
At the time of writing this blog post I am in Eindhoven at an OpenML
workshop, where developers and scientists
meet to work on improving the project. Some of these people are R users, and they (we)
are developing an R package that
communicates with the OpenML platform.
OpenML in R
The OpenML R package can list and download data sets and machine
learning tasks (prediction challenges). In R one can run algorithms on
these data sets/tasks and
then upload the results to OpenML. After a successful upload, the website shows how well the
algorithm performs. To run an algorithm on a given task, the OpenML R package
builds on the mlr package. mlr understands
what a task is and can run learners on that task. So all the OpenML package
needs to do is convert the OpenML objects to objects mlr understands and then
mlr deals with the learning.
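To make this division of labour concrete, here is a minimal sketch of running one mlr learner on one OpenML task. The task ID 59 and the rpart learner are my own picks for illustration and are not part of the study below. I am also assuming that an API key is already configured and that the run object exposes the mlr benchmark result as `bmr`.

```r
library("OpenML")
library("mlr")

# Download a task from OpenML. The ID 59 is an assumption chosen for
# illustration; any supervised classification task ID should work.
task = getOMLTask(task.id = 59)

# Any mlr classification learner can be used here; rpart is only an example.
lrn = makeLearner("classif.rpart")

# runTaskMlr converts the OpenML task into mlr objects and runs the learner
# with the estimation procedure that the task defines.
run = runTaskMlr(task, lrn)

# The mlr benchmark result (assumed to be stored in the element `bmr`).
run$bmr
```

The point is that no manual conversion is needed: the task is passed in as an OpenML object and the learner as an mlr object.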
A small case study
We want to create a little study on the OpenML
website, in which we compare different types of Support
Vector Machines. The study gets an ID assigned to it, which in our case is 27.
We use the function ksvm (with different settings of its argument type)
from the kernlab package, which is integrated in mlr as the learner “classif.ksvm”.
For details on installing and setting up the OpenML R package please see the
guide on GitHub.
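As a rough orientation only, the setup usually comes down to installing the package and registering an API key. The sketch below assumes the package can be installed from CRAN, and "YOUR_API_KEY" is a placeholder for the key from your own OpenML account. The guide remains the authoritative source.

```r
# Assumption: the package is available from CRAN. See the GitHub guide
# for installing the development version instead.
install.packages("OpenML")
library("OpenML")

# "YOUR_API_KEY" is a placeholder, not a real key.
setOMLConfig(apikey = "YOUR_API_KEY")
```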
Let’s start conducting the study:
- Load the packages and list all tasks which have between 100 and 500
- Select all supervised classification tasks that do 10-fold cross-validation
and choose only one task per data set. To keep the study simple and fast to compute,
select only the first three tasks.
- Create the learners we want to compare.
- Run the learners on the three tasks.
- And finally upload the runs to OpenML. The upload function (uploadOMLRun)
returns the ID of the uploaded run object. When uploading runs that are part
of a certain study, tag them with study_ followed by the study ID (here study_27). After uploading, the runs appear
on the website and can be found using the tag or via the
- To show the results of our study, list the run evaluations and make a nice plot.
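The steps above can be sketched in R roughly as follows. Treat this as a sketch under assumptions, not the exact code of the study. I assume that "between 100 and 500" refers to the number of instances, that the task listing has the columns `task.type`, `estimation.procedure`, `data.id` and `task.id`, and that the evaluation listing has the columns `setup.id`, `predictive.accuracy` and `data.name`. The three SVM types are picked from the classification types ksvm offers. Uploading needs an API key with write access.

```r
library("OpenML")
library("mlr")
library("ggplot2")

study.tag = "study_27"  # study_ followed by the study ID

# 1. List tasks in the size range (assumed to mean number of instances).
taskinfo.all = listOMLTasks(number.of.instances = c(100, 500))

# 2. Keep supervised classification tasks with 10-fold cross-validation,
#    one task per data set, and only the first three of them.
#    Column names and label strings are assumptions about the listing.
taskinfo = subset(taskinfo.all,
  task.type == "Supervised Classification" &
  estimation.procedure == "10-fold Crossvalidation")
taskinfo = taskinfo[!duplicated(taskinfo$data.id), ]
taskinfo = head(taskinfo, 3)

# 3. Create the learners to compare: ksvm with different `type` settings.
lrn.list = list(
  makeLearner("classif.ksvm", type = "C-svc"),
  makeLearner("classif.ksvm", type = "kbb-svc"),
  makeLearner("classif.ksvm", type = "spoc-svc")
)

# 4. Run every learner on every task.
grid = expand.grid(task.id = taskinfo$task.id, lrn.ind = seq_along(lrn.list))
runs = lapply(seq_len(nrow(grid)), function(i) {
  task = getOMLTask(grid$task.id[i])
  runTaskMlr(task, lrn.list[[grid$lrn.ind[i]]])
})

# 5. Upload the runs, tagged with the study tag. Each call returns a run ID.
run.ids = lapply(runs, uploadOMLRun, tags = study.tag)

# 6. List the evaluations of all runs with the study tag and plot them.
evals = listOMLRunEvaluations(tag = study.tag)
evals$task.id = as.factor(evals$task.id)
evals$setup.id = as.factor(evals$setup.id)

ggplot(evals, aes(x = setup.id, y = predictive.accuracy,
                  color = data.name, group = task.id)) +
  geom_point() + geom_line()
```

Using a grid of task IDs and learner indices keeps the loop flat, so adding a task or a learner only changes the inputs to `expand.grid`.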
Now you can go ahead and create a bigger study using the techniques you have learned.