Recently new tools for data science pop up constantly, so it is hard to keep up with the changes and choose those that promise to be useful in the long run.
Recently two new solutions were announced that look very promising: reticulate package for R and Microsoft Cognitive Toolkit 2.0 (CNTK). Together with Wit Jakuczun we have decided to test drive them on Azure Data Science Virtual Machine.
CNTK is a very promising deep learning toolkit, backed by Microsoft. Unfortunately it provides bindings to Python and not to R. Therefore it is an ideal situation to test reticulate package, which provides R interface to Python. We wanted to test CNTK on GPUs, so we have decided to use Azure Data Science Virtual Machine using NV6 Instance with Tesla M60 GPU.
For our test-drive we have used classic MNIST dataset. You can find all the source codes and detailed results of the test on GitHub. The short conclusion is that all components worked and played together excellently:
- on CNTK we have build MLP network model, which has reported 2.3% classification error, without any special tuning, which is a pretty decent result;
- calling Python library from R using reticulate was simple and stable;
- Calculations using GPU give a significant speedup and configuration of Azure DSVM was really simple.
In summary, we have found the combo MSFT CNTK + R/reticulate + Azure DSVM to be simple to set up and powerful. It if definitely worth checking in your next data science project.