The other day I ran a machine learning backtest on a new data set. Once I got through the initial LDA and QDA runs, I decided to try xgboost. The first thing I noticed was really poor performance. The results of the ensuing debugging session were quite surprising to me.
I have been using the same framework for a few years now. I think there are some examples outlining the approach even on this blog, but I am too lazy to dig them out now. Without going into further details, let me outline my “stack”:
- caret for model training and tuning
- the parallel package for running the resampling in multiple processes
- Microsoft R Open, which provides some multi-threaded improvements via Intel’s MKL library
As I mentioned, I have been using this stack for a few years now, and during this time, I have seen some really slow models. Two factors got me suspicious in this case:
- I was using a new method – yeah, my first attempt with xgboost.
- The data set was rather small and simple.
What I found out was that there was too much parallelization happening. Somehow, all these threads and processes were stepping on each other, and, although there was progress, it was glacially slow.
Looking at the stack, the parallelization is not that obvious. I was certainly using multiple processes via the parallel package, but beyond that, I was seeing many more threads running than expected. The culprit in this case was the default parallelization in xgboost. Nowadays, apparently, every layer tries to exploit multiple cores, so that wasn’t surprising, just something new to me.
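To illustrate the oversubscription, here is a sketch of how the thread counts multiply. The source only mentions the parallel package, so the doParallel backend and the worker count below are my assumptions, not the original setup:

```r
library(parallel)
library(doParallel)

# Suppose the machine has 8 cores and we register 8 worker processes
# for the cross-validation loop:
cl <- makeCluster(8)
registerDoParallel(cl)

# xgboost, by default, also uses all available cores via OpenMP.
# Each of the 8 worker processes then spawns 8 threads of its own:
# 8 workers x 8 threads = 64 runnable threads competing for 8 cores.
# The resulting contention and context switching is what makes
# progress glacially slow.

stopCluster(cl)
```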
The fix ended up being quite simple – call caret’s train with nthread=1, which in turn is passed through to xgb.train and solves the problem.
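A minimal sketch of such a call (the data, formula, and resampling settings here are placeholders, not my actual setup):

```r
library(caret)

# nthread is not a caret argument; it is forwarded through train's "..."
# to xgb.train, limiting each worker process to a single xgboost thread.
fit <- train(y ~ ., data = training_data,
             method = "xgbTree",
             trControl = trainControl(method = "cv", number = 5),
             nthread = 1)
```

With process-level parallelism already saturating the cores, a single xgboost thread per worker is the right division of labor.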
Looking at the stack above, I realized that there might be other, similar issues lurking. For instance, Microsoft R Open provides some multi-threaded improvements via Intel’s MKL library. In my case that was not causing any observable problems, but in case it is, the threading can be disabled via:
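Something along these lines should do it (RevoUtilsMath is the utility package that ships with Microsoft R Open):

```r
# RevoUtilsMath is bundled with Microsoft R Open
library(RevoUtilsMath)
setMKLthreads(1)  # restrict MKL's BLAS/LAPACK routines to one thread
getMKLthreads()   # verify the new setting
```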
Now everything is up and running, and I am looking forward to the output.