Recently I ran an exam in which the following question caused many problems for students (here I give a shortened formulation). You are given the data generating process y = 10x + e, where e is an error term. Fit a linear regression using lm, neural nets using nnet with size equal to 2 and to 10, and a regression tree using rpart. What can be said about the distribution of the prediction error of these four modeling techniques?
Here is the code that generates the required comparison, assuming that x ~ U(0, 1) and e ~ N(0, 1), for two example training sample sizes, 20 and 200.
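The original code block did not survive in this copy, so here is a minimal sketch of such a comparison. The function name experiment, the use of sum of squared errors on a fresh test sample, and the fixed seed are my assumptions, not the original code.

```r
library(nnet)
library(rpart)

# Hypothetical reconstruction: fit the four models on a training sample of
# size n and return their squared prediction errors on a fresh test sample.
experiment <- function(n) {
  train <- data.frame(x = runif(n))
  train$y <- 10 * train$x + rnorm(n)
  test <- data.frame(x = runif(n))
  test$y <- 10 * test$x + rnorm(n)
  models <- list(
    lm     = lm(y ~ x, data = train),
    nnet2  = nnet(y ~ x, data = train, size = 2, linout = TRUE, trace = FALSE),
    nnet10 = nnet(y ~ x, data = train, size = 10, linout = TRUE, trace = FALSE),
    rpart  = rpart(y ~ x, data = train)
  )
  sapply(models, function(m) sum((test$y - predict(m, newdata = test))^2))
}

set.seed(1)
# 100 replications for each training sample size
for (n in c(20, 200)) {
  errors <- t(replicate(100, experiment(n)))
  cat("n =", n, "\n")
  print(summary(errors))
}
```

Note linout = TRUE in the nnet calls: without it nnet fits a logistic output unit, which cannot reproduce a regression target that ranges over roughly 0 to 10.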
# (fragment of the summary() output of the replicated prediction errors;
#  only the row of maxima survived)
# Max. :568.374    Max. :6502    Max. :83603.10    Max. :83444.6
It is not surprising that linear regression is best, as it is correctly specified. In general it is followed by the neural net with size 2, the neural net with size 10, and the regression tree. The reason is that neural nets use S-shaped transformations and effectively have more parameters than are needed to fit the relationship, while a regression tree is simply not well suited to modeling linear relationships between variables.
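The weakness of the tree can be seen directly: a regression tree predicts one constant per leaf, so on linear data it can only produce a staircase approximation of the straight line. A quick sketch (the data setup is mine, not from the post):

```r
library(rpart)

set.seed(1)
d <- data.frame(x = runif(200))
d$y <- 10 * d$x + rnorm(200)

fit <- rpart(y ~ x, data = d)

# A tree outputs one fitted value per leaf, so despite 200 observations
# there are only a handful of distinct predictions along the line.
length(unique(predict(fit)))
```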
However, neural nets are initialized with random parameters, and sometimes the BFGS optimization fails, so very poor fits can occur. This can be seen in the large values of Max. for nnet2 and nnet10. The median of the results is largely unaffected by this, but the estimate of the mean expected error is very unstable due to the outliers (to get more reliable estimates, many more than 100 replications would be needed).
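The effect of a single diverged fit on the two summaries can be illustrated with made-up numbers (these are not the simulation results): 99 typical replications plus one failed one.

```r
# Hypothetical errors: 99 ordinary fits and one diverged nnet fit.
errors <- c(rep(5, 99), 80000)

mean(errors)    # heavily inflated by the single outlier
median(errors)  # essentially unaffected
```

The mean jumps from 5 to roughly 805 because of one replication out of 100, which is exactly why the Mean column is unstable while the Median column is not.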
Of course, by modifying the rpart or nnet parameters one can get somewhat different results, but the general conclusions remain similar.
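For completeness, here is the kind of tuning that could be meant; the specific values (decay, maxit, cp) are illustrative choices of mine, not settings from the post.

```r
library(nnet)
library(rpart)

set.seed(1)
d <- data.frame(x = runif(200))
d$y <- 10 * d$x + rnorm(200)

# Weight decay regularizes the oversized net and extra iterations give
# BFGS a better chance to converge.
fit_nn <- nnet(y ~ x, data = d, size = 10, linout = TRUE,
               decay = 0.01, maxit = 500, trace = FALSE)

# A smaller complexity parameter lets rpart grow a deeper tree that
# tracks the straight line more closely.
fit_rp <- rpart(y ~ x, data = d, control = rpart.control(cp = 0.001))
```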