Three months have passed since the initial release of greybox on CRAN. I would not say that the package develops like crazy, but there have been some changes since May. Let’s have a look. We start by loading both the greybox and smooth packages.
First of all, the ro() function now has its own class and works with the plot() function, so that you can have a visual representation of the results. Here’s an example:
x <- rnorm(100,100,10)
ourCall <- "es(data, h=h, intervals=TRUE)"
ourValue <- c("forecast", "lower", "upper")
ourRO <- ro(x, h=20, origins=5, ourCall, ourValue, co=TRUE)
plot(ourRO)
Each point on the produced graph corresponds to an origin, and the straight lines correspond to the forecasts. Given that we asked for point forecasts and for the lower and upper bounds of the prediction interval, we have three respective lines. By plotting the results of the rolling-origin experiment, we can see whether the model is stable or not. Just compare the previous graph with the one produced by the call to Holt’s model:
ourCall <- "es(data, model='AAN', h=h, intervals=TRUE)"
ourRO <- ro(x, h=20, origins=5, ourCall, ourValue, co=TRUE)
plot(ourRO)
Holt’s model is not suitable for this time series, so its forecasts are less stable than the forecasts of the automatically selected model in the previous case (which is ETS(A,N,N)).
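For intuition, the rolling-origin idea can be sketched manually in base R with a naive mean forecast. This is only an illustration of the principle, not the actual ro() implementation:

```r
set.seed(41)
x <- rnorm(100, 100, 10)
h <- 20
origins <- 5
forecasts <- matrix(NA, origins, h)
holdout <- matrix(NA, origins, h)
for (i in 1:origins) {
    # each origin shifts the end of the in-sample part by one observation
    trainEnd <- length(x) - h - origins + i
    forecasts[i, ] <- rep(mean(x[1:trainEnd]), h)  # naive mean forecast
    holdout[i, ] <- x[trainEnd + 1:h]
}
# mean absolute error for each horizon, averaged across origins
round(colMeans(abs(holdout - forecasts)), 2)
```

Each row of the two matrices corresponds to an origin, just like each point on the plot above.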
Once again, there is a vignette with examples for the ro() function; have a look if you want to know more.
ALM – Advanced Linear Model
Yes, there is the “Generalised Linear Model” in R, which implements Poisson, Gamma, Binomial and other regressions. Yes, there are smaller packages implementing models with more exotic distributions. But I needed several regression models with: the Laplace distribution, the Folded Normal distribution, the Chi-squared distribution and one new mysterious distribution, which is currently called the “S distribution”. I needed them in one place and in one format: properly estimated using likelihoods, returning confidence intervals and information criteria, and able to produce forecasts. I also wanted them to work similarly to lm(), so that the learning curve would not be too steep. So, here it is, the function alm(). It works quite similarly to lm():
xreg <- cbind(rfnorm(100,1,10), rnorm(100,50,5))
xreg <- cbind(100 + 0.5*xreg[,1] - 0.75*xreg[,2] + rlaplace(100,0,3), xreg, rnorm(100,300,10))
colnames(xreg) <- c("y","x1","x2","Noise")
inSample <- xreg[1:80,]
outSample <- xreg[-c(1:80),]
ourModel <- alm(y~x1+x2, inSample, distribution="laplace")
summary(ourModel)
Here’s the output of the summary:
Distribution used in the estimation: Laplace
Coefficients:
            Estimate Std. Error Lower 2.5% Upper 97.5%
(Intercept) 95.85207    0.36746   95.12022    96.58392
x1           0.59618    0.02479    0.54681     0.64554
x2          -0.67865    0.00622   -0.69103    -0.66626
ICs:
     AIC     AICc      BIC     BICc
474.2453 474.7786 483.7734 484.9419
And here’s the respective plot of the forecast:
The thing that is currently missing in the function is prediction intervals, but this will be added in the upcoming releases.
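To give some intuition about what happens under the hood: maximising the Laplace likelihood is equivalent to least absolute deviations regression. Here is a minimal base-R sketch of such an estimation (my own illustration, not the actual alm() code), starting the optimiser from the OLS estimates:

```r
set.seed(42)
n <- 200
x1 <- rnorm(n, 1, 10)
x2 <- rnorm(n, 50, 5)
y <- 100 + 0.5 * x1 - 0.75 * x2 + rnorm(n)
X <- cbind(1, x1, x2)

# negative log-likelihood of the Laplace regression; par[4] is log(scale)
negLogLik <- function(par) {
    residuals <- y - X %*% par[1:3]
    b <- exp(par[4])
    n * log(2 * b) + sum(abs(residuals)) / b
}
start <- c(coef(lm(y ~ x1 + x2)), 0)
fit <- optim(start, negLogLik, control = list(maxit = 2000))
round(fit$par[1:3], 2)  # coefficients, close to the true 100, 0.5, -0.75
```

Keeping the scale parameter on the log scale is a standard trick to keep it positive during the optimisation.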
Having the likelihood approach allows comparing models with different distributions using information criteria. Here is, for example, the model we get if we assume the S distribution (which has fatter tails than Laplace):
summary(alm(y~x1+x2, inSample, distribution="s"))

Distribution used in the estimation: S
Coefficients:
            Estimate Std. Error Lower 2.5% Upper 97.5%
(Intercept) 95.61244    0.23386   95.14666    96.07821
x1           0.56144    0.00721    0.54708     0.57581
x2          -0.66867    0.00302   -0.67470    -0.66265
ICs:
     AIC     AICc      BIC     BICc
482.9358 483.4692 492.4639 493.6325
As you can see, the information criteria for the S distribution are higher than for Laplace, so we can conclude that the first model was better than the second in terms of ICs.
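The mechanics of such a comparison are easy to reproduce in base R. As a sketch (using the closed-form MLEs rather than alm()), here is an AIC comparison of Normal and Laplace fits to the same sample:

```r
set.seed(43)
y <- rnorm(500, 10, 2)  # data generated from the Normal distribution

# Normal: MLE of mu is the mean, of sigma the root mean squared deviation
muN <- mean(y)
sdN <- sqrt(mean((y - muN)^2))
aicN <- -2 * sum(dnorm(y, muN, sdN, log = TRUE)) + 2 * 2

# Laplace: MLE of mu is the median, of the scale b the mean absolute deviation
muL <- median(y)
bL <- mean(abs(y - muL))
logLikL <- -length(y) * log(2 * bL) - sum(abs(y - muL)) / bL
aicL <- -2 * logLikL + 2 * 2

aicN < aicL  # the Normal model wins on data that is actually Normal
```

Both models have two parameters here, so the comparison effectively boils down to the likelihoods.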
Note that at this moment the AICc and BICc are not correct for non-normal models (at least the derivation of them needs to be double checked, which I haven’t done yet), so don’t rely on them too much.
I intend to add several other distributions that either are not available in R or are implemented unsatisfactorily (from my point of view) – the function is written in a quite flexible way, so this should not be difficult to do. If you have any preferences, please add them on GitHub.
I also want to implement mixture distributions, so that the things discussed in the paper on the intermittent state-space model can also be implemented using pure regression.
Finally, now that I have alm(), we can select between regression models with different distributions (using the stepwise() function) or even combine them using AIC weights (hello, lmCombine()!). Yes, I know that it sounds crazy (think of the pool of models in this case), but this should be fun!
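The AIC-weights idea itself is simple. Here is a base-R sketch of combining two lm() models (an illustration of the principle, not the lmCombine() code):

```r
set.seed(44)
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- 1 + 2 * x1 + rnorm(100)
m1 <- lm(y ~ x1)        # the true model
m2 <- lm(y ~ x1 + x2)   # a model with an irrelevant variable

# Akaike weights: smaller AIC gets a larger weight
aics <- c(AIC(m1), AIC(m2))
w <- exp(-0.5 * (aics - min(aics)))
w <- w / sum(w)

# the combined fit is a weighted average of the individual fitted values
combined <- w[1] * fitted(m1) + w[2] * fitted(m2)
round(w, 3)
```

With a larger pool of models, the same weights are simply computed over the whole pool.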
Regression for Multiple Comparison
One of the reasons why I needed alm() is that I wanted to implement a parametric analogue of the nemenyi() test. This should be handy when you have large samples (because the asymptotic properties start kicking in), and it should be more powerful than non-parametric tests. So, long story short, I implemented the test in the rmc() function. It allows, for example, comparing different forecasting methods based on some known error measures. For example, if we can assume that the forecast error is distributed normally, then we can use this test in order to see whether the forecasts are biased and how they compare with each other. On large samples it should be enough to have at least a symmetric distribution.
The test is based on the simple linear model with dummy variables for each provided method (1 if the error corresponds to the method and 0 otherwise). Here’s an example of how this thing works:
ourData <- cbind(rnorm(100,0,10), rnorm(100,-2,5), rnorm(100,2,6), rlaplace(100,1,5))
colnames(ourData) <- c("Method A","Method B","Method C","Method D")
rmc(ourData, distribution="norm", level=0.95)
By default the function produces a graph in the MCB (Multiple Comparison with the Best) style:
Don’t forget that we deal here with errors, so closer to zero is better (Method A). Judging by the graph, we can conclude that the least biased method is A, and that method B is negatively biased (the bounds lie below zero).
If we compare the results of the test with the mean values, we will see similar ranking:
  Method A   Method B   Method C   Method D
-0.5523353 -2.4700249  1.8401501  1.1383028
This also reflects how the data was generated. Notice that Method D was generated from Laplace distribution with mean 1, but the test managed to give the correct answer in this situation, because Laplace distribution is symmetric and the sample size is large enough. But the main point of the test is that we can get the confidence intervals for each parameter, so we can see if the differences between the methods are significant: if the intervals intersect, then they are not.
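The dummy-variable model behind the test is easy to reproduce with lm(): regressing the errors on method dummies without an intercept yields the methods’ mean errors together with their confidence intervals. This is a sketch of the principle, not the rmc() internals:

```r
set.seed(45)
errors <- cbind(rnorm(100, 0, 10), rnorm(100, -2, 5))
# stack the errors into long format with a dummy factor per method
long <- data.frame(error = as.vector(errors),
                   method = factor(rep(c("Method A", "Method B"), each = 100)))
fit <- lm(error ~ method - 1, data = long)
coef(fit)                  # equals the column means of the errors
confint(fit, level = 0.95) # intervals used for the pairwise comparison
```

If the intervals of two methods do not intersect, the difference between them is significant.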
The function also calculates the relative importance of the model (based on AIC), comparing it with the simple model of the type y = const + e. In my example, the constructed model explains the variance quite well, so the importance is 0.999.
Similarly, we can compare absolute errors, but we would need to use Folded Normal distribution in this case:
rmc(abs(ourData), distribution="fnorm", level=0.95)
Once again, just to compare and make sense of the plot, here are the mean absolute values for the methods:
Method A Method B Method C Method D
7.703774 4.569836 4.840496 5.569483
In order to compare the different methods using squared errors, we may use the Chi-squared distribution, but some preliminary transformations are in order. Specifically, only the basic Chi-squared distribution is implemented in rmc() at the moment, so we need to centre our data (the noncentral Chi-squared distribution would be the next thing to implement):
newData <- ourData - matrix(apply(ourData,2,mean),nrow(ourData),ncol(ourData),byrow=TRUE)
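As a quick base-R sketch of what this transformation does: the centring removes the bias exactly, while the squared centred errors estimate each method’s variance (the toy data here is my own, not the example above):

```r
set.seed(46)
errors <- cbind(rnorm(200, 0, 10), rnorm(200, -2, 5))
colnames(errors) <- c("A", "B")
centred <- errors - matrix(colMeans(errors), nrow(errors), ncol(errors), byrow = TRUE)
colMeans(centred)    # exactly zero by construction: the bias is removed
colMeans(centred^2)  # estimates of the variances of the two methods
```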
This way we remove the bias and, in a way, assess the accuracy only. In order to get to the conventional Chi-squared distribution, we would also need to standardise the data (divide it by the standard deviation), but this would make everything similar (in our case – several Chi-squared distributions with \(h=1\) degree of freedom). Instead, I would suggest not doing that and using the new values as is, implying that each method has a Chi-squared distribution with \(h \times \sigma^2 \) degrees of freedom, where \(h\) is the forecasting horizon and \(\sigma^2\) is the variance of the error of the method. Then the differences in \(\sigma^2\) between the methods will be captured by the test:
rmc(newData^2, distribution="chisq", level=0.95)
You may notice that we get the results very similar to the ones with the Folded Normal distribution. This is because the data is well behaved and does not violate the assumptions of normality.
Overall, in this example we see that although Method A is the least biased of all, it is also considered as the least accurate one due to the higher variance (the parameter for it is significantly higher than for the other methods).
We can also produce plots with vertical lines that connect the models belonging to the same group (no statistical difference, i.e. intersecting intervals). Here’s the example for the normal distribution:
rmc(ourData, distribution="norm", level=0.95, style="lines")
If you want to tune the plot, you can always save the model and then replot it with the desired parameters:
ourModel <- rmc(ourData, distribution="norm", level=0.95)
plot(ourModel, xlab="Models", ylab="Errors")
Finally, it is worth pointing out that, on large samples, the regression on ranks will give results similar to the nemenyi() test:
rmc(t(apply(abs(ourData),1,rank)), distribution="norm", level=0.95)
This function is currently under development, and I decided to show it here just to give an impression of the ongoing research. I still don’t have good solutions for many popular error measures, but the idea right now is to develop the respective elements of the function.
Also, keep in mind that at the moment the function can only be applied to the errors for a specific horizon (for example, 5 steps ahead). If you feed it averaged multiple-steps-ahead forecast errors (e.g. MAE or MSE), you might get unreliable results, because the respective distribution might be complicated and poorly approximated by the existing ones. Having said that, the larger the dataset you use, the more consistent the results will be, even with the Normal distribution (say hi to the Central Limit Theorem).
Several methods have been moved from smooth to greybox. These include:
- pointLik() – returns point Likelihoods, discussed in our research with Nikos;
- pAIC, pBIC, pAICc, pBICc – point values of respective information criteria, from the same research;
- nParam() – returns the number of estimated parameters in the model (+ variance);
- errorType() – returns the type of error used in the model (Additive / Multiplicative).
Furthermore, as you might have already noticed, I’ve implemented several distribution functions:
- Folded normal distribution;
- Laplace distribution;
- S distribution.
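For reference, the Laplace density is simple enough to sketch in base R (dlaplaceSketch is my own illustrative name, not the package’s function):

```r
# density of Laplace(mu, b): f(x) = exp(-|x - mu| / b) / (2 * b)
dlaplaceSketch <- function(x, mu = 0, b = 1) {
    exp(-abs(x - mu) / b) / (2 * b)
}
# sanity check: the density integrates to one
integrate(dlaplaceSketch, -Inf, Inf, mu = 2, b = 3)$value
```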
Finally, there is also a function called lmDynamic(), which uses pAIC in order to produce dynamic linear regression models. But this deserves a separate post.
That’s it for now. See you in greybox 0.4.0!