greybox package for R
I am delighted to announce a new package on CRAN. It is called “greybox”. I know what my American friends will say as soon as they see the name: they will claim that there is a typo, and that it should be “a” instead of “e”. But in fact no mistake was made. I used the British spelling for the name, and I totally understand that at some point I might regret this…
So, what is “greybox”? Wikipedia tells us that a grey box is a model that “combines a partial theoretical structure with data to complete the model”. This means that almost any statistical model can be considered a grey box, which makes the package potentially quite flexible and versatile.
But why do we need a new package on CRAN?
First, there were several functions in the smooth package that did not belong there, and there are several functions in the TStools package that can be united under the topic of model building. They focus on multivariate regression analysis rather than on state-space models, time series smoothing or anything else. It would make more sense to find them their own home package. An example of such a function is ro(), the Rolling Origin function that Yves and I wrote in 2016 on our way to the International Symposium on Forecasting. Arguably this function can be used not only for assessing the accuracy of forecasting models, but also for variable / model selection.
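To give a feeling for what rolling origin evaluation does, here is a minimal base R sketch. It does not use ro() and does not reproduce its interface: the helper rolling_origin_mae() and the two toy forecasters are hypothetical names introduced only for this illustration, and the simulated series is made up. The idea is to move the forecast origin forward one observation at a time, produce forecasts for a fixed horizon from each origin, and average the errors over all origins.

```r
# Rolling origin evaluation, sketched in base R (hypothetical helper, not ro()).
# 'forecaster' is any function(train, h) returning h point forecasts.
rolling_origin_mae <- function(y, forecaster, h = 3, origins = 5) {
  obs <- length(y)
  # One row per forecast step, one column per origin
  errors <- matrix(NA_real_, nrow = h, ncol = origins)
  for (i in seq_len(origins)) {
    # Index of the last in-sample observation for this origin
    end <- obs - h - origins + i
    train <- y[seq_len(end)]
    holdout <- y[(end + 1):(end + h)]
    errors[, i] <- abs(holdout - forecaster(train, h))
  }
  # Mean absolute error over all horizons and origins
  mean(errors)
}

# Two toy forecasters to compare (assumptions made for the example)
naive_forecast <- function(train, h) rep(train[length(train)], h)
mean_forecast  <- function(train, h) rep(mean(train), h)

set.seed(42)
y <- 100 + cumsum(rnorm(60))  # simulated random walk, purely illustrative

mae_naive <- rolling_origin_mae(y, naive_forecast)
mae_mean  <- rolling_origin_mae(y, mean_forecast)
```

Comparing mae_naive with mae_mean is the simplest form of model selection based on rolling origin: the candidate with the lower out-of-sample error is preferred. The same loop would work with regression models that differ in their sets of explanatory variables.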
Second, in one of my side projects, I needed to work more on multivariate regressions, and I had several ideas I wanted to test. One of those is creating a combined multivariate regression from several models using information criteria weights. The existing implementations did not satisfy me, so I ended up writing a function, combiner(), that does that. In addition, our research together with Yves Sagaert indicates that there is a nice solution for the fat regression problem (when the number of parameters is higher than the number of observations) using information criteria. Putting those functions in smooth did not sound right, but having greybox helps a lot. There are other ideas that I have in mind, and they don’t fit in the other packages.
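The idea of combining regressions with information criteria weights can be sketched in base R. This is not the implementation of combiner(), and it does not use its interface. It assumes AIC as the criterion and the standard Akaike weights formula, where each weight is proportional to exp(-0.5 * (AIC_i - AIC_min)). The data and the set of candidate models are made up for the illustration.

```r
# Combining regressions via information criteria weights (base R sketch,
# assuming AIC and Akaike weights; not the combiner() implementation).
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 + rnorm(n)  # x2 is irrelevant by construction
dat <- data.frame(y, x1, x2)

# All candidate models built from the two explanatory variables
models <- list(
  lm(y ~ 1, data = dat),
  lm(y ~ x1, data = dat),
  lm(y ~ x2, data = dat),
  lm(y ~ x1 + x2, data = dat)
)

# Information criterion for each model and the resulting weights
ic <- sapply(models, AIC)
delta <- ic - min(ic)
ic_weights <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

# Combined fitted values: weighted average of the individual fits
fits <- sapply(models, fitted)  # n x 4 matrix, one column per model
combined_fit <- as.vector(fits %*% ic_weights)
```

Models with a criterion value close to the best one receive noticeable weights, while clearly worse models get weights close to zero, so the combination behaves like a soft version of model selection.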
Finally, I could not find satisfactory (from my point of view) packages on CRAN that would focus on multivariate model building and forecasting. The usual focus is on analysis instead (including time series analysis). The other thing is the obsession of many packages with p-values and hypothesis testing, which was yet another motivator for me to develop a package that would be completely hypotheses-free (at 95% level). As a result, if you work with the functions from greybox, you might notice that they produce confidence intervals instead of p-values (because I find them more informative and useful). On top of that, I needed good instruments for promotional modelling for several projects, and it was easier to implement them myself than to compile them from different functions in different packages.
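As a small illustration of reporting confidence intervals rather than p-values, here is what that looks like with a plain lm() model in base R. This uses the built-in mtcars data and the standard confint() function, not greybox itself, so the output format of the greybox functions may differ.

```r
# Confidence intervals for regression parameters in base R
# (an illustration with lm(), not the output of greybox functions).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# 95% confidence intervals: one row per parameter, lower and upper bounds
ci <- confint(fit, level = 0.95)
print(ci)
```

An interval shows the plausible range of each parameter, so it carries information about both the size of the effect and its uncertainty, which a single p-value does not.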
Keeping that in mind, it makes sense to briefly discuss what is already available there. I’ve already discussed how the xregExpander() and stepwise() functions work in one of the previous posts, and these functions are now available in greybox instead of smooth. However, I have not covered either the combiner() or the ro() function yet. While combiner() is still under construction and works only for normal cases (fat regression can be solved, but not 100% efficiently), ro() has worked efficiently for several years already. So I created a detailed vignette explaining what rolling origin is, how the function works and how to use it. If you are interested in finding out more, check it out on CRAN.
To wrap up, the greybox package is focused on model building and forecasting, and from now on it will be updated periodically.
As a final note, I plan to do the following in greybox in future releases:
 Move nemenyi()
 Develop functions for promotional modelling;
 Write a function for multiple correlation coefficients (will be used for multicollinearity analysis);
 Implement variable selection based on rolling origin evaluation;
 Stepwise regression and combinations of models, based on Laplace and other distributions;
 AICc for Laplace and other distributions;
 Solve the fat regression problem via a combination of regression models (sounds crazy, right?);
 xregTransformer – nonlinear transformation of the provided xreg variables;
 Other cool stuff.
If you have any thoughts on what to implement, leave a comment – I will consider your idea.