**Econometrics and Free Software**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have recently been confronted to a kind of data set and problem that I was not even aware existed:

intermittent demand data. Intermittent demand arises when the demand for a certain good arrives

sporadically. Let’s take a look at an example, by analyzing the number of downloads for the `{RDieHarder}`

package:

```
library(tidyverse)
library(tsintermittent)
library(nnfor)
library(cranlogs)
library(brotools)
```

```
rdieharder <- cran_downloads("RDieHarder", from = "2017-01-01")
ggplot(rdieharder) +
geom_line(aes(y = count, x = date), colour = "#82518c") +
theme_blog()
```

Let’s take a look at just one month of data, because the above plot is not very clear, because of

the outlier just before 2019… I wonder now, was that on Christmas day?

```
rdieharder %>%
filter(count == max(count))
```

```
## date count package
## 1 2018-12-21 373 RDieHarder
```

Not exactly on Christmas day, but almost! Anyways, let’s look at one month of data:

```
january_2018 <- rdieharder %>%
filter(between(date, as.Date("2018-01-01"), as.Date("2018-02-01")))
ggplot(january_2018) +
geom_line(aes(y = count, x = date), colour = "#82518c") +
theme_blog()
```

Now, it is clear that this will be tricky to forecast. There is no discernible pattern,

no trend, no seasonality… nothing that would make it “easy” for a model to learn how to forecast

such data.

This is typical intermittent demand data. Specific methods have been developed to forecast such

data, the most well-known being Croston, as detailed in

this paper.

A function to estimate such models is available in the `{tsintermittent}`

package, written by

Nikolaos Kourentzes

who also wrote another package, `{nnfor}`

, which uses Neural Networks to forecast time series data.

I am going to use both to try to forecast the intermittent demand for the `{RDieHarder}`

package

for the year 2019.

Let’s first load these packages:

```
library(tsintermittent)
library(nnfor)
```

And as usual, split the data into training and testing sets:

```
train_data <- rdieharder %>%
filter(date < as.Date("2019-01-01")) %>%
pull(count) %>%
ts()
test_data <- rdieharder %>%
filter(date >= as.Date("2019-01-01"))
```

Let’s consider three models; a naive one, which simply uses the mean of the training set as the

forecast for all future periods, Croston’s method, and finally a Neural Network from the `{nnfor}`

package:

```
naive_model <- mean(train_data)
croston_model <- crost(train_data, h = 163)
nn_model <- mlp(train_data, reps = 1, hd.auto.type = "cv")
```

```
## Warning in preprocess(y, m, lags, keep, difforder, sel.lag,
## allow.det.season, : No inputs left in the network after pre-selection,
## forcing AR(1).
```

`nn_model_forecast <- forecast(nn_model, h = 163)`

The `crost()`

function estimates Croston’s model, and the `h`

argument produces the

forecast for the next 163 days. `mlp()`

trains a multilayer perceptron, and the `hd.auto.type = "cv"`

argument means that 5-fold cross-validation will be used to find the best number of hidden nodes. I

then obtain the forecast using the `forecast()`

function. As you can read from the Warning message

above, the Neural Network was replaced by an auto-regressive model, AR(1), because no inputs were

left after pre-selection… I am not exactly sure what that means, but if I remove the big outlier

from before, this warning message disappears, and a Neural Network is successfully trained.

In order to rank the models, I follow this paper

from Rob J. Hyndman, who wrote a very useful book titled Forecasting: Principles and Practice,

and use the Mean Absolute Scaled Error, or MASE. You can also read this shorter pdf

which also details how to use MASE to measure the accuracy for intermittent demand. Here is the

function:

```
mase <- function(train_ts, test_ts, outsample_forecast){
naive_insample_forecast <- stats::lag(train_ts)
insample_mae <- mean(abs(train_ts - naive_insample_forecast), na.rm = TRUE)
error_outsample <- test_ts - outsample_forecast
ase <- error_outsample / insample_mae
mean(abs(ase), na.rm = TRUE)
}
```

It is now easy to compute the models’ accuracies:

`mase(train_data, test_data$count, naive_model)`

`## [1] 1.764385`

`mase(train_data, test_data$count, croston_model$component$c.out[1])`

`## [1] 1.397611`

`mase(train_data, test_data$count, nn_model_forecast$mean)`

`## [1] 1.767357`

Croston’s method is the one that performs best from the three. Maybe surprisingly, the naive method

performs just as well as the Neural Network! (or rather, the AR(1) model) Let’s also plot the predictions

with the true values from the test set:

```
test_data <- test_data %>%
mutate(naive_model_forecast = naive_model,
croston_model_forecast = croston_model$component$c.out[1],
nn_model_forecast = nn_model_forecast$mean) %>%
select(-package) %>%
rename(actual_value = count)
test_data_longer <- test_data %>%
gather(models, value,
actual_value, naive_model_forecast, croston_model_forecast, nn_model_forecast)
```

```
## Warning: attributes are not identical across measure variables;
## they will be dropped
```

```
ggplot(test_data_longer) +
geom_line(aes(y = value, x = date, colour = models)) +
theme_blog()
```

Just to make sure I didn’t make a mistake when writing the `mase()`

function, let’s use the

`accuracy()`

function from the `{forecast}`

package and compare the result for the Neural Network:

```
library(forecast)
accuracy(nn_model_forecast, x = test_data$actual_value)
```

```
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.001929409 14.81196 4.109577 NaN Inf 0.8437033 0.05425074
## Test set 8.211758227 12.40199 8.635563 -Inf Inf 1.7673570 NA
```

The result is the same, so it does seem like the naive method is not that bad, actually! Now, in

general, intermittent demand series have a lot of 0 values, which is not really the case here. I

still think that the methodology fits to this particular data set.

How else would you have forecast this data? Let me know via twitter!

Hope you enjoyed! If you found this blog post useful, you might want to follow

me on twitter for blog post updates and

buy me an espresso or paypal.me.

**leave a comment**for the author, please follow the link and comment on their blog:

**Econometrics and Free Software**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.