test_batch <- as_iterator(ds_test) %>% iter_next() encoded <- encoder(test_batch[][1:1000]) test_var <- tf\(math\)reduce_variance(encoded, axis = 0L) print(test_var %>% as.numeric() %>% round(5)) } “`
Experimental setup and data
The idea was to add white noise to a deterministic series. This time, the Roessler system was chosen, mainly for the prettiness of its attractor, apparent even in its two-dimensional projections:
Like we did for the Lorenz system in the first part of this series, we use
deSolve to generate data from the Roessler equations.
Then, noise is added, to the desired degree, by drawing from a normal distribution, centered at zero, with standard deviations varying between 1 and 2.5.
Here you can compare effects of not adding any noise (left), standard deviation-1 (middle), and standard deviation-2.5 Gaussian noise:
Otherwise, preprocessing proceeds as in the previous posts. In the upcoming results section, we’ll compare forecasts not just to the “real”, after noise addition, test split of the data, but also to the underlying Roessler system – that is, the thing we’re really interested in. (Just that in the real world, we can’t do that check.) This second test set is prepared for forecasting just like the other one; to avoid duplication we don’t reproduce the code.
The LSTM used for comparison with the VAE described above is identical to the architecture employed in the previous post. While with the VAE, an
fnn_multiplier of 1 yielded sufficient regularization for all noise levels, some more experimentation was needed for the LSTM: At noise levels 2 and 2.5, that multiplier was set to 5.
As a result, in all cases, there was one latent variable with high variance and a second one of minor importance. For all others, variance was close to 0.
In all cases here means: In all cases where FNN regularization was used. As already hinted at in the introduction, the main regularizing factor providing robustness to noise here seems to be FNN loss, not KL divergence. So for all noise levels, besides FNN-regularized LSTM and VAE models we also tested their non-constrained counterparts.
Seeing how all models did superbly on the original deterministic series, a noise level of 1 can almost be treated as a baseline. Here you see sixteen 120-timestep predictions from both regularized models, FNN-VAE (dark blue), and FNN-LSTM (orange). The noisy test data, both input (
x, 120 steps) and output (
y, 120 steps) are displayed in (blue-ish) grey. In green, also spanning the whole sequence, we have the original Roessler data, the way they would look had no noise been added.
Despite the noise, forecasts from both models look excellent. Is this due to the FNN regularizer?
Looking at forecasts from their unregularized counterparts, we have to admit these do not look any worse. (For better comparability, the sixteen sequences to forecast were initiallly picked at random, but used to test all models and conditions.)
What happens when we start to add noise?
Between noise levels 1.5 and 2, something changed, or became noticeable from visual inspection. Let’s jump directly to the highest-used level though: 2.5.
Here first are predictions obtained from the unregularized models.
Both LSTM and VAE get “distracted” a bit too much by the noise, the latter to an even higher degree. This leads to cases where predictions strongly “overshoot” the underlying non-noisy rhythm. This is not surprising, of course: They were trained on the noisy version; predict fluctuations is what they learned.
Do we see the same with the FNN models?
Interestingly, we see a much better fit to the underlying Roessler system now! Especially the VAE model, FNN-VAE, surprises with a whole new smoothness of predictions; but FNN-LSTM turns up much smoother forecasts as well.
“Smooth, fitting the system…” – by now you may be wondering, when are we going to come up with more quantitative assertions? If quantitative implies “mean squared error” (MSE), and if MSE is taken to be some divergence between forecasts and the true target from the test set, the answer is that this MSE doesn’t differ much between any of the four architectures. Put differently, it is mostly a function of noise level.
However, we could argue that what we’re really interested in is how well a model forecasts the underlying process. And there, we see differences.
In the following plot, we contrast MSEs obtained for the four model types (grey: VAE; orange: LSTM; dark blue: FNN-VAE; green: FNN-LSTM). The rows reflect noise levels (1, 1.5, 2, 2.5); the columns represent MSE in relation to the noisy(“real”) target (left) on the one hand, and in relation to the underlying system on the other (right). For better visibility of the effect, MSEs have been normalized as fractions of the maximum MSE in a category.
So, if we want to predict signal plus noise (left), it is not extremely critical whether we use FNN or not. But if we want to predict the signal only (right), with increasing noise in the data FNN loss becomes increasingly effective. This effect is far stronger for VAE vs. FNN-VAE than for LSTM vs. FNN-LSTM: The distance between the grey line (VAE) and the dark blue one (FNN-VAE) becomes larger and larger as we add more noise.
Our experiments show that when noise is likely to obscure measurements from an underlying deterministic system, FNN regularization can strongly improve forecasts. This is the case especially for convolutional VAEs, and probably convolutional autoencoders in general. And if an FNN-constrained VAE performs as well, for time series prediction, as an LSTM, there is a strong incentive to use the convolutional model: It trains significantly faster.
With that, we conclude our mini-series on FNN-regularized models. As always, we’d love to hear from you if you were able to make use of this in your own work!
Thanks for reading!