While continuing my study of neural networks and deep learning, I inevitably ran into recurrent neural networks.
Recurrent neural networks (RNNs) are a particular kind of neural network that is usually very good at predicting sequences. This is because their hidden state is fed back into the network at every time step, so it carries information about past inputs. If your task is to predict a sequence or a periodic signal, then using an RNN might be a good starting point. Plain vanilla RNNs work fine, but they have trouble “keeping in memory” events that occurred, say, more than 20 steps back. This problem has been addressed with the development of a model called the LSTM (long short-term memory) network. As far as I know, an LSTM should usually be preferred to a plain vanilla RNN when possible, as it tends to yield better results.
In this post, however, I am going to work on a plain vanilla RNN model. I have two reasons for doing this. First, this is one of my first experiences with RNNs and I would like to get comfortable with them before going deeper. Second, R provides a simple and very user-friendly package named “rnn” for working with recurrent neural networks. I am going to dive into LSTMs using MXNet and TensorFlow later.
The task I am going to address is predicting a cosine wave from a noisy sine wave. Below you can see the plot of the predictor sequence X and the sequence Y to be predicted.
X is essentially a sine wave with some normally distributed noise, while Y is a straightforward smooth cosine wave.
What I expect the model to do is capture the 90-degree phase shift between the two waves and throw away the noise in the input.
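To make the 90 degrees concrete: since $\cos\theta = \sin(\theta + \pi/2)$, the cosine is just the sine shifted by a quarter of a period. At 5 Hz one period is 0.2 s, so a quarter period is 0.05 s. The sketch below assumes a sampling step of 0.005 s (my assumption, the post does not state the step). With that step the shift works out to exactly 10 samples, and the code checks this on clean, noise-free waves.

```python
import math

F = 5.0     # Hz, the frequency used in the post
DT = 0.005  # s, assumed sampling step (not stated in the post)

period = 1.0 / F                      # 0.2 s for one full cycle
quarter = period / 4.0                # a 90-degree shift is a quarter period: 0.05 s
shift_samples = round(quarter / DT)   # number of samples spanned by that shift


def clean_sine(i):
    """Noise-free sine sampled at step i."""
    return math.sin(2 * math.pi * F * DT * i)


def clean_cosine(i):
    """Noise-free cosine sampled at step i."""
    return math.cos(2 * math.pi * F * DT * i)


# cos(theta) == sin(theta + pi/2): the cosine equals the sine read a quarter period later
max_err = max(abs(clean_cosine(i) - clean_sine(i + shift_samples)) for i in range(400))
print(shift_samples, max_err < 1e-9)  # 10 True
```

This is only the clean relationship. The network has to learn it from noisy input and without being told what the shift is.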
I chose a 5 Hz frequency for both waves, but you can play around and try to obtain similar results with a different frequency. Be aware, though, that the higher the frequency, the more data points you need. By the Nyquist–Shannon sampling theorem, the sampling rate must be more than twice the signal frequency, otherwise the wave is aliased.
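A quick way to check whether a given frequency is safe is to compare it with the Nyquist frequency and count how many samples fall in one cycle. The helper below is a hypothetical sketch (`sampling_report` is my own name), and the 0.005 s step is again an assumption. Passing the Nyquist test is only the bare minimum. A wave sampled just above it looks very jagged, so in practice you want many samples per period.

```python
def sampling_report(f_hz, dt_s):
    """Return (sampling rate, Nyquist frequency, samples per period, below Nyquist?)."""
    fs = 1.0 / dt_s          # sampling rate in Hz
    nyquist = fs / 2.0       # highest frequency representable without aliasing
    per_period = fs / f_hz   # how many samples cover one cycle of the wave
    return fs, nyquist, per_period, f_hz < nyquist


for f in (5.0, 10.0, 150.0):
    fs, nyq, per, ok = sampling_report(f, 0.005)
    print(f"f={f:g} Hz: fs={fs:g} Hz, Nyquist={nyq:g} Hz, {per:g} samples/period, ok={ok}")
```

Under the assumed step, 5 Hz gives 40 samples per cycle and 10 Hz gives 20, both comfortably below the 100 Hz Nyquist limit. A 150 Hz wave would not survive.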
The artificial dataset I created for this task is a set of 10 sequences, each consisting of 40 observations. The X matrix contains 10 sequences of a noisy sine wave, while the Y matrix contains the corresponding 10 sequences of a clean cosine wave.
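A dataset of this shape can be built by generating one long series of 10 × 40 = 400 points and cutting it into 10 consecutive rows. The sketch below does that in plain Python. The time step (0.005 s), the noise standard deviation (0.25), the seed and the helper name `to_matrix` are all assumptions for illustration, not values taken from the post or its gist. With a 0.005 s step, each 40-point row happens to cover exactly one 5 Hz cycle.

```python
import math
import random

N_SEQ, SEQ_LEN = 10, 40   # 10 sequences x 40 observations, as in the post
DT, F = 0.005, 5.0        # assumed time step (s) and the post's 5 Hz frequency
NOISE_SD = 0.25           # assumed standard deviation of the Gaussian noise
rng = random.Random(42)   # fixed seed so the example is reproducible

t = [DT * (i + 1) for i in range(N_SEQ * SEQ_LEN)]
x = [math.sin(2 * math.pi * F * ti) + rng.gauss(0.0, NOISE_SD) for ti in t]  # noisy sine
y = [math.cos(2 * math.pi * F * ti) for ti in t]                             # clean cosine


def to_matrix(series, n_rows, n_cols):
    """Cut a flat series into n_rows consecutive chunks of n_cols values each."""
    if len(series) != n_rows * n_cols:
        raise ValueError("series length must equal n_rows * n_cols")
    return [series[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


X = to_matrix(x, N_SEQ, SEQ_LEN)  # 10 rows (sequences), 40 columns (time steps)
Y = to_matrix(y, N_SEQ, SEQ_LEN)
print(len(X), len(X[0]))  # 10 40
```

In R the same row layout comes from `t(matrix(x, nrow = 40))`, because `matrix()` fills column by column and the transpose turns each 40-value column into a row.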
Before fitting the model, I rescaled all the data to the $[0, 1]$ interval. When using any neural network model with real-valued data, make sure not to skip this step. If you do, you might spend the next hour trying to figure out why the model did not converge or spat out weird results. I am not an expert, but I know from painful personal experience that this step is usually crucial. Nevertheless, I occasionally forget to do it and then wander around like a fool trying to figure out why I did not get what I was expecting.
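Rescaling to $[0, 1]$ is min-max normalization, $v' = (v - \min) / (\max - \min)$. Below is a minimal sketch for list-of-lists matrices. It assumes each matrix (X or Y) is scaled by its own global minimum and maximum, which may differ from what the gist does. It also keeps the inverse map, because predictions made on the scaled Y have to be mapped back to the original units before you compare them with the real wave.

```python
def minmax_fit(matrix):
    """Global min and max over every entry of a list-of-lists matrix."""
    flat = [v for row in matrix for v in row]
    return min(flat), max(flat)


def minmax_scale(matrix, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = hi - lo
    if span == 0:
        raise ValueError("constant data cannot be min-max scaled")
    return [[(v - lo) / span for v in row] for row in matrix]


def minmax_unscale(matrix, lo, hi):
    """Invert minmax_scale, e.g. to bring predictions back to the original units."""
    return [[v * (hi - lo) + lo for v in row] for row in matrix]


Y = [[-1.0, 0.0], [0.5, 1.0]]
lo, hi = minmax_fit(Y)
Y_scaled = minmax_scale(Y, lo, hi)
print(Y_scaled)  # [[0.0, 0.5], [0.75, 1.0]]
```

The `lo` and `hi` values are kept so that `minmax_unscale` can undo the mapping. If you ever split off a test set, it is cleaner to fit them on the training rows only.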
As far as the model is concerned, I decided to use 16 hidden neurons, mostly because the other configurations I tried all ended up with weird spikes in the valleys and peaks of the waves. This is the most notable problem I encountered while working on this task: the upward and downward stretches of the wave are very easy to predict, while the peaks and valleys can cause trouble and come out as sudden spikes. Using 16 hidden units and about 1500 epochs seems to fix this problem.
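In a plain vanilla (Elman-style) RNN, the hidden state at each time step is computed from the current input and the previous hidden state. The sketch below shows only the forward pass of such a network, with 16 hidden units, one input and one output per step. It is my own illustration and not the internals of the rnn package. The sigmoid activations, the uniform weight initialization and the class name `VanillaRNN` are assumptions. The weights are random and untrained, and the training loop (backpropagation through time, repeated for roughly 1500 epochs) is left out.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class VanillaRNN:
    """Elman-style RNN: h_t = sigmoid(w_x*x_t + W_h h_{t-1} + b_h), y_t = sigmoid(w_y . h_t + b_y).

    Weights are random and untrained; this only illustrates the forward pass.
    """

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)

        def u():
            return rng.uniform(-0.5, 0.5)

        self.hidden = hidden
        self.w_x = [u() for _ in range(hidden)]                           # input -> hidden (1 feature)
        self.w_h = [[u() for _ in range(hidden)] for _ in range(hidden)]  # hidden -> hidden (the recurrence)
        self.b_h = [0.0] * hidden
        self.w_y = [u() for _ in range(hidden)]                           # hidden -> output
        self.b_y = 0.0

    def forward(self, xs):
        """Run one sequence of scalar inputs and return one output per time step."""
        h = [0.0] * self.hidden  # initial hidden state
        outputs = []
        for x_t in xs:
            # the new state depends on the current input and the whole previous state
            h = [
                sigmoid(self.w_x[j] * x_t
                        + sum(self.w_h[j][k] * h[k] for k in range(self.hidden))
                        + self.b_h[j])
                for j in range(self.hidden)
            ]
            outputs.append(sigmoid(sum(w * hj for w, hj in zip(self.w_y, h)) + self.b_y))
        return outputs


rnn = VanillaRNN(hidden=16)
preds = rnn.forward([0.5] * 40)  # one 40-step sequence of (scaled) inputs
print(len(preds))  # 40
```

The `w_h` matrix is what makes the network recurrent. With 16 hidden units it holds 16 × 16 = 256 weights, and those weights decide how much of the past reaches each new prediction. The sigmoid output also stays in $(0, 1)$, which matches targets rescaled to $[0, 1]$.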
Below are the results I obtained after some experiments.
This is the prediction for the entire predictor matrix X:
and this one is the prediction for the test set:
I would say they look pretty good. I encourage you to play with this and look for the limits of the model. For instance, I tried doubling the frequency of the cosine wave to 10 Hz, and the predictions still look pretty good. Below you can see the X sequence (no change here) and the doubled-frequency Y sequence. The last plot shows the prediction on the test set versus the real values.
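In the doubled-frequency variant, only the target changes: X stays a noisy 5 Hz sine, while Y becomes a 10 Hz cosine. The sketch below generates such a pair and shows what that means per sequence. The 0.005 s step, the noise level, the seed and the helper `make_pair` are assumptions, as in the earlier sketches. Under those assumptions, each 40-step sequence now contains one cycle of X but two full cycles of Y.

```python
import math
import random

DT = 0.005            # assumed sampling step in seconds (not stated in the post)
F_X, F_Y = 5.0, 10.0  # X keeps its 5 Hz sine; only the target cosine is doubled


def make_pair(f_x, f_y, n=400, noise_sd=0.25, seed=0):
    """Noisy sine at f_x Hz as input, clean cosine at f_y Hz as target (noise_sd assumed)."""
    rng = random.Random(seed)
    t = [DT * (i + 1) for i in range(n)]
    x = [math.sin(2 * math.pi * f_x * ti) + rng.gauss(0.0, noise_sd) for ti in t]
    y = [math.cos(2 * math.pi * f_y * ti) for ti in t]
    return x, y


x, y = make_pair(F_X, F_Y)
samples_per_cycle_x = round(1 / (F_X * DT))  # samples in one X cycle
samples_per_cycle_y = round(1 / (F_Y * DT))  # samples in one Y cycle
# Y repeats itself every samples_per_cycle_y steps
drift = max(abs(y[i] - y[i + samples_per_cycle_y]) for i in range(len(y) - samples_per_cycle_y))
print(samples_per_cycle_x, samples_per_cycle_y, drift < 1e-9)  # 40 20 True
```

The network therefore has to learn a mapping that is no longer a simple phase shift of its input. That makes the good predictions at 10 Hz a more interesting result.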
The code I used for this simple experiment is shown in the gist below. To get the plots for the doubled-frequency example, just set $f = 10$.
Thank you for reading this post. I hope you found it interesting and useful. If you have any questions, please leave a comment.