
Plain vanilla recurrent neural networks in R: waves prediction


While continuing my study of neural networks and deep learning, I inevitably came across recurrent neural networks.

Recurrent neural networks (RNNs) are a particular kind of neural network that is usually very good at predicting sequences thanks to its inner workings. If your task is to predict a sequence or a periodic signal, an RNN can be a good starting point. Plain vanilla RNNs work fine, but they have a little problem when trying to “keep in memory” events that occurred, say, more than 20 steps back. This problem has been addressed with the development of a model called the LSTM network. As far as I know, an LSTM should usually be preferred to a plain vanilla RNN when possible, as it tends to yield better results.


In this post, however, I am going to work with a plain vanilla RNN model, for two reasons. First of all, this is one of my first experiences with RNNs and I would like to get comfortable with them before going deeper; secondly, R provides a simple and very user-friendly package named “rnn” for working with recurrent neural networks. I am going to dive into LSTMs using MXNet and TensorFlow later.

Task description

The task I am going to address is predicting a cosine wave from a noisy sine wave. Below you can see the plot of the predictor sequence X and the sequence Y to be predicted.

X is essentially a sine wave with some normally distributed noise, while Y is a straightforward smooth cosine wave.

You can clearly see that what I expect the model to do is to capture the 90 degree phase shift between the two waves and to throw away the noise in the input.

I chose a 5 Hz frequency for both waves, but you can play around with the frequency and try to obtain similar results. Be aware, though, that the higher the frequency, the more data points you need in order to avoid the problems that come with the sampling theorem.
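To make the setup concrete, the two waves can be generated along these lines (a minimal sketch: the sampling step, time span and noise amplitude here are illustrative choices, not taken from the gist at the end of the post):

```r
set.seed(10)
f <- 5                               # wave frequency in Hz
w <- 2 * pi * f                      # angular frequency
t <- seq(0.005, 2, by = 0.005)       # 400 time points over 2 seconds
x <- sin(t * w) + rnorm(length(t), mean = 0, sd = 0.25)  # noisy sine (predictor)
y <- cos(t * w)                                          # clean cosine (target)
```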

Preprocessing

The artificial dataset I created for this task is a set of 10 sequences, each consisting of 40 observations. The X matrix contains 10 sequences of a noisy sine wave, while the Y matrix contains the corresponding 10 sequences of a clean cosine wave.
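In code, the reshaping can look like this (assuming the x and y vectors from the sketch above; the “rnn” package expects samples in rows and time steps in columns):

```r
X <- matrix(x, nrow = 10, byrow = TRUE)   # 10 sequences x 40 time steps
Y <- matrix(y, nrow = 10, byrow = TRUE)   # corresponding cosine sequences
```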

Before fitting the model I standardized all the data to the $[0, 1]$ interval. When using any neural network model with real-valued data, make sure not to skip this step; if you do, you might spend the next hour trying to figure out why the model did not converge or spat out weird results. I am not an expert, but I know from painful personal experience that this step is usually crucial; nevertheless, I occasionally forget to do it and then wander around like a fool looking for why I did not get what I was expecting.
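A simple min-max scaling does the job; here is a sketch:

```r
# Rescale a matrix into the [0, 1] interval (min-max scaling).
range01 <- function(m) (m - min(m)) / (max(m) - min(m))
X <- range01(X)
Y <- range01(Y)
```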

Model

As far as the model is concerned, I decided to use 16 hidden neurons, mostly because the other configurations I tried all ended up with weird spikes at the peaks and valleys of the waves. This is the most notable problem I encountered while trying to address this task: it is very easy to predict the upward and downward stretches of the wave, while the peaks and valleys may cause problems and be predicted as sudden spikes. 16 hidden units and about 1500 epochs seem to fix this problem.
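With the “rnn” package, fitting such a model boils down to a single call to trainr(). The sketch below uses the 16 hidden units and roughly 1500 epochs mentioned above; the learning rate and the 8/2 train/test split are illustrative choices, not necessarily the ones used in the gist:

```r
library(rnn)

set.seed(10)
train <- 1:8    # first 8 sequences for training (assumed split)

# Train a plain vanilla RNN on the scaled matrices.
model <- trainr(Y = Y[train, ],
                X = X[train, ],
                learningrate = 0.05,
                hidden_dim = 16,
                numepochs = 1500)
```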

Results

Below are the results I obtained after some experiments.

This is the full prediction for the entire predictor matrix X:

while this one is the prediction for the test set:

I would say they look pretty good. I encourage you to play with this and look for the limits of the model. For instance, I tried doubling the frequency of the cosine wave to 10 Hz and the predictions still look pretty good. Below you can see the X sequence (unchanged) and the doubled-frequency Y sequence. The last plot shows the prediction on the test set versus the real values.
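For reference, a sketch of how the test-set comparison can be produced with predictr(), under the same assumed 8/2 split used in the training sketch above:

```r
# Predict all 10 sequences with the trained model.
Yp <- predictr(model, X)

test <- 9:10                         # held-out sequences (assumed split)

# Compare the true (scaled) cosine with the prediction on the test sequences.
plot(as.vector(t(Y[test, ])), type = "l", col = "red",
     xlab = "time step", ylab = "scaled Y",
     main = "Test set: actual vs predicted")
lines(as.vector(t(Yp[test, ])), col = "blue")
legend("topright", legend = c("actual", "predicted"),
       col = c("red", "blue"), lty = 1)
```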

The code I used for this simple experiment is shown in the gist below. To get the plots for the doubled-frequency example, just set $f = 10$.

Thank you for reading this post, I hope you’ve found it interesting and useful. If you have any questions, please do leave a comment.
