[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers].

LSTM network in R: in this tutorial we are going to discuss Recurrent Neural Networks (RNNs). Recurrent Neural Networks are very useful for solving problems that involve sequences of numbers.

The major applications involving sequence data are text classification, time series prediction, frames in videos, DNA sequences, speech recognition problems, and so on.

LSTM networks are a special type of Recurrent Neural Network, and they are very popular and handy in practice.

## What is meant by LSTM?

LSTM stands for long short-term memory.

An LSTM network helps to overcome the vanishing-gradient problem and makes it possible to capture long-term dependencies in a sequence of words or integers.
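To make the gating idea concrete, here is a minimal base-R sketch of a single LSTM cell step. All sizes and weights here are made up for illustration; the point is that the forget, input, and output gates decide what the cell state drops, adds, and exposes at each time step, which is what lets the network carry information over long spans.

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

# One LSTM cell step: x is the current input, h_prev/c_prev the previous
# hidden and cell states, W a list of (hypothetical) gate weight matrices.
lstm_step <- function(x, h_prev, c_prev, W) {
  z <- c(x, h_prev)             # concatenate input and previous hidden state
  f <- sigmoid(W$f %*% z)       # forget gate: what to drop from the cell state
  i <- sigmoid(W$i %*% z)       # input gate: what new information to add
  o <- sigmoid(W$o %*% z)       # output gate: what to expose as hidden state
  g <- tanh(W$g %*% z)          # candidate cell state
  c_new <- f * c_prev + i * g   # updated cell state
  h_new <- o * tanh(c_new)      # new hidden state
  list(h = h_new, c = c_new)
}

set.seed(1)
units <- 2; input_dim <- 3
W <- lapply(setNames(vector("list", 4), c("f", "i", "o", "g")),
            function(.) matrix(rnorm(units * (input_dim + units)), nrow = units))
out <- lstm_step(rnorm(input_dim), rep(0, units), rep(0, units), W)
length(out$h)  # 2 hidden units, as configured above
```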

In this tutorial, we are using the Internet Movie Database (IMDB) dataset. It contains labelled movie-review sentiments: 25,000 positive reviews and 25,000 negative reviews.

```r
library(keras)
library(tensorflow)
use_condaenv("keras-tf", required = TRUE)
```

### Getting Data

```r
# keep only the 500 most frequent words
imdb <- dataset_imdb(num_words = 500)
```

These datasets are already preprocessed, so there is no need to clean them.


```r
c(c(train_x, train_y), c(test_x, test_y)) %<-% imdb
length(train_x); length(test_x)
```

train_x and test_x contain integer-encoded reviews, while train_y and test_y contain labels (0 and 1): 0 represents a negative review and 1 a positive review.

```r
table(train_y)
#> train_y
#>     0     1
#> 12500 12500
table(test_y)
#> test_y
#>     0     1
#> 12500 12500
```

This indicates that our dataset is balanced.

Each word in a movie review is represented by a unique integer, assigned according to the word's overall frequency in the dataset. A single review can be inspected with the command below.

```r
train_x[[1]]
#>   1  14  20  47 111 439   2  19  12  15 166  12 216 125  40   6 364 352   2   2  39 294  11  22 396  13  28   8 202  12   2  23  94
#>   2 151 111 211 469   4  20  13 258   2   2   2  12  16  38  78  33 211  15  12  16   2  63  93  12   6 253 106  10  10  48 335 267
#>  18   6 364   2   2  20  19   6   2   7   2 189   5   6   2   7   2   2  95   2   6   2   7   2   2  49 369 120   5  28  49 253  10
#>  10  13   2  19  85   2  15   4 481   9  55  78   2   9 375   8   2   8   2  76   7   4  58   5   4   2   9 243   7  43  50
```
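To see how such integer codes map back to words, here is a self-contained sketch with a tiny made-up word index. (In keras the real lookup uses `dataset_imdb_word_index()`, and low integers are reserved for padding and special tokens; the toy index below is purely illustrative.)

```r
# Hypothetical miniature word index: word -> integer code
word_index <- c(the = 1, movie = 2, was = 3, great = 4)

# Invert it: integer code -> word
rev_index <- setNames(names(word_index), word_index)

# Decode a vector of integer codes back into a sentence
decode <- function(codes) paste(rev_index[as.character(codes)], collapse = " ")

decode(c(1, 2, 3, 4))
#> "the movie was great"
```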

Before doing any further analysis we need to make sure all the movie reviews have equal length. Reviews in the current dataset vary in length; this is handled by padding.


```r
train_x <- pad_sequences(train_x, maxlen = 90)
#> num [1:25000, 1:90] 14 2 360 2 13 0 26 11 6 13 ...
test_x <- pad_sequences(test_x, maxlen = 90)
#> num [1:25000, 1:90] 0 2 30 8 10 20 2 2 2 50 ...
```

Now every review in train_x and test_x is exactly 90 integers long, because padding truncated the longer reviews.

You can examine the tenth review again with `train_x[10, ]`:

```  13 258   2   2   2  12  16  38  78  33 211  15  12  16   2  63  93  12   6 253 106  10  10  48 335 267  18   6 364   2   2  20  19   6
   2   7   2 189   5   6   2   7   2   2  95   2   6   2   7   2   2  49 369 120   5  28  49 253  10  10  13   2  19  85   2  15   4 481
   9  55  78   2   9 375   8   2   8   2  76   7   4  58   5   4   2   9 243   7  43  50```

If a review contains fewer integers, say 60, the remaining 30 positions are automatically filled with 0.
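The padding behaviour can be sketched in base R. With its defaults, `pad_sequences()` pre-pads short sequences with 0 and pre-truncates long ones (keeping the last `maxlen` codes); the toy function below mimics that for a single review:

```r
# Base-R sketch of pad_sequences() defaults: pre-padding, pre-truncating
pad_one <- function(seq, maxlen) {
  if (length(seq) >= maxlen) {
    tail(seq, maxlen)                     # keep only the last maxlen codes
  } else {
    c(rep(0, maxlen - length(seq)), seq)  # prepend zeros up to maxlen
  }
}

pad_one(c(5, 6, 7), 5)  # 0 0 5 6 7  (short review: zero-filled at the front)
pad_one(1:8, 5)         # 4 5 6 7 8  (long review: truncated from the front)
```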

### Model

Initiate the model with the keras function `keras_model_sequential()` and embed the recurrent neural network layers.

```r
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
```

For the activation we used the sigmoid function, whose output lies between 0 and 1 and is therefore easy to interpret as a probability.
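A quick self-contained illustration of why sigmoid is convenient here: it squashes any raw score into (0, 1), so the output reads directly as the probability of a positive review, and a 0.5 cutoff yields the class label. (The scores below are made-up values, not model outputs.)

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

scores <- c(-2, 0, 3)             # hypothetical raw scores from the dense layer
probs  <- sigmoid(scores)         # mapped into (0, 1)
labels <- as.integer(probs > 0.5) # 0 = negative, 1 = positive

round(probs, 3)  # 0.119 0.500 0.953
labels           # 0 0 1
```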


### Compile Model

```r
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
```

### Fit model

```r
history <- model %>% fit(
  train_x, train_y,
  epochs = 25,
  batch_size = 128,
  validation_split = 0.2
)
plot(history)
```

validation_split indicates that 20% of the dataset is used for validation.

In the plot, the top panel shows loss and the bottom one accuracy. From a certain epoch onwards the validation loss increases while the validation accuracy decreases; this is a sign of overfitting.
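One common remedy is to stop training once the validation loss stops improving; keras provides `callback_early_stopping()` for this, passed via the `callbacks` argument of `fit()`. The rule itself is simple, and can be sketched in base R on a hypothetical validation-loss curve:

```r
# Early-stopping rule: stop once val_loss has not improved for `patience` epochs
stop_epoch <- function(val_loss, patience = 3) {
  best <- Inf
  wait <- 0
  for (e in seq_along(val_loss)) {
    if (val_loss[e] < best) {
      best <- val_loss[e]  # new best: reset the patience counter
      wait <- 0
    } else {
      wait <- wait + 1
      if (wait >= patience) return(e)  # no improvement for `patience` epochs
    }
  }
  length(val_loss)  # never triggered: train to the end
}

# Made-up curve: improves for 3 epochs, then overfits
stop_epoch(c(0.6, 0.5, 0.45, 0.47, 0.49, 0.52, 0.6))  # stops at epoch 6
```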

## Model Prediction

```r
model %>% evaluate(train_x, train_y)
#>      loss       acc
#> 0.3644736 0.8765600
pred <- model %>%
  predict_classes(train_x)
table(Predicted = pred, Actual = imdb$train$y)
#>          Actual
#> Predicted     0     1
#>         0 11503  2089
#>         1   997 10411
model %>% evaluate(test_x, test_y)
#>     loss      acc
#> 1.032544 0.687720
pred1 <- model %>%
  predict_classes(test_x)
table(Predicted = pred1, Actual = imdb$test$y)
#>          Actual
#> Predicted    0    1
#>         0 9203 4510
#>         1 3297 7990
```
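The reported accuracies follow directly from the confusion matrices: correct predictions (the diagonal) divided by the 25,000 reviews in each split.

```r
# Diagonal of each confusion matrix over the total number of reviews
train_acc <- (11503 + 10411) / 25000
test_acc  <- (9203 + 7990) / 25000

round(train_acc, 4)  # 0.8766, matching evaluate() on the training set
round(test_acc, 4)   # 0.6877, matching evaluate() on the test set
```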

On the training dataset we got about 87% accuracy, but it falls to about 68% on the test dataset. So the model needs improvement for better prediction.


You can make some changes to the model:

```r
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE, activation = "relu") %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE, activation = "relu") %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
```

In the model above we used 3 recurrent layers instead of 1, set return_sequences to TRUE on the intermediate layers, and used the relu activation function. Another change concerns padding: instead of the fixed length of 90 we can find the average review length and use that for padding.

```r
# run this on the original (unpadded) list of reviews
z <- NULL
for (i in 1:25000) z[i] <- length(train_x[[i]])
summary(z)
#>  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  11.0   130.0   178.0   238.7   291.0  2494.0
```

The median is 178 and the mean is about 239, so we use a number in between, such as 200, for padding.

```r
train_x <- pad_sequences(train_x, maxlen = 200)
test_x <- pad_sequences(test_x, maxlen = 200)
```

Rerun the model and check the accuracy again.

```r
model %>% evaluate(train_x, train_y)
#>      loss       acc
#> 0.3733827 0.8421200
```

The training dataset accuracy is now 84%; earlier it was 87%.

```r
model %>% evaluate(test_x, test_y)
#>      loss       acc
#> 0.4351899 0.8114400
```

Test dataset accuracy improved significantly, from 68% to 81%.

Now you can try a simple LSTM model for better prediction.


## LSTM Network in R

```r
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
```

When you are using an LSTM model, try the "adam" optimizer for better prediction.

### Compile

```r
model %>% compile(
  optimizer = "adam",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
```

### Bidirectional LSTM Model

```r
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  bidirectional(layer_lstm(units = 32)) %>%
  layer_dense(units = 1, activation = "sigmoid")
```

## Conclusion

The model accuracy improved with each of the changes we experimented with. Instead of a simple LSTM model, you can try a bidirectional model for better prediction.

