Predicting images using Convolutional neural network

[This article was first published on Modeling with R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

In this article we will make use of the convolutional neural network, the most widely deep learning method used for image classification, object detection,..etc1. For more detail about how it works please click here.

We are going be learning how to build and train convolutional neural network model using small sample of images collected from google search. The data includes 30 images, each of which is either one of three types of animals: cat, dog, or lion, and each one has equally number of images, that is 10.

Data preparation

First, we call the packages needed along this paper and load the data into two different objects, one called train, will contain 7 instances of each animal type used for training the model, and another one, called test, will contain the remaining instances for the evaluation of the model performance.

ssh <- suppressPackageStartupMessages
ssh(library(EBImage))
ssh(library(keras))
ssh(library(foreach))

mytrain <- c(paste0("../images/cat",1:7,".jpg"),paste0("../images/dog",1:7,".jpg"),
        paste0("../images/lion",1:7,".jpg"))

mytest <- c(paste0("../images/cat",8:10,".jpg"),paste0("../images/dog",8:10,".jpg"),
        paste0("../images/lion",8:10,".jpg"))

train <- lapply(mytrain, readImage)
test <- lapply(mytest, readImage)

Now let us first figure out what information each image contains .

train[[1]]
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 275 183 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.2039216 0.2039216 0.2039216 0.2078431 0.2078431 0.2000000
## [2,] 0.2039216 0.2039216 0.2078431 0.2078431 0.2078431 0.2039216
## [3,] 0.2078431 0.2078431 0.2078431 0.2117647 0.2117647 0.2078431
## [4,] 0.2117647 0.2117647 0.2156863 0.2156863 0.2196078 0.2117647
## [5,] 0.2156863 0.2156863 0.2196078 0.2196078 0.2235294 0.2156863

As we see this image is color image with 275 pxl hight, 183 pxl width and 3 channels (RGB) since it is a color image.

we can visualize an image as follows:

plot(test[[4]])

If instead we want to visualize all the image as one block we can make use of foreach package to apply a for loop as follows.

par(mfrow=c(7,3))
foreach(i=1:21) %do% {plot(train[[i]])}

par(mfrow=c(1,1))

After having taken a brief glance at our data, we found that the image sizes are different from each other which is not what our image classification model expects. That is why, the following script will resize all the images to have the same size 150x150x3.

foreach(i=1:21) %do% {train[[i]] <- resize(train[[i]],150,150)}
foreach(i=1:9) %do% {test[[i]] <- resize(test[[i]],150,150)}

To check the result we use the following:

str(test)
## List of 9
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0.761 0.78 0.773 0.755 0.768 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0.462 0.48 0.512 0.544 0.54 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0.986 0.992 0.951 0.945 0.929 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0.81 0.751 0.787 0.825 0.508 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0.361 0.502 0.49 0.627 0.524 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0.375 0.365 0.375 0.397 0.393 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0.651 0.57 0.614 0.636 0.63 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0.268 0.201 0.198 0.213 0.182 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:150, 1:150, 1:3] 0 0 0 0 0 ...
##   .. ..@ colormode: int 2

As we see all the images now have the same size as an array of 3 dimensions. The next step now is to combine all the images as one block.

trainall <- combine(train)
testall <- combine(test) 

We can display the output block usine the following:

display(tile(trainall,7))

Now the images are nicely combined in one block with four dimensions: number of instances (images), height, width, and number of channels, and this is the input that will be used in our model. However, to correctly read the input our model expects that the first dimension is the number of instances, the second is height , the third is width, and the fourth is the number of channels. Let us check whether the input has the correct order or not.

str(trainall)
## Formal class 'Image' [package "EBImage"] with 2 slots
##   ..@ .Data    : num [1:150, 1:150, 1:3, 1:21] 0.204 0.209 0.216 0.223 0.233 ...
##   ..@ colormode: int 2

This order is not correct since the number of instances is in the last position, so we reorder the positions as follows:

trainall <- aperm(trainall, c(4,1,2,3))
testall <- aperm(testall, c(4,1,2,3))

The Last thing that remains to be done, before customizing the architecture of our model, is to create a variable to hold the image’s labels, then convert it to a dummy variable.

trainlabels <- rep(0:2, each=7)
testlabels <- rep(0:2, each=3)
trainy <- to_categorical(trainlabels)
testy <- to_categorical(testlabels)

Training the model:

The architecture of our model will contain the following layers:

  1. Convolution layer that makes use of 32 filters with size 3x3 (since the input has 150x150x3 consequently the third dimension of the filter size will be 3 that is 3x3x3), and with Relu as activation function.
  2. maxPooling layer of 3x3 with strides=2.
  3. Convolution layer that makes use of 64 filters with size 5x5 , and with Relu function.
  4. maxPooling layer of 2x2 with strides=2.
  5. Convolution layer that makes use of 128 filters with size 3x3 , and with Relu function.
  6. maxPooling layer of 2x2 with strides=2.
  7. Flatten layer to collapse all the output elements into one giant vector to be able to connect to the traditional neural network with fully connected layers.
  8. dense layers composed of 256 nodes and with leaky_relu function. The slope for the negative part will be 0.1.
  9. Dropout layer with rate of 40%, this acts as regularization method by randomly ignoring 40% of nodes in each epoch (iteration).
  10. the last output layer with 3 nodes since we have 3 classes and with softmax function.

In keras package the above steps will be coded as follows:

model <- keras_model_sequential()

model %>% 
  layer_conv_2d(filters = 32,
                        kernel_size = c(3,3),
                        activation = "relu",
                        input_shape = c(150,150,3))%>%
  layer_max_pooling_2d(pool_size = c(3,3), strides = 2)%>%
  layer_conv_2d(filters = 64,
               kernel_size = c(5,5),
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2), strides = 2)%>%
  layer_conv_2d(filters = 128,
                kernel_size = c(3,3),
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2,2), strides = 2)%>%
  layer_flatten()%>%
  layer_dense(units=256)%>% layer_activation_leaky_relu(alpha = 0.1)%>%
  layer_dropout(rate=0.4)%>%
  layer_dense(units=3, activation = "softmax")

We can figure out this architecture and how many parameters it has by calling the summary function.

summary(model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d (Conv2D)                     (None, 148, 148, 32)            896         
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D)        (None, 73, 73, 32)              0           
## ________________________________________________________________________________
## conv2d_1 (Conv2D)                   (None, 69, 69, 64)              51264       
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)      (None, 34, 34, 64)              0           
## ________________________________________________________________________________
## conv2d_2 (Conv2D)                   (None, 32, 32, 128)             73856       
## ________________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D)      (None, 16, 16, 128)             0           
## ________________________________________________________________________________
## flatten (Flatten)                   (None, 32768)                   0           
## ________________________________________________________________________________
## dense (Dense)                       (None, 256)                     8388864     
## ________________________________________________________________________________
## leaky_re_lu (LeakyReLU)             (None, 256)                     0           
## ________________________________________________________________________________
## dropout (Dropout)                   (None, 256)                     0           
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 3)                       771         
## ================================================================================
## Total params: 8,515,651
## Trainable params: 8,515,651
## Non-trainable params: 0
## ________________________________________________________________________________

As we see we have huge number of parameters 8 515 651. Since the data has only 21 instances, the computation process in my laptop takes only few seconds. However, with large data set this model may take more time.

The last step before running the model is to specify the loss function, the optimizer and the metric.

  • For multiclassification problem the most widely used one is categorical cross entropy.

  • Besides the popular gradient descent optimizer (with its versions , stochastic gradient descent and mini batch gradient descent), there exist other ones such as adam , adadelta, mrsprop (the first one will be used for our case). In practice sometimes we finetune the hyperparameters by changing these optimizers.

  • For classification problems we have many metrics, the famous ones are: accuracy (used for our case), roc, area under roc, precision.

model %>% compile(loss= "categorical_crossentropy",
                  optimizer="adam",
                  metrics="accuracy")

At this stage everything is ready to train our model by calling the function fit. the epoch value is the number of iterations or the gradient descent steps, and the validation_split is the holdout samples used for assessment, here four images. I have run this model before and in oreder to avoide running it again i have commented the script by #, if you want to run it just uncomment the script.

#history <- model %>%
  #fit(trainall, trainy, epoch=50, validation_split=0.2)

unlike machine learning model in which we can set a seed to get the result reproducible, here each time we rerun the model we get different result. In practice, we intentionally rerun the model many times to improve the model performance, and ones we get the best one we save it as follows:

#save_model_hdf5(model, "modelcnn.h5")

And we can load it again as follows:

model <- load_model_hdf5("modelcnn.h5")

The history object has all the necessary information such as the metric values for each epoch , so we can extract this informatiton to create a plot as follows.

#train_loss <- history$metrics$loss
#valid_loss <- history$metrics$val_loss
#train_acc <- history$metrics$accuracy
#valid_acc <- history$metrics$val_accuracy
#epoch <- 1:50
#df <- tibble::tibble(epoch,train_loss,valid_loss,train_acc,valid_acc)
library(ggplot2)
#p1 <- ggplot(df,aes(x=epoch, train_loss))+
 # geom_point(size=1, color="blue")+
#  geom_point(aes(x=epoch, valid_loss), size=1, color="red")+
 # ylab("Loss")
#ggsave("plot_loss.jpg", p1, device = "jpeg", width = 20, height = 15, units = "cm")

As you notice all the above codes have not been executed to avoide the issue discussed above and to make things simple. here we load the saved plot.

par(mar=c(0,0,0,0))
plot(as.raster(readImage("plot_loss.jpg")))

This plot shows the loss values for both the training set (in blue) and the validation set (in red), we see that the training loss consistently increases whereas the validation loss largely oscillating reflecting the less capability of the model to well predict the new unseen examples. The same conclusion can be induced from the following plot for the accuracy metric.

#p2 <- ggplot(df,aes(x=epoch, train_acc))+
  #geom_point(size=1, color="blue")+
  #geom_point(aes(x=epoch, valid_acc), size=1, color="red")+
 # ylab("accuracy")
#ggsave("plot_acc.jpg", p2, device = "jpeg", width = 20, height = 15, units = "cm")

Here also we do the same thing

par(mar=c(0,0,0,0))
plot(as.raster(readImage("plot_acc.jpg")))

Note: we coud have used directly the plot function , plot(history), but doing so we will get different plot each time we knit the document.

Model Evaluation

We can evaluate the model performance using the training set as follows:

train_evaluate<- evaluate(model, trainall, trainy)

With this first architecture we get a high accuracy rate 95.24% and the loss is 0.0832. However, you should be cautious when this rate is very high since it is computed from the training data which in many cases reflects the overfitting problem2. The best evaluation thus is that based on the testing set.

test_evaluate<- evaluate(model, testall, testy)

Using the testing set that is not seen by the model, the accuracy rate is about 55.56%. In fact this is exactly what we warned about it, indeed we have an overfitting problem where the model try to memorize every noisy pattern which will constrain the model to poorly generalize to unseen data.

Prediction

We can get the predictions of the testing set as follows:

pred <- predict_classes(model,testall)
pred
## [1] 0 0 2 0 2 0 2 2 2

the following plot shows which images from the testing set are correctly classified and which are not:

pred[pred==0] <- "cat"
pred[pred==1] <- "dog"
pred[pred==2] <- "lion"


par(mfrow=c(3,3))


foreach(i=1:9) %do% {display(test[[i]], method="raster");
  text(x = 20, y = 20, label = pred[i], 
       adj = c(0,1), col = "black", cex = 4)
}

par(mfrow=c(1,1))

Using this model to predict the test examples, all the dogs are misclassified whereas the lions are perfectly classified.

We can also display the training examples as follows:

pred1 <- predict_classes(model,trainall)

pred1[pred1==0] <- "cat"
pred1[pred1==1] <- "dog"
pred1[pred1==2] <- "lion"


par(mfrow=c(7,3))


foreach(i=1:21) %do% {display(train[[i]], method="raster");
  text(x = 20, y = 20, label = pred1[i], 
       adj = c(0,1), col = "black", cex = 2)
}

par(mfrow=c(1,1))

Conclusion

As we see this model perfectly identified lions but failed to identify any of the dogs in the testing set which is not the case for the training data where the model has high accuracy, and as I mentioned earlier this is the consequences of the overfitting problem. However, There are bunch of techniques that can be used in such situation such as regularization methods (L2,L1), pooling, dropout layers..ect. All these techniques will be addressed soon in further articles. Besides overfitting, we can also improve the model by playing around with hyperparameters such as changing , the number of the layers, number of the nodes in each layer, number of epochs…ect.

Note: Be aware that this model can not be reliable since it has used very small data. However, we may get a higher performance for this model if we implement very large dataset.


  1. Francois chollet, Deep learning with R, Meap edition, 2017, P112 ↩︎

  2. An introduction to statistical learning, Garth et al, spring, New York, page 33, ISBN:978-1-4614-7173-0↩︎

To leave a comment for the author, please follow the link and comment on their blog: Modeling with R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)