RObservations #35: Predicting Rubik's Cube Rotations with CNNs


Disclaimer: While working on this project on my local machine, I noticed that the code was making my computer heat up. To avoid the risk of overheating, I opted to use a Kaggle notebook. As a bonus, I got to use some GPU computing, which made training this model much faster than it would have been on my machine! Feel free to run the code on your machine or fork the notebook!

Introduction

In my previous blog I explored sentiment prediction using LSTM networks and their implementation with keras and R. In this blog I am going to share how to predict the rotation of Rubik's cubes with convolutional neural networks (CNNs). For this challenge, I had the opportunity to do some basic image preprocessing and construct a CNN that predicts a continuous output, as opposed to the categorical output that is more common among CNN application examples. Since the goal is to predict a continuous value, the aim is to make the margin of error between the predicted and true values as small as possible.

The Data

The data consists of two folders. The training folder contains a .csv file that lists the file name of each Rubik's cube image among the training images along with its rotation. The images are 5,000 512×512-pixel color (RGB) images of rotated Rubik's cubes. The other folder contains testing images whose angles of rotation are not given. Since the test data has no labels, it is not going to be very helpful for our purposes, so for this blog we are going to work with just the training data and its labels.
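As a quick sanity check on the folder layout, you can count the image files directly (a minimal sketch; the paths are the Kaggle dataset paths used throughout the code below):

# Count the training images to confirm there are 5,000 of them
image_dir <- "../input/rubix-cube/training/training/images"
length(list.files(image_dir, pattern = "\\.jpg$"))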

The labels for the training data look like this:

training_labels <- readr::read_csv("../input/rubix-cube/training/training/labels.csv") 

head(training_labels)
filename     xRot
000000.jpg   336.8389
000001.jpg   148.4844
000002.jpg   244.8217
000003.jpg   222.7006
000004.jpg   172.3581
000005.jpg   205.6921

Thanks to the imager package, it is possible to convert the images into matrix form, which can then be converted into an array that keras likes. One question I encountered was how to convert a list of 3D arrays into a single 4D array, which I was able to figure out thanks to this Stack Overflow question. While the solution is not elegant, it works.

Due to the size of the dataset, the data needed to be preprocessed and fed to the model in groups, and the model needed to be trained in even smaller batches. An example of processing the first 3 images in the dataset would be:

library(tidyverse)
library(imager)

# Load the first 3 images; x[,,,] drops the singleton depth dimension,
# leaving one 512 x 512 x 3 array per image
images <- lapply(
  training_labels[["filename"]][1:3],
  function(x) paste0("../input/rubix-cube/training/training/images/", x) %>%
    load.image() %>%
    as.cimg()) %>%
  lapply(function(x) x[,,,])

# Stack the list of 3D arrays into a single 4D array
# Source: https://stackoverflow.com/questions/62060430/image-list-of-3d-arrays-to-one-4d-array
images_array <- array(NA, dim = c(length(images), 512, 512, 3))
for (j in 1:length(images)) {
  images_array[j,,,] <- images[[j]]
}

The images can also be plotted, labelled with their rotations, using purrr:

# Plot the three images side by side, titled with their rotation angles
par(mfrow = c(1, 3))
images_array[1:3,,,] %>%
  purrr::array_tree(1) %>%
  purrr::set_names(training_labels[["xRot"]][1:3]) %>%
  purrr::map(as.raster) %>%
  purrr::iwalk(~{plot(.x); title(.y)})

With this, the images are preprocessed and ready to be used to train our model.
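To double-check that the result has the 4D shape keras expects, and that the pixel values need no further scaling (imager loads images with values already in [0, 1]), a quick inspection looks like this:

# Confirm the (image, 512, 512, 3) shape and the [0, 1] pixel range
dim(images_array)
range(images_array)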

The Model

As far as modelling is concerned, I created a convolutional neural network where the first layer's input shape matches the dimensions of the images. The subsequent layers pretty much follow the code used in the CNN example on RStudio's website. For the loss function I opted for mean squared error, and I track it and mean absolute error as metrics.

library(keras)

model <- keras_model_sequential() %>% 
  # The first layer's input shape matches the 512 x 512 RGB images
  layer_conv_2d(filters = 512, 
                kernel_size = c(3,3), 
                activation = "relu", 
                input_shape = c(512,512,3)) %>% 
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  layer_conv_2d(filters = 256, kernel_size = c(3,3), activation = "relu") %>% 
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  layer_conv_2d(filters = 128, kernel_size = c(3,3), activation = "relu") %>% 
  layer_flatten() %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  # A single unit outputs the continuous rotation value
  layer_dense(units = 1, activation = "relu")

# Compile the model 

model %>% compile(
  optimizer = "adam",
  loss = "mean_squared_error",
  metrics = c("mean_squared_error",
              "mean_absolute_error")
)
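Before committing to training, it is worth printing the architecture; summary() reports each layer's output shape and parameter count, which makes it easy to see where the model's size comes from:

# Inspect layer output shapes and parameter counts
summary(model)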

Due to the sheer size of the data, it cannot be preprocessed and trained on in a single step. In lieu of this, the data is processed and the model trained in groups of 100 images, with a batch size of 2.

set.seed(1234)

# Process and train on the data in 50 chunks of 100 images each
history <- list()
for (i in 0:49){
  
  start_index <- i*100+1
  end_index <- (i+1)*100
  images<- lapply(
    training_labels[["filename"]][start_index:end_index],
    function(x) paste0("../input/rubix-cube/training/training/images/",x) %>% 
      load.image() %>% 
      as.cimg()) %>% 
    lapply(function(x) x[,,,])
  
  # Source: https://stackoverflow.com/questions/62060430/image-list-of-3d-arrays-to-one-4d-array
  images_array<-array(NA, dim = c(length(images), 512, 512, 3))
  for(j in 1:length(images)){
    images_array[j,,,]<-images[[j]]
  }
           
  # Split data into train-test groups
  labels <- training_labels[["xRot"]][start_index:end_index]
  
  smp_size <- floor(0.75 * length(images))
  train_ind <- sample(seq_len(length(images)), size = smp_size)
  train_x <- images_array[train_ind,,,]
  test_x <- images_array[-train_ind,,,]
  train_y <- labels[train_ind]
  test_y<- labels[-train_ind]
  
  # Train the model on this chunk; since validation_data is supplied,
  # keras uses it for validation (it would override validation_split)
  history[[i+1]] <- model %>% fit(x = train_x,
                                  y = train_y,
                                  epochs = 10,
                                  batch_size = 2,
                                  verbose = getOption("keras.fit_verbose", default = 1),
                                  validation_data = list(test_x, test_y))
  # Free up unused RAM
  gc()
}

history[[50]]

Final epoch (plot to see history):
                   loss: 2.339
     mean_squared_error: 2.339
    mean_absolute_error: 1.303
               val_loss: 6.561
 val_mean_squared_error: 6.561
val_mean_absolute_error: 1.865 

plot(history[[50]])

From the model history's final iteration, the validation MSE is about 6.56, with a validation MAE of about 1.87, which isn't bad. But if we want to make something more production-worthy, a better model is definitely required.
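To get a feel for the errors in the original units, you can compare predictions against the true angles; a minimal sketch using test_x and test_y as left over from the final loop iteration:

# Predict rotations for the last chunk's held-out images and
# compare them with the true angles
preds <- model %>% predict(test_x)

head(data.frame(
  actual    = test_y,
  predicted = as.vector(preds),
  abs_error = abs(test_y - as.vector(preds))
))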

If you know how to make this model better, or know of a better approach, please let me know! I would love to learn how to get better at making machine learning models!

Conclusion

There we have it! It was really interesting getting to preprocess images, deal with the quirks of limited processing power, and still manage to train the model. I will definitely keep this blog handy for my next image-based deep learning project.

Thank you for reading!

Want to see more of my content?

Be sure to subscribe and never miss an update!
