Zalando’s image classification using H2O with R


Fashion-MNIST

About three weeks ago Zalando released Fashion-MNIST, a dataset of article images designed as a drop-in replacement for the classical MNIST dataset. In this article we will try to build a strong classifier for it using H2O and R.

Each example is a 28×28 grayscale image, associated with a label from 10 classes (encoded 0–9 in the dataset):

  0. T-shirt/top
  1. Trouser
  2. Pullover
  3. Dress
  4. Coat
  5. Sandal
  6. Shirt
  7. Sneaker
  8. Bag
  9. Ankle boot

You can download it from Kaggle: https://www.kaggle.com/zalando-research/fashionmnist

The first column contains the image label; each of the remaining 784 columns holds the darkness value of a single pixel.
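
As a quick sanity check, the CSV can be inspected in plain R before handing it to H2O. This is a minimal sketch; it assumes the Kaggle files were unpacked into a data/ folder, the same path used for the H2O import later on.

fmnist_raw <- read.csv("data/fashion-mnist_train.csv")
dim(fmnist_raw)          # 60000 rows, 785 columns: 1 label + 28*28 pixel values
table(fmnist_raw$label)  # 6000 images per class, labels encoded 0-9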

Quick reminder: what is H2O?

H2O is an open-source, fast, scalable machine learning platform written in Java. It exposes all of its capabilities to Python, Scala and, most importantly for us, R via a REST API.

Overview of available algorithms:

  1. Supervised:
    • Deep Learning (Neural Networks)
    • Distributed Random Forest (DRF)
    • Generalized Linear Model (GLM)
    • Gradient Boosting Machine (GBM)
    • Naive Bayes Classifier
    • Stacked Ensembles
    • XGBoost
  2. Unsupervised:
    • Generalized Low Rank Models (GLRM)
    • K-Means Clustering
    • Principal Component Analysis (PCA)

Installation is easy:

install.packages("h2o")
library(h2o)

Building a neural network for image classification

Let’s start by running an H2O cluster:

h2o.init(ip = "localhost",
         port = 54321,
         nthreads = -1,
         min_mem_size = "20g")
H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpQEf3RX/h2o_maju116_started_from_r.out
    /tmp/RtmpQEf3RX/h2o_maju116_started_from_r.err

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         1 seconds 906 milliseconds 
    H2O cluster version:        3.13.0.3973 
    H2O cluster version age:    1 month and 5 days  
    H2O cluster name:           H2O_started_from_R_maju116_cuf927 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   19.17 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
    R Version:                  R version 3.4.1 (2017-06-30) 

Next we will import the data into H2O using the h2o.importFile() function, which lets us specify column types and column names if needed. (If you want to send data into H2O directly from R, you can use the as.h2o() function instead; see the short sketch after the import below.)

fmnist_train <- h2o.importFile(path = "data/fashion-mnist_train.csv", 
                               destination_frame = "fmnist_train",
                               col.types=c("factor", rep("int", 784)))

fmnist_test <- h2o.importFile(path = "data/fashion-mnist_test.csv",
                              destination_frame = "fmnist_test",
                              col.types=c("factor", rep("int", 784)))
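
Alternatively, as mentioned above, a data.frame that already lives in R can be pushed to the cluster with as.h2o(). A minimal sketch, reusing the fmnist_raw data.frame from the sanity check earlier (we stick with the imported frames for the rest of this post):

fmnist_train_alt <- as.h2o(fmnist_raw, destination_frame = "fmnist_train_alt")
fmnist_train_alt$label <- as.factor(fmnist_train_alt$label)  # the label must be a factor for classification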

If everything went fine, we can check that our datasets are indeed in H2O:

h2o.ls()
           key
1  fmnist_test
2 fmnist_train

Before we begin modeling, let’s take a quick look at the data:

# ggplot2, purrr, magrittr and gridExtra are needed for the plots below
library(ggplot2)
library(purrr)
library(magrittr)
library(gridExtra)

xy_axis <- data.frame(x = expand.grid(1:28, 28:1)[, 1],
                      y = expand.grid(1:28, 28:1)[, 2])
plot_theme <- list(
  raster = geom_raster(hjust = 0, vjust = 0),
  gradient_fill = scale_fill_gradient(low = "white", high = "black", guide = FALSE),
  theme = theme(axis.line = element_blank(),
                axis.text = element_blank(),
                axis.ticks = element_blank(),
                axis.title = element_blank(),
                panel.background = element_blank(),
                panel.border = element_blank(),
                panel.grid.major = element_blank(),
                panel.grid.minor = element_blank(),
                plot.background = element_blank())
)

sample_plots <- sample(1:nrow(fmnist_train),100) %>% map(~ {
  plot_data <- cbind(xy_axis, fill = as.data.frame(t(fmnist_train[.x, -1]))[,1]) 
  ggplot(plot_data, aes(x, y, fill = fill)) + plot_theme
})

do.call("grid.arrange", c(sample_plots, ncol = 10, nrow = 10))

100 random items from the Fashion-MNIST dataset

Now we will build a simple neural network, with one hidden layer of ten neurons:

fmnist_nn_1 <- h2o.deeplearning(x = 2:785,
                                y = "label", 
                                training_frame = fmnist_train,
                                distribution = "multinomial",
                                model_id = "fmnist_nn_1",
                                l2 = 0.4,
                                ignore_const_cols = FALSE,
                                hidden = 10, 
                                export_weights_and_biases = TRUE)

If we set the export_weights_and_biases parameter to TRUE, the network’s weights and biases are saved and can be retrieved with the h2o.weights() and h2o.biases() functions. This lets us visualize the neurons of the hidden layer (note that we set ignore_const_cols to FALSE so that a weight is kept for every pixel).

weights_nn_1 <- as.data.frame(h2o.weights(fmnist_nn_1, 1))
biases_nn_1 <- as.vector(h2o.biases(fmnist_nn_1, 1))

neurons_plots <- 1:10 %>% map(~ {
  plot_data <- cbind(xy_axis, fill = t(weights_nn_1[.x,]) + biases_nn_1[.x])
  colnames(plot_data)[3] <- "fill"
  ggplot(plot_data, aes(x, y, fill = fill)) + plot_theme
})

do.call("grid.arrange", c(neurons_plots, ncol = 3, nrow = 4))

The ten hidden-layer neurons visualized as 28×28 weight maps

We can definitely see some resemblance to shirts and sneakers. Let’s test our model:

h2o.confusionMatrix(fmnist_nn_1, fmnist_test)
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
          0   1    2    3    4    5   6    7   8   9  Error               Rate
0       801  12   14   87    2   36  25    1  22   0 0.1990     = 199 / 1 000
1         6 938   23   25    1    3   4    0   0   0 0.0620     =  62 / 1 000
2        24   4  695    7  188   18  49    0  15   0 0.3050     = 305 / 1 000
3        43  23   12  865   21   13  22    0   1   0 0.1350     = 135 / 1 000
4         1   6  138   44  770   14  25    0   2   0 0.2300     = 230 / 1 000
5         0   0    1    0    0  865   0   90   7  37 0.1350     = 135 / 1 000
6       273   6  224   53  262   46 107    0  28   1 0.8930     = 893 / 1 000
7         0   0    0    0    0  107   0  838   0  55 0.1620     = 162 / 1 000
8         4   1   13   22    5   36  10    8 897   4 0.1030     = 103 / 1 000
9         0   0    0    0    0   40   0  104   0 856 0.1440     = 144 / 1 000
Totals 1152 990 1120 1103 1249 1178 242 1041 972 953 0.2368 = 2 368 / 10 000
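
The confusion matrix reports an overall test error of 0.2368. If you want the other multinomial metrics H2O computes, they are available from h2o.performance() on the test frame; a minimal sketch (perf_nn_1 is just an illustrative name):

perf_nn_1 <- h2o.performance(fmnist_nn_1, newdata = fmnist_test)
h2o.logloss(perf_nn_1)          # cross-entropy loss on the test set
h2o.hit_ratio_table(perf_nn_1)  # top-k hit ratios; k = 1 is plain accuracy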

An accuracy of 0.7632 isn’t a great result, but we haven’t used H2O’s full capabilities yet. We should do something more advanced!

The h2o.deeplearning() function has over 70 parameters controlling the structure and optimization of the model. Tuning them should give us much better results.

fmnist_nn_final <- h2o.deeplearning(x = 2:785,
                                    y = "label",
                                    training_frame = fmnist_train,
                                    distribution = "multinomial",
                                    model_id = "fmnist_nn_final",
                                    activation = "RectifierWithDropout",
                                    hidden=c(1000, 1000, 2000),
                                    epochs = 180,
                                    adaptive_rate = FALSE,
                                    rate=0.01,
                                    rate_annealing = 1.0e-6,
                                    rate_decay = 1.0,
                                    momentum_start = 0.4,
                                    momentum_ramp = 384000,
                                    momentum_stable = 0.98, 
                                    input_dropout_ratio = 0.22,
                                    l1 = 1.0e-5,
                                    max_w2 = 15.0, 
                                    initial_weight_distribution = "Normal",
                                    initial_weight_scale = 0.01,
                                    nesterov_accelerated_gradient = TRUE,
                                    loss = "CrossEntropy",
                                    fast_mode = TRUE,
                                    diagnostics = TRUE,
                                    ignore_const_cols = TRUE,
                                    force_load_balance = TRUE,
                                    seed = 3.656455e+18)

h2o.confusionMatrix(fmnist_nn_final, fmnist_test)
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
          0    1    2    3    4   5   6    7    8   9  Error            Rate
0       898    0   14   15    1   1  66    0    5   0 0.1020  = 102 / 1 000
1         2  990    2    6    0   0   0    0    0   0 0.0100   = 10 / 1 000
2        12    1  875   13   60   1  35    0    3   0 0.1250  = 125 / 1 000
3        16   11    8  925   23   1  14    0    2   0 0.0750   = 75 / 1 000
4         1    0   61   21  885   0  30    0    2   0 0.1150  = 115 / 1 000
5         0    0    1    0    0 964   0   24    1  10 0.0360   = 36 / 1 000
6       131    2   66   22   50   0 722    0    7   0 0.2780  = 278 / 1 000
7         0    0    0    0    0  10   0  963    0  27 0.0370   = 37 / 1 000
8         4    1    4    1    1   2   3    2  981   1 0.0190   = 19 / 1 000
9         0    0    0    0    0   6   0   37    0 957 0.0430   = 43 / 1 000
Totals 1064 1005 1031 1003 1020 985 870 1026 1001 995 0.0840 = 840 / 10 000

An accuracy of 0.916 is a much better result, but there is still a lot we can do to improve the model. In the future we could use a grid or random search to find the best hyperparameters, or use ensemble methods to get even better results.
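
As a taste of that hyperparameter search, here is a minimal random-search sketch with h2o.grid(); the grid id and the hyperparameter ranges are illustrative assumptions, not tuned values:

fmnist_grid <- h2o.grid("deeplearning",
                        grid_id = "fmnist_grid",
                        x = 2:785,
                        y = "label",
                        training_frame = fmnist_train,
                        hyper_params = list(hidden = list(c(500, 500), c(1000, 1000, 2000)),
                                            input_dropout_ratio = c(0.1, 0.2),
                                            l1 = c(1e-5, 1e-4)),
                        search_criteria = list(strategy = "RandomDiscrete",
                                               max_models = 10,
                                               seed = 1234))
h2o.getGrid("fmnist_grid", sort_by = "logloss", decreasing = FALSE)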
