(This article was first published on the **Appsilon Data Science Blog**, and kindly contributed to R-bloggers)

## Fashion-MNIST

About three weeks ago the Fashion-MNIST dataset of Zalando’s article images, a great replacement for the classical MNIST dataset, was released. In this article we will try to build a strong classifier for it using H2O and R.

Each example is a 28×28 grayscale image, associated with a label from 10 classes:

- T-shirt/top
- Trouser
- Pullover
- Dress
- Coat
- Sandal
- Shirt
- Sneaker
- Bag
- Ankle boot

You can download it here: https://www.kaggle.com/zalando-research/fashionmnist

The first column is the image label and the remaining 784 columns contain the darkness of each pixel.

## Quick reminder: what is H2O?

H2O is an open-source, fast, scalable machine learning platform written in Java. It exposes all of its capabilities to Python, Scala and, most importantly, R via a REST API.

Overview of available algorithms:

- Supervised:
  - Deep Learning (Neural Networks)
  - Distributed Random Forest (DRF)
  - Generalized Linear Model (GLM)
  - Gradient Boosting Machine (GBM)
  - Naive Bayes Classifier
  - Stacked Ensembles
  - XGBoost
- Unsupervised:
  - Generalized Low Rank Models (GLRM)
  - K-Means Clustering
  - Principal Component Analysis (PCA)

Installation is easy:
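A minimal sketch, installing the package from CRAN:

```r
# Install the h2o package from CRAN; a specific release can also be
# installed from H2O's own repository (see the H2O download page)
install.packages("h2o")
```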

## Building a neural network for image classification

Let’s start by running an H2O cluster:
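For example, starting a local cluster with all available cores (the cluster settings here are illustrative defaults, not necessarily the author's original ones):

```r
library(h2o)

# Start (or connect to) a local H2O cluster, using all available cores
h2o.init(nthreads = -1)
```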

Next we will import the data into H2O using the `h2o.importFile()` function, in which we can specify column types and column names if needed. If you want to send data into H2O directly from R, you can use the `as.h2o()` function instead.
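A sketch of the import, assuming the Kaggle CSV files have been downloaded to the working directory (the file paths and the `label` column name are assumptions based on the Kaggle release):

```r
library(h2o)

# Import the train and test CSVs into the H2O cluster
train <- h2o.importFile("fashion-mnist_train.csv")
test  <- h2o.importFile("fashion-mnist_test.csv")

# The label column must be a factor for H2O to treat this as classification
train$label <- as.factor(train$label)
test$label  <- as.factor(test$label)
```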

If everything went fine, we can check if our datasets are in H2O:
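For instance, by listing the frames currently held by the cluster:

```r
# List all objects (frames, models) currently stored in the H2O cluster
h2o.ls()
```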

Before we begin modeling, let’s take a quick look at the data:
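One quick sanity check is to reshape a row of pixel values back into a 28×28 image and plot it (a sketch; it assumes the frames created above, with the label in column 1 and pixels in columns 2–785):

```r
# Pull the first example's 784 pixel values back into R
pixels <- as.numeric(as.matrix(train[1, 2:785]))

# Reshape to 28x28 and plot as a grayscale image
img <- matrix(pixels, nrow = 28, ncol = 28, byrow = TRUE)
image(t(apply(img, 2, rev)), col = gray.colors(255), axes = FALSE)
```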

Now we will build a simple neural network, with one hidden layer of ten neurons:
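A minimal version of such a model (the object name `model_simple` is used here for illustration; `export_weights_and_biases` and `ignore_const_cols` are set as discussed below):

```r
model_simple <- h2o.deeplearning(
  x = 2:785,                        # the 784 pixel columns
  y = "label",                      # the label column
  training_frame = train,
  hidden = c(10),                   # one hidden layer of ten neurons
  export_weights_and_biases = TRUE, # keep weights/biases for inspection
  ignore_const_cols = FALSE         # keep constant pixels, so every pixel has a weight
)
```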

If we set the `export_weights_and_biases` parameter to `TRUE`, the network’s weights and biases will be saved, and we can retrieve them with the `h2o.weights()` and `h2o.biases()` functions. Thanks to this we can try to visualize the neurons from the hidden layer (note that we set `ignore_const_cols` to `FALSE` to get weights for every pixel).
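A sketch of such a visualization, assuming the model object is called `model_simple` (a name used here for illustration):

```r
# Weights between the input layer and the first hidden layer:
# one row per hidden neuron, 784 columns (one per pixel)
w <- as.matrix(h2o.weights(model_simple, matrix_id = 1))

# Plot each of the ten hidden neurons as a 28x28 image
par(mfrow = c(2, 5), mar = c(1, 1, 1, 1))
for (i in 1:10) {
  neuron <- matrix(w[i, ], nrow = 28, ncol = 28, byrow = TRUE)
  image(t(apply(neuron, 2, rev)), col = gray.colors(255), axes = FALSE)
}
```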

We can definitely see some resemblance to shirts and sneakers. Let’s test our model:
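Evaluation on the held-out test frame might look like this (again assuming the illustrative object names `model_simple` and `test`):

```r
# Score the model on the test set
perf <- h2o.performance(model_simple, newdata = test)

# Confusion matrix; overall error (1 - accuracy) is in the bottom-right cell
h2o.confusionMatrix(perf)
```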

An accuracy of 0.7632 isn’t a great result, but we haven’t used the full capabilities of H2O yet. We should do something more advanced!

The `h2o.deeplearning()` function has over 70 parameters responsible for the structure and optimization of our model. Tuning them should give us much better results.

An accuracy of 0.916 is a much better result, but there are still a lot of things we can do to improve our model. In the future we could use a grid or random search to find the best hyperparameters, or use some ensemble methods to get even better results.
