[This article was first published on R – Hi! I am Nagdev, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Autoencoders is an unsupervised version of neural network that is used for data encoding. This technique is mainly used to learn the representation of data that can be used for dimensionality reduction by training network to ignore noise. Autoencoders play an important role in unsupervised learning and deep architectures mainly for transfer learning (Pierre. B, 2012). When autoencoders are decoded, they are simple linear circuits that transforms inputs to outputs with least distortion.

Autoencoders were first introduced in 1980’s to address the issue of back propagation without training and rather use input as a teacher (Rumelhart et al., 1986). Since then, autoencoders have taken a phase change to the form on Restricted Boltzman Machine.

Today, autoencoders are used in various applications such as predicting sentiment distributions in Natural Language Processing (NLP) (Socher et al., 2011a) (Socher et al., 2011b), feature extraction (Masci et al., 2011), anomaly detection (Sakurada et al., 2014), facial recognition (Gao et al., 2015), clustering (Dilokthanakul et al., 2016), image classification (Geng et al., 2015) and many other application.

Image: Simple auto encoder representation

In today’s tutorial, I will go over on how to use auto encoders for anomaly detection in predictive maintenance.

You will need only two libraries for this analysis.

options(warn=-1)

library(dplyr)
library(h2o)


Here we are using data from a bench press and can be downloaded from my github repo. This is an experimental data I generated in a lab for my PhD dissertation.  There are total of four different states in this machine and they are split into four different csv files. We need to load the data first. In the data time represents the time between samples, ax is the acceleration on x axis, ay is the acceleration on y axis, az is the acceleration on z axis and at is the G’s. The data was collected at sample rate of 100hz.

Four different states of the machine were collected

1. Nothing attached to drill press 

2. Wooden base attached to drill press 

3. Imbalance created by adding weight to one end of wooden base 

4. Imbalance created by adding weight to two ends of wooden base.

setwd("/home/")

time ax ay az aT
0.002 -0.3246 0.2748 0.1502 0.451
0.009 0.6020 -0.1900 -0.3227 0.709
0.019 0.9787 0.3258 0.0124 1.032
0.027 0.6141 -0.4179 0.0471 0.744
0.038 -0.3218 -0.6389 -0.4259 0.833
0.047 -0.3607 0.1332 -0.1291 0.406
We can look at the summary of each file using summary function in R. Below, we can observe that 66 seconds long data is available. We also have min, max and mean for each of the variables.
# summary of each file
summary(file2)

      time               ax                  ay                  az
Min.   :  0.004   Min.   :-1.402700   Min.   :-1.693300   Min.   :-3.18930
1st Qu.: 27.005   1st Qu.:-0.311100   1st Qu.:-0.429600   1st Qu.:-0.57337
Median : 54.142   Median : 0.015100   Median :-0.010700   Median :-0.11835
Mean   : 54.086   Mean   : 0.005385   Mean   :-0.002534   Mean   :-0.09105
3rd Qu.: 81.146   3rd Qu.: 0.314800   3rd Qu.: 0.419475   3rd Qu.: 0.34815
Max.   :108.127   Max.   : 1.771900   Max.   : 1.515600   Max.   : 5.04610
aT
Min.   :0.0360
1st Qu.:0.6270
Median :0.8670
Mean   :0.9261
3rd Qu.:1.1550
Max.   :5.2950

### Data Aggregation and feature extraction

Here, the data is aggregated by 1 minute and features are extracted. Features are extracted to reduce the size of the data and only storing the representation of the data.

file1$group = as.factor(round(file1$time))
file2$group = as.factor(round(file2$time))
file3$group = as.factor(round(file3$time))
file4$group = as.factor(round(file4$time))
#(file1,20)

#list of all files
files = list(file1, file2, file3, file4)

#loop through all files and combine
features = NULL
for (i in 1:4){
res = files[[i]] %>%
group_by(group) %>%
summarize(ax_mean = mean(ax),
ax_sd = sd(ax),
ax_min = min(ax),
ax_max = max(ax),
ax_median = median(ax),
ay_mean = mean(ay),
ay_sd = sd(ay),
ay_min = min(ay),
ay_may = max(ay),
ay_median = median(ay),
az_mean = mean(az),
az_sd = sd(az),
az_min = min(az),
az_maz = max(az),
az_median = median(az),
aT_mean = mean(aT),
aT_sd = sd(aT),
aT_min = min(aT),
aT_maT = max(aT),
aT_median = median(aT)
)
features = rbind(features, res)
}

#view all features

Create Train and Test Set

To build an anomaly detection model, a train and test set is required. Here, the normal condition of the data is used for training and remaining is used for testing.

# create train and test set
train = features[1:67,2:ncol(features)]
test = features[68:nrow(features),2:ncol(features)]


### Auto Encoders

#### Auto Encoders using H2O package

Use the h2o.init()method to initialize H2O. This method accepts the following options. Note: that in most cases, simply using h2o.init() is all that a user is required to do.

# initialize h2o cluser
h2o.init()

The R object to be converted to an H2O object should be named so that it can be used in subsequent analysis. Also note that the R object is converted to a parsed H2O data object, and will be treated as a data frame by H2O in subsequent analysis.

# convert train and test to h2o object
train_h2o = as.h2o(train)
test_h2o = as.h2o(test)

The h2o.deeplearning function fits H2O’s Deep Learning models from within R. While H2O Deep Learning has many parameters, it was designed to be just as easy to use as the other supervised training methods in H2O.

Early stopping, automatic data standardization and handling of categorical variables and missing values and adaptive learning rates (per weight) reduce the amount of parameters the user has to specify. Often, it’s just the number and sizes of hidden layers, the number of epochs and the activation function and maybe some regularization techniques.

# build auto encoder model with 3 layers
model_unsup = h2o.deeplearning(x = 2:ncol(features)
, training_frame = train_h2o
, model_id = "Test01"
, autoencoder = TRUE
, reproducible = TRUE
, ignore_const_cols = FALSE
, seed = 42
, hidden = c(50,10,50,100,100)
, epochs = 100
, activation ="Tanh")

# view the model
model_unsup
Model Details:
==============

H2OAutoEncoderModel: deeplearning
Model ID:  Test01
Status of Neuron Layers: auto-encoder, gaussian distribution, Quadratic loss, 19,179 weights/biases, 236.0 KB, 2,546 training samples, mini-batch size 1
layer units  type dropout       l1       l2 mean_rate rate_rms momentum
1     1    19 Input  0.00 %       NA       NA        NA       NA       NA
2     2    50  Tanh  0.00 % 0.000000 0.000000  0.029104 0.007101 0.000000
3     3    10  Tanh  0.00 % 0.000000 0.000000  0.021010 0.006320 0.000000
4     4    50  Tanh  0.00 % 0.000000 0.000000  0.024570 0.006848 0.000000
5     5   100  Tanh  0.00 % 0.000000 0.000000  0.052482 0.018357 0.000000
6     6   100  Tanh  0.00 % 0.000000 0.000000  0.052677 0.021417 0.000000
7     7    19  Tanh      NA 0.000000 0.000000  0.025557 0.009494 0.000000
mean_weight weight_rms mean_bias bias_rms
1          NA         NA        NA       NA
2    0.000069   0.180678  0.001542 0.017311
3    0.000008   0.187546 -0.000435 0.011542
4    0.011644   0.184633  0.000371 0.006443
5    0.000063   0.113350 -0.000964 0.008983
6    0.000581   0.100150  0.001003 0.013848
7   -0.001349   0.121616  0.006549 0.012720

H2OAutoEncoderMetrics: deeplearning
** Reported on training data. **

Training Set Metrics:
=====================

MSE: (Extract with h2o.mse) 0.005829827
RMSE: (Extract with h2o.rmse) 0.0763533



Detect anomalies in an H2O data set using an H2O deep learning model with auto-encoding trained previously.

# now we need to calculate MSE or anomaly score
anmlt = h2o.anomaly(model_unsup
, train_h2o
, per_feature = FALSE) %>% as.data.frame()
# create a label for healthy data
anmlt$y = 0 # view top data head(anmlt)  Reconstruction.MSE y 0.001953387 0 0.004875430 0 0.002195593 0 0.006722837 0 0.001670331 0 0.005859846 0 Calculate the threshold value for trainanomaly scores. Various methods can be used such as calculating the quantiles, max, median, min etc. It all depends on the use case. Here we will use quantile with probability of 99.9%. # calculate thresholds from train data threshold = quantile(anmlt$Reconstruction.MSE, probs = 0.999)


Now, we have anomaly score for train and its thresholds, we can predict the new anomaly scores for test data and plot it to see how it differs from train data.

# calculate anomaly scores for test data
test_anmlt = h2o.anomaly(model_unsup
, test_h2o
, per_feature = FALSE) %>% as.data.frame()

# create a label for healthy data
test_anmlt$y = 1  # combine the train and test anomaly scores for visulaizatio results = data.frame(rbind(anmlt,test_anmlt), threshold) head(results)  Reconstruction.MSE y threshold 0.001953387 0 0.01705935 0.004875430 0 0.01705935 0.002195593 0 0.01705935 0.006722837 0 0.01705935 0.001670331 0 0.01705935 0.005859846 0 0.01705935 The results are plotted below. The x axis is the observations and y axis is the anomaly score. The green points are the trained data and red are test data. We can note that all the data that was trained except one lied below the anomaly limit. Its also interesting to note the increasing trend pattern for the anomaly scores for other state of the machine. # Adjust plot sizes options(repr.plot.width = 15, repr.plot.height = 6) plot(results$Reconstruction.MSE, type = 'n', xlab='observations', ylab='Reconstruction.MSE', main = "Anomaly Detection Results")
points(results$Reconstruction.MSE, pch=19, col=ifelse(results$Reconstruction.MSE < threshold, "green", "red"))
abline(h=threshold, col='red', lwd=2)


### Conclusion

Auto encoder is a very powerful tool and very fun to play with. They have been used in image analysis, image reconstruction and image colorization. In this tutorial you have seen how to perform anomaly detection on a simple signal data and few lines of code. The possibilities of using this are many. Let me know what you think about auto encoders in the comments below.

### Session info

Below is the session info for the the packages and their versions used in this analysis.

sessionInfo()

R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

locale:
[1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] h2o_3.26.0.2 dplyr_0.8.3

loaded via a namespace (and not attached):
[1] Rcpp_1.0.2          magrittr_1.5        tidyselect_0.2.5
[4] uuid_0.1-2          R6_2.4.0            rlang_0.4.0
[7] tools_3.3.3         htmltools_0.3.6     assertthat_0.2.1
[10] digest_0.6.20       tibble_2.1.3        crayon_1.3.4
[13] IRdisplay_0.7.0     purrr_0.3.2         repr_1.0.1
[16] base64enc_0.1-3     vctrs_0.2.0         bitops_1.0-6
[19] RCurl_1.95-4.12     IRkernel_1.0.2.9000 zeallot_0.1.0
[22] glue_1.3.1          evaluate_0.14       pbdZMQ_0.3-3
[25] pillar_1.4.2        backports_1.1.4     jsonlite_1.6
[28] pkgconfig_2.0.2

The post Auto Encoders for Anomaly Detection in Predictive Maintenance appeared first on Hi! I am Nagdev.