Predicting Car Battery Failure With R And H2O – Study

May 24, 2019
By Carlos Kassab

[This article was first published on R-Analytics, and kindly contributed to R-bloggers].

# Loading libraries
suppressWarnings( suppressMessages( library( h2o ) ) )
suppressWarnings( suppressMessages( library( data.table ) ) )
suppressWarnings( suppressMessages( library( plotly ) ) )
suppressWarnings( suppressMessages( library( DT ) ) )

# Reading data file
# Data from: https://www.kaggle.com/yunlevin/levin-vehicle-telematics
dataFileName = "/Development/Analytics/AnomalyDetection/AutomovileFailurePrediction/v2.csv"
carData = fread( dataFileName, skip=0, header = TRUE )
carBatteryData = data.table( TimeStamp = carData$timeStamp
, BatteryVoltage = as.numeric( carData$battery )
)
rm(carData)

# Data cleaning, filtering and conversion
carBatteryData = na.omit( carBatteryData ) # Keep only complete (non-NA) readings

# According to this article:
# https://shop.advanceautoparts.com/r/advice/car-maintenance/car-battery-voltage-range
#
# A perfect voltage ( without any devices or electronic systems plugged in )
# is between 13.7 and 14.7V.
# If the battery isn’t fully charged, it will diminish to 12.4V at 75%,
# 12V when it’s only operating at 25%, and down to 11.9V when it’s completely discharged.
#
# Battery voltage while a load is connected is much lower;
# it should be something between 9.5V and 10.5V.
#
# This voltage range ensures that your battery can store and deliver enough
# current to start your car and power all your electronics and electric devices
# without any difficulty.
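# Illustrative sketch (an addition, not part of the original analysis): a rough
# state-of-charge estimate from the resting-voltage points quoted above
# (11.9V ~ 0%, 12V ~ 25%, 12.4V ~ 75%). Linear interpolation between those points
# is an assumption. Readings outside 11.9-12.4V are clamped to the end values
# (rule = 2), so this says nothing about the charging or under-load ranges.
# estimateChargePct is a hypothetical helper name.
estimateChargePct = function( voltage ) {
  # Interpolate linearly between the quoted voltage / charge points
  approx( x = c( 11.9, 12.0, 12.4 ), y = c( 0, 25, 75 ), xout = voltage, rule = 2 )$y
}
estimateChargePct( c( 11.8, 12.0, 12.2, 12.4 ) ) # roughly 0, 25, 50 and 75 percent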

carBatteryData = carBatteryData[BatteryVoltage >= 9.5] # Keep voltages of at least 9.5V, the lower bound under load
# Truncate timestamps to the minute by replacing the seconds with "00"
carBatteryData$TimeStamp = as.POSIXct( paste0( substr(carBatteryData$TimeStamp,1,17),"00" ) )
carBatteryData = unique(carBatteryData) # Remove duplicate readings (same minute and same voltage)
carBatteryData = carBatteryData[order(TimeStamp)]
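# Illustrative sketch on made-up readings (toyReadings is hypothetical, and the
# "YYYY-MM-DD HH:MM:SS" timestamp format is assumed): keeping the first 17
# characters and appending "00" truncates each timestamp to the minute, so
# unique() can then collapse identical voltages logged within the same minute.
# A base data.frame is used so the sketch runs without data.table.
toyReadings = data.frame( TimeStamp = c( "2017-12-27 08:47:05", "2017-12-27 08:47:35", "2017-12-27 08:48:10" )
                        , BatteryVoltage = c( 12.6, 12.6, 12.5 )
                        , stringsAsFactors = FALSE
                        )
toyReadings$TimeStamp = as.POSIXct( paste0( substr( toyReadings$TimeStamp, 1, 17 ), "00" ) )
toyReadings = unique( toyReadings )
nrow( toyReadings ) # the two 08:47 readings of 12.6V become one row, leaving 2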


# Splitting the data: the last date is used as testing data and the rest for training.
lastDate = max( as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) )
trainingData = carBatteryData[ as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) != lastDate ]
testingData = carBatteryData[ as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) == lastDate ]
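# Illustrative sketch of the split above on made-up readings (toySplit and the
# other toy* names are hypothetical): every reading from the most recent
# calendar day goes to testing, and everything earlier goes to training.
# A base data.frame is used here; the same logical conditions work inside data.table's [.
toySplit = data.frame( TimeStamp = as.POSIXct( c( "2017-12-26 10:00:00", "2017-12-27 09:00:00", "2017-12-27 09:01:00" ) )
                     , BatteryVoltage = c( 12.6, 12.5, 12.4 )
                     )
toyDates = as.Date( format( toySplit$TimeStamp, "%Y-%m-%d" ) )
toyTraining = toySplit[ toyDates != max( toyDates ), ]
toyTesting = toySplit[ toyDates == max( toyDates ), ]
nrow( toyTraining ) # 1 reading from 2017-12-26
nrow( toyTesting )  # 2 readings from 2017-12-27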



################################################################################
# Creating Anomaly Detection Model
################################################################################

h2o.init( nthreads = -1, max_mem_size = "5G" )
## 
## H2O is not running yet, starting it now...
##
## Note: In case of errors look at the following log files:
## C:\Users\LaranIkal\AppData\Local\Temp\Rtmp6lTw4H/h2o_LaranIkal_started_from_r.out
## C:\Users\LaranIkal\AppData\Local\Temp\Rtmp6lTw4H/h2o_LaranIkal_started_from_r.err
##
##
## Starting H2O JVM and connecting: Connection successful!
##
## R is connected to the H2O cluster:
## H2O cluster uptime: 1 seconds 899 milliseconds
## H2O cluster timezone: America/Mexico_City
## H2O data parsing timezone: UTC
## H2O cluster version: 3.24.0.2
## H2O cluster version age: 1 month and 7 days
## H2O cluster name: H2O_started_from_R_LaranIkal_tzd452
## H2O cluster total nodes: 1
## H2O cluster total memory: 4.44 GB
## H2O cluster total cores: 8
## H2O cluster allowed cores: 8
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## H2O Internal Security: FALSE
## H2O API Extensions: Amazon S3, Algos, AutoML, Core V3, Core V4
## R Version: R version 3.6.0 (2019-04-26)
h2o.no_progress() # Disable progress bars for Rmd
h2o.removeAll() # Cleans h2o cluster state.
## [1] 0
# Convert the training dataset to H2O format, keeping only the BatteryVoltage column.
trainingData_hex = as.h2o( trainingData[, .(BatteryVoltage)], destination_frame = "train_hex" )

# Build an Isolation forest model
trainingModel = h2o.isolationForest( training_frame = trainingData_hex
, sample_rate = 0.1
, max_depth = 32
, ntrees = 100
)

# According to H2O doc:
# http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/if.html
#
# Isolation Forest is similar in principle to Random Forest and is built on the basis of decision trees.

# Isolation Forest creates multiple decision trees to isolate observations.
#
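# Trees are split randomly. The assumption is that:
#
# IF ONE UNIT'S MEASUREMENTS ARE SIMILAR TO THE OTHERS,
# IT WILL TAKE MORE RANDOM SPLITS TO ISOLATE IT.
#
# The fewer splits needed, the more likely the unit is to be anomalous.
#
# The average number of splits is then used as a score. H2O's prediction frame
# reports it as mean_length, together with a normalized score in the predict
# column, where higher values mean fewer splits, i.e. more anomalous.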
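# Illustrative sketch of the idea above in base R, not H2O's implementation:
# isolationDepth (a hypothetical helper) counts how many random splits it takes
# to isolate one value of a single numeric feature. H2O additionally subsamples
# rows (sample_rate) and normalizes the path length, which is not reproduced here;
# maxDepth = 32 only mirrors max_depth above. The voltages are simulated.
isolationDepth = function( values, target, depth = 0, maxDepth = 32 ) {
  # Stop when the target is alone, the depth cap is hit, or the values are identical
  if ( length( values ) <= 1 || depth >= maxDepth || min( values ) == max( values ) ) return( depth )
  splitPoint = runif( 1, min( values ), max( values ) ) # random split inside the current range
  # Keep only the side of the split that still contains the target value
  side = if ( target < splitPoint ) values[ values < splitPoint ] else values[ values >= splitPoint ]
  isolationDepth( side, target, depth + 1, maxDepth )
}
set.seed( 42 )
simVoltages = c( rnorm( 200, mean = 12.6, sd = 0.1 ), 10.0 ) # 200 typical readings plus one low outlier
# Average the number of splits over 100 random trees
meanDepth = function( target ) mean( replicate( 100, isolationDepth( simVoltages, target ) ) )
meanDepth( 10.0 )                  # the outlier is usually isolated after one or two splits
meanDepth( median( simVoltages ) ) # a typical reading needs many more splits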

# Calculate score for training dataset
score <- h2o.predict( trainingModel, trainingData_hex )
result_pred <- as.vector( score$predict )


################################################################################
# Setting threshold value for anomaly detection.
################################################################################

# Setting desired threshold percentage.
threshold = .995 # Assume 99.5% of voltage readings are normal

# Use the threshold above to get the score limit for flagging anomalous voltage readings
# (the 99.5th percentile of the training scores).
scoreLimit = round( quantile( result_pred, threshold ), 4 )
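# Illustrative sketch with simulated scores (toyScores is hypothetical): the
# 99.5th percentile of 1000 training scores leaves the top 0.5%, i.e. 5 scores,
# above the limit. Rounding is skipped here so the count is exact; the
# round( ..., 4 ) above can shift the limit very slightly.
set.seed( 1 )
toyScores = runif( 1000 )
toyLimit = quantile( toyScores, 0.995 )
sum( toyScores > toyLimit ) # 5 of the 1000 scores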



################################################################################
# Detect anomalous voltage readings in the testing data, using the model and the scoreLimit obtained from the training data.
################################################################################

# Convert the testing data to H2O format, keeping only the BatteryVoltage column.
testingDataH2O = as.h2o( testingData[, .(BatteryVoltage)], destination_frame = "testingData_hex" )

# Get score using training model
testingScore <- h2o.predict( trainingModel, testingDataH2O )

# Add row score at the beginning of testing dataset
testingData = cbind( RowScore = round( as.vector( testingScore$predict ), 4 ), testingData )

# Check whether the testing data contains anomalous voltage readings
anomalies = testingData[ testingData$RowScore > scoreLimit, ]
# An additional filter guards the maintenance recommendation:
# display an alert only if there are more than 3 anomalous voltage readings.
if( dim( anomalies )[1] > 3 ) {
cat( "Show alert on car display: Battery got anomalous voltage readings, it is recommended to take it to service." )

plot_ly( data = anomalies
, x = ~TimeStamp
, y = ~BatteryVoltage
, type = 'scatter'
, mode = "lines"
, name = 'Anomalies') %>%
layout( yaxis = list( title = 'Battery Voltage.' )
, xaxis = list( categoryorder='trace', title = 'Date - Time.' )
)
}
## Show alert on car display: Battery got anomalous voltage readings, it is recommended to take it to service.

# Show the anomalous readings (TimeStamp and BatteryVoltage) as an interactive table
if( dim( anomalies )[1] > 3 ) {
DT::datatable(anomalies[,c(2,3)], rownames = FALSE )
}
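# Illustrative sketch of the alert rule above (shouldRecommendService is a
# hypothetical helper): recommend service only when more than 3 readings
# score above the limit learned from the training data.
shouldRecommendService = function( rowScores, scoreLimit, minAnomalies = 3 ) {
  # Count readings above the limit and compare against the minimum
  sum( rowScores > scoreLimit ) > minAnomalies
}
shouldRecommendService( c( 0.20, 0.90, 0.95, 0.97, 0.99 ), scoreLimit = 0.8 ) # 4 anomalous readings: TRUE
shouldRecommendService( c( 0.20, 0.90, 0.95 ), scoreLimit = 0.8 )             # 2 anomalous readings: FALSE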



Using this approach, we may be able to prevent car failures, not only for batteries but in many other cases where sensors are used.

Carlos Kassab






