Predicting Car Battery Failure With R And H2O – Study

[This article was first published on R-Analytics, and kindly contributed to R-bloggers].

# Loading libraries
suppressWarnings( suppressMessages( library( h2o ) ) ) 
suppressWarnings( suppressMessages( library( data.table ) ) )
suppressWarnings( suppressMessages( library( plotly ) ) )
suppressWarnings( suppressMessages( library( DT ) ) )

# Reading data file
# Data from: https://www.kaggle.com/yunlevin/levin-vehicle-telematics
dataFileName = "/Development/Analytics/AnomalyDetection/AutomovileFailurePrediction/v2.csv"
carData = fread( dataFileName, skip=0, header = TRUE )
carBatteryData = data.table( TimeStamp = carData$timeStamp
                             , BatteryVoltage = as.numeric( carData$battery ) 
                            )
rm(carData)

# Data cleaning, filtering and conversion
carBatteryData = na.omit( carBatteryData ) # Keeping just valid Values

# According to this article: 
# https://shop.advanceautoparts.com/r/advice/car-maintenance/car-battery-voltage-range
#
# A perfect voltage (without any devices or electronic systems plugged in)
# is between 13.7 and 14.7V.
# If the battery isn’t fully charged, it will diminish to 12.4V at 75%,
# 12V when it’s only operating at 25%, and down to 11.9V when it’s completely discharged.
#
# Battery voltage while a load is connected is much lower;
# it should be somewhere between 9.5V and 10.5V.
#
# This value interval ensures that your battery can store and deliver enough 
# current to start your car and power all your electronics and electric devices 
# without any difficulty

carBatteryData = carBatteryData[BatteryVoltage >= 9.5] # Keeping only voltages greater than or equal to 9.5V
carBatteryData$TimeStamp = as.POSIXct( paste0( substr(carBatteryData$TimeStamp,1,17),"00" ) )
carBatteryData = unique(carBatteryData) # Removing duplicate voltage readings
carBatteryData = carBatteryData[order(TimeStamp)]
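
The timestamp conversion above drops the seconds so that every reading is aligned to the minute. A minimal sketch of that step on a single value, assuming the raw timestamps look like YYYY-MM-DD HH:MM:SS (the sample value and the UTC time zone are made up for illustration):

```r
# Hypothetical raw timestamp; the real file's exact format may differ.
raw_stamp <- "2018-01-31 14:15:37"

# Keep the first 17 characters ("2018-01-31 14:15:") and append "00" as seconds.
minute_stamp <- paste0(substr(raw_stamp, 1, 17), "00")

# Parse the truncated string; the time zone is fixed only to keep the example reproducible.
parsed_stamp <- as.POSIXct(minute_stamp, tz = "UTC")

print(minute_stamp)
print(format(parsed_stamp, "%H:%M:%S"))
```

After this truncation, rows that share the same minute and the same voltage become identical, which is what lets unique() collapse them.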


# Splitting the data: the last date is used for testing and the rest for training.
lastDate = max( as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) )
trainingData = carBatteryData[ as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) != lastDate ]
testingData = carBatteryData[ as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) == lastDate ]



################################################################################
# Creating Anomaly Detection Model
################################################################################

  h2o.init( nthreads = -1, max_mem_size = "5G" )
## 
## H2O is not running yet, starting it now...
## 
## Note:  In case of errors look at the following log files:
##     C:\Users\LaranIkal\AppData\Local\Temp\Rtmp6lTw4H/h2o_LaranIkal_started_from_r.out
##     C:\Users\LaranIkal\AppData\Local\Temp\Rtmp6lTw4H/h2o_LaranIkal_started_from_r.err
## 
## 
## Starting H2O JVM and connecting:  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 seconds 899 milliseconds 
##     H2O cluster timezone:       America/Mexico_City 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.24.0.2 
##     H2O cluster version age:    1 month and 7 days  
##     H2O cluster name:           H2O_started_from_R_LaranIkal_tzd452 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   4.44 GB 
##     H2O cluster total cores:    8 
##     H2O cluster allowed cores:  8 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     H2O API Extensions:         Amazon S3, Algos, AutoML, Core V3, Core V4 
##     R Version:                  R version 3.6.0 (2019-04-26)
  h2o.no_progress() # Disable progress bars for Rmd
  h2o.removeAll() # Cleans h2o cluster state.
## [1] 0
  # Convert the training dataset to H2O format.
  trainingData_hex = as.h2o( trainingData[,2], destination_frame = "train_hex" )
  
  # Build an Isolation forest model
  trainingModel = h2o.isolationForest( training_frame = trainingData_hex
                                       , sample_rate = 0.1
                                       , max_depth = 32
                                       , ntrees = 100
                                      )
  
  # According to the H2O documentation:
  # http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/if.html
  #
  # Isolation Forest is similar in principle to Random Forest and is built on the basis of decision trees.
  #
  # Isolation Forest creates multiple decision trees to isolate observations.
  #
  # Trees are split randomly. The assumption is that:
  #
  #   IF ONE UNIT'S MEASUREMENTS ARE SIMILAR TO THE OTHERS,
  #   IT WILL TAKE MORE RANDOM SPLITS TO ISOLATE IT.
  #
  #   The fewer splits needed, the more likely the unit is to be anomalous.
  #
  # The average number of splits is then used as a score.
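
The isolation idea described in the comments above can be sketched in base R without H2O. This is only an illustration under simplifying assumptions: a single feature, no subsampling, synthetic voltages, and hypothetical helpers named isolation_depth and average_depth. It is not H2O's implementation.

```r
set.seed(42)

# Synthetic readings (made up): 200 typical voltages plus one unusually low value.
voltages <- c(rnorm(200, mean = 12.6, sd = 0.1), 9.7)

# Hypothetical helper: count the random splits needed to isolate `target`.
isolation_depth <- function(values, target, max_depth = 32) {
  depth <- 0
  while (length(unique(values)) > 1 && depth < max_depth) {
    # Pick a random split point inside the current value range.
    split_point <- runif(1, min(values), max(values))
    # Keep only the side of the split that contains the target.
    if (target < split_point) {
      values <- values[values < split_point]
    } else {
      values <- values[values >= split_point]
    }
    depth <- depth + 1
  }
  depth
}

# Hypothetical helper: average the depth over many random "trees".
average_depth <- function(target, n_trees = 100) {
  mean(replicate(n_trees, isolation_depth(voltages, target)))
}

typical_depth <- average_depth(median(voltages))  # a reading in the middle of the data
outlier_depth <- average_depth(9.7)               # the unusual reading

print(c(typical = typical_depth, outlier = outlier_depth))
```

The unusual reading should be isolated after far fewer splits on average than the typical one, which is the property the model relies on to score each voltage.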

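  # Calculate anomaly scores for the training dataset
  # (in the H2O output, a higher value in the predict column means a more anomalous row)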
  score <- h2o.predict( trainingModel, trainingData_hex )
  result_pred <- as.vector( score$predict )


################################################################################
# Setting threshold value for anomaly detection.
################################################################################

  # Setting the desired threshold percentage.
  threshold = .995 # Assume that 99.5% of the training voltage readings are normal
  
  # Using the above threshold to get the score limit used to filter anomalous voltage readings.
  scoreLimit = round( quantile( result_pred, threshold ), 4 )
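
To see what the quantile-based limit does, here is a small base R sketch on made-up scores (the uniform scores are an assumption for illustration, not output of the model). With a 0.995 threshold, roughly 0.5% of the scores end up above the limit.

```r
set.seed(1)

# Made-up anomaly scores standing in for the model's training scores.
example_scores <- runif(1000)

# Same idea as scoreLimit above: the score below which 99.5% of the values fall.
example_threshold <- 0.995
example_limit <- quantile(example_scores, example_threshold)

# Scores above the limit would be treated as anomalous.
flagged <- example_scores[example_scores > example_limit]

print(length(flagged))                           # number of flagged scores
print(length(flagged) / length(example_scores))  # share of flagged scores
```

Choosing a lower threshold flags more readings, so the value is a trade-off between missed anomalies and false alerts.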
  

  
################################################################################
# Get anomalous voltage readings from the testing data, using the model and the scoreLimit obtained from the training data.
################################################################################

  # Convert testing data frame to H2O format.
  testingDataH2O = as.h2o( testingData[,2], destination_frame = "testingData_hex" )
  
  # Get score using training model
  testingScore <- h2o.predict( trainingModel, testingDataH2O )

  # Add row score at the beginning of testing dataset
  testingData = cbind( RowScore = round( as.vector( testingScore$predict ), 4 ), testingData )

  # Check if there are anomalous voltage readings from testing data
  anomalies = testingData[ testingData$RowScore > scoreLimit, ]
  # Here there is an additional filter to make sure a maintenance recommendation is warranted:
  # if there are more than 3 anomalous voltage readings, display an alert.
  if( dim( anomalies )[1]  > 3 ) { 
    cat( "Show alert on car display: Battery got anomalous voltage readings, it is recommended to take it to service." )
    
    plot_ly( data = anomalies
             , x = ~TimeStamp
             , y = ~BatteryVoltage
             , type = 'scatter'
             , mode = "lines"
             , name = 'Anomalies') %>%
      layout( yaxis = list( title = 'Battery Voltage.' )
              , xaxis = list( categoryorder='trace', title = 'Date - Time.' )
               )
  }
## Show alert on car display: Battery got anomalous voltage readings, it is recommended to take it to service.

if( dim( anomalies )[1]  > 3 ) { 
  DT::datatable(anomalies[,c(2,3)], rownames = FALSE )
}



TimeStamp              BatteryVoltage
2018-01-31T14:15:00Z   10.175
2018-01-31T15:29:00Z   14.88
2018-01-31T15:29:00Z   14.92
2018-01-31T15:32:00Z   10.38
2018-01-31T20:38:00Z   10.12
2018-02-01T00:50:00Z   10.43
2018-02-01T01:02:00Z   9.727



Using this approach we may anticipate failures in cars, not only in batteries but in any component that is monitored by sensors.

Carlos Kassab

https://www.linkedin.com/in/carlos-kassab-48b40743/



We are using R, more information about R:

https://www.r-bloggers.com
