# Predicting Car Battery Failure With R And H2O – Study

May 24, 2019
By Carlos Kassab

```r
# Loading libraries
suppressWarnings( suppressMessages( library( h2o ) ) )
suppressWarnings( suppressMessages( library( data.table ) ) )
suppressWarnings( suppressMessages( library( plotly ) ) )
suppressWarnings( suppressMessages( library( DT ) ) )

# Reading data file
# Data from: https://www.kaggle.com/yunlevin/levin-vehicle-telematics
dataFileName = "/Development/Analytics/AnomalyDetection/AutomovileFailurePrediction/v2.csv"
carData = fread( dataFileName, skip=0, header = TRUE )

carBatteryData = data.table( TimeStamp = carData$timeStamp
                             , BatteryVoltage = as.numeric( carData$battery )
                             )
rm(carData)

# Data cleaning, filtering and conversion
carBatteryData = na.omit( carBatteryData ) # Keeping just valid values

# According to this article:
# https://shop.advanceautoparts.com/r/advice/car-maintenance/car-battery-voltage-range
#
# A perfect voltage ( without any devices or electronic systems plugged in )
# is between 13.7 and 14.7V.
# If the battery isn’t fully charged, it will diminish to 12.4V at 75%,
# 12V when it’s only operating at 25%, and down to 11.9V when it’s completely discharged.
#
# Battery voltage while a load is connected is much lower;
# it should be something between 9.5V and 10.5V.
#
# This value interval ensures that your battery can store and deliver enough
# current to start your car and power all your electronics and electric devices
# without any difficulty.
carBatteryData = carBatteryData[BatteryVoltage >= 9.5] # Keeping voltages greater than or equal to 9.5

# Truncating timestamps to the minute (seconds are set to 00)
carBatteryData$TimeStamp = as.POSIXct( paste0( substr(carBatteryData$TimeStamp,1,17),"00" ) )

carBatteryData = unique(carBatteryData) # Removing duplicate voltage readings
carBatteryData = carBatteryData[order(TimeStamp)]

# Splitting all data, using the last date as testing data and the rest for training.
lastDate = max( as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) )
trainingData = carBatteryData[ as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) != lastDate ]
testingData = carBatteryData[ as.Date( format( carBatteryData$TimeStamp, "%Y-%m-%d" ) ) == lastDate ]

################################################################################
# Creating Anomaly Detection Model
################################################################################

  h2o.init( nthreads = -1, max_mem_size = "5G" )
  h2o.no_progress() # Disable progress bars for Rmd
  h2o.removeAll() # Cleans h2o cluster state.
```
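The data preparation steps (truncating timestamps to the minute, dropping duplicates, and holding out the last day for testing) can be tried on a few rows without the Kaggle file. The sketch below uses base R only; the readings are invented, the timestamp format `YYYY-MM-DD HH:MM:SS` is an assumption, and the names `readings`, `toyTraining` and `toyTesting` are hypothetical.

```r
# Toy readings invented for illustration; the timestamp format is assumed.
readings <- data.frame(
  TimeStamp = c("2018-01-30 08:15:12", "2018-01-30 08:15:47",
                "2018-01-30 09:02:05", "2018-01-31 14:15:33"),
  BatteryVoltage = c(14.1, 14.1, 13.9, 10.2),
  stringsAsFactors = FALSE
)

# Truncate to the minute: keep "YYYY-MM-DD HH:MM:" and append "00".
readings$TimeStamp <- as.POSIXct(
  paste0(substr(readings$TimeStamp, 1, 17), "00"), tz = "UTC"
)

# Rows 1 and 2 now share the same minute and voltage, so one of them is dropped.
readings <- unique(readings)
readings <- readings[order(readings$TimeStamp), ]

# Hold out the last calendar day for testing; train on everything earlier.
days <- as.Date(readings$TimeStamp, tz = "UTC")
lastDay <- max(days)
toyTraining <- readings[days != lastDay, ]
toyTesting  <- readings[days == lastDay, ]
```

Truncating before calling `unique()` is what makes repeated readings within the same minute collapse into a single row.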
```r
  # Convert the training dataset to H2O format.
  trainingData_hex = as.h2o( trainingData[,2], destination_frame = "train_hex" )

  # Build an Isolation Forest model
  trainingModel = h2o.isolationForest( training_frame = trainingData_hex
                                       , sample_rate = 0.1
                                       , max_depth = 32
                                       , ntrees = 100
                                      )

  # According to H2O doc:
  # http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/if.html
  #
  # Isolation Forest is similar in principle to Random Forest and is built on the basis of decision trees.
  # Isolation Forest creates multiple decision trees to isolate observations.
  #
  # Trees are split randomly. The assumption is that:
  #
  #   IF ONE UNIT'S MEASUREMENTS ARE SIMILAR TO OTHERS,
  #   IT WILL TAKE MORE RANDOM SPLITS TO ISOLATE IT.
  #
  #   The fewer splits needed, the more likely the unit is to be anomalous.
  #
  # The average number of splits is then used as a score.

  # Calculate score for training dataset
  score <- h2o.predict( trainingModel, trainingData_hex )
  result_pred <- as.vector( score$predict )

################################################################################
# Setting threshold value for anomaly detection.
################################################################################

  # Setting desired threshold percentage.
  threshold = .995 # Let's say 99.5% of the voltage values are correct

  # Using the above threshold to get the score limit for filtering anomalous voltage readings.
  scoreLimit = round( quantile( result_pred, threshold ), 4 )

################################################################################
# Get anomalous voltage readings from testing data, using the model and the
# scoreLimit obtained from training data.
################################################################################

  # Convert testing data frame to H2O format.
  testingDataH2O = as.h2o( testingData[,2], destination_frame = "testingData_hex" )

  # Get score using training model
  testingScore <- h2o.predict( trainingModel, testingDataH2O )

  # Add row score at the beginning of testing dataset
  testingData = cbind( RowScore = round( as.vector( testingScore$predict ), 4 ), testingData )

  # Check if there are anomalous voltage readings from testing data
  anomalies = testingData[ testingData$RowScore > scoreLimit, ]
```
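The idea behind Isolation Forest quoted in the comments, that unusual values need fewer random splits to isolate, can be illustrated in a few lines of base R. This is a simplified one-dimensional sketch, not H2O's implementation: `isolationDepth` and `avgDepth` are hypothetical helpers, and the voltages are invented. Note that here a lower depth means more anomalous, whereas the code in this post treats higher `predict` values as more anomalous.

```r
# Hypothetical helper (not part of H2O): count the random splits needed to
# isolate the value `x` from the other values in a one-dimensional sample.
isolationDepth <- function(x, values, maxDepth = 32) {
  depth <- 0
  while (length(values) > 1 && depth < maxDepth) {
    lo <- min(values)
    hi <- max(values)
    if (lo == hi) break                  # identical values cannot be split
    splitPoint <- runif(1, lo, hi)       # random cut inside the current range
    # Keep only the side of the cut that contains x.
    values <- if (x < splitPoint) values[values < splitPoint] else values[values >= splitPoint]
    depth <- depth + 1
  }
  depth
}

set.seed(42)
# Invented data: 200 healthy readings around 14.2 V plus one low reading.
voltages <- c(rnorm(200, mean = 14.2, sd = 0.1), 9.7)
typical  <- sort(voltages)[100]          # a reading from the middle of the bulk
outlier  <- 9.7

# Average the depth over many random trees, as an isolation forest does.
avgDepth <- function(x) mean(replicate(200, isolationDepth(x, voltages)))
outlierDepth <- avgDepth(outlier)
typicalDepth <- avgDepth(typical)
```

Comparing `outlierDepth` with `typicalDepth` shows the low reading being isolated in far fewer splits than a reading from the dense region around 14.2 V.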
```r
  # Here there is an additional filter to ensure a maintenance recommendation.
  # If there are more than 3 anomalous voltage readings, display an alert.
  if( dim( anomalies )[1]  > 3 ) {
    cat( "Show alert on car display: Battery got anomalous voltage readings, it is recommended to take it to service." )

    plot_ly( data = anomalies
             , x = ~TimeStamp
             , y = ~BatteryVoltage
             , type = 'scatter'
             , mode = "lines"
             , name = 'Anomalies') %>%
      layout( yaxis = list( title = 'Battery Voltage.' )
              , xaxis = list( categoryorder='trace', title = 'Date - Time.' )
              )
  }
```

```
## Show alert on car display: Battery got anomalous voltage readings, it is recommended to take it to service.
```
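The score limit and the alert rule can also be checked without an H2O cluster. The following sketch uses invented scores in place of `h2o.predict()` output and assumes, as the code in this post does, that higher scores are more anomalous; `minAnomalies` is a hypothetical name for the hard-coded 3.

```r
set.seed(1)
# Invented anomaly scores standing in for h2o.predict() output.
trainingScores <- runif(1000, min = 0, max = 0.5)
testingScores  <- c(0.10, 0.20, 0.97, 0.98, 0.99, 0.96, 0.30)

# Score limit: the 99.5th percentile of the training scores.
limit <- quantile(trainingScores, 0.995)

# Readings scoring above the limit are treated as anomalous.
isAnomalous  <- testingScores > limit
anomalyCount <- sum(isAnomalous)

# Same rule as in the post: alert only when more than 3 readings are anomalous.
minAnomalies <- 3
if (anomalyCount > minAnomalies) {
  message("Battery got anomalous voltage readings; service is recommended.")
}
```

Requiring several anomalous readings before alerting keeps a single noisy measurement from triggering a service recommendation.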

```r
if( dim( anomalies )[1]  > 3 ) {
  DT::datatable( anomalies[,c(2,3)], rownames = FALSE )
}
```
| TimeStamp | BatteryVoltage |
|---|---|
| 2018-01-31T14:15:00Z | 10.175 |
| 2018-01-31T15:29:00Z | 14.88 |
| 2018-01-31T15:29:00Z | 14.92 |
| 2018-01-31T15:32:00Z | 10.38 |
| 2018-01-31T20:38:00Z | 10.12 |
| 2018-02-01T00:50:00Z | 10.43 |
| 2018-02-01T01:02:00Z | 9.727 |
This approach may help prevent car failures, not only in batteries but in any component that is monitored by sensors.
Carlos Kassab
https://www.linkedin.com/in/carlos-kassab-48b40743/

We are using R. More information about R: https://www.r-bloggers.com

