Python style logging in R

[This article was first published on R – Working With Data, and kindly contributed to R-bloggers.]

This entry is part 20 of 20 in the series Using R

We are increasingly using R in “operational” settings that require robust error handling and logging. In this post we describe a quick-and-dirty way to get Python-style multi-level log files by wrapping the futile.logger package.

Making Messy Data Pretty

Our real-world scenario involves R scripts that process raw smoke-monitoring data that is updated hourly. The raw data comes from a variety of instruments, set up by different agencies, and is transmitted over at least two satellites before eventually arriving on our computers.

Data can be missing, delayed, or corrupted for a variety of reasons before it reaches us. Our R scripts then perform QC based on various columns available in the raw (aka “engineering-level”) data.

Ultimately, the cleaned up data gets displayed in a user interface here.

[Figure: monitoring_v3_map — the smoke-monitoring user interface]

No one needs to know all of the hard work that goes into cleaning and QC’ing the data that goes into this map. Our goal, after all, is to make it easier for stakeholders to make decisions without having to worry about the fussy details of raw data.

Logging When Things Go Wrong

However, neither the incoming data nor the R scripts we have written to process it are perfect. Things occasionally go wrong in a variety of ways and, when decision makers depend on the data being available, phone calls and text messages are next.

Buddhist teacher Pema Chödrön’s book When Things Fall Apart offers several categories of advice that actually apply in this situation. We will focus on one:

  • Using painful emotions to cultivate wisdom, compassion, and courage
  • Communicating so as to encourage others to open up rather than shut down
  • Practices for reversing habitual patterns
  • Methods for working with chaotic situations
  • Ways for creating effective social action

One of the methods for working with chaotic situations in operational software is to have lots and Lots and LOTS of logging.

Python has the brilliant “logging” module that allows us to quickly set up separate output files for log statements at different levels:

DEBUG, INFO, WARNING, ERROR, CRITICAL

Inside our operational code we only need to issue single calls like:

logger.info('Ingesting data file: %s', data_file)

and this information will automatically be written out to both the DEBUG and the INFO log files if we have set them up.

In our situation it makes sense to write out three log files for quick perusal by authorized personnel:

  • ~_DEBUG.log — for programmers needing to debug what went wrong
  • ~_INFO.log — for managers trying to understand the overall situation
  • ~_ERROR.log — (hopefully zero length!) for programmers and managers

These files get written to a uniform directory every time a data-processing script runs, with the ‘~’ replaced by the name of the processing script.
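Building those per-script file paths is straightforward in base R. Here is a sketch of the idea; the variable names (`logDir`, `scriptName`) are illustrative assumptions, not taken from the authors’ code:

```r
# Build the three per-script log-file paths described above
# (sketch; variable names are assumptions)
logDir     <- "logs"
scriptName <- "process_data"   # replaces the '~' in the file names

DEBUG_LOG <- file.path(logDir, paste0(scriptName, "_DEBUG.log"))
INFO_LOG  <- file.path(logDir, paste0(scriptName, "_INFO.log"))
ERROR_LOG <- file.path(logDir, paste0(scriptName, "_ERROR.log"))
```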

Wrapping futile.logger

Luckily, most of the functionality we need is provided by the futile.logger package (CRAN or GitHub). Unfortunately, this package does not currently support multiple “appenders” per “logger”, so we found ourselves initializing multiple named loggers to mimic the natural behavior of Python’s logging module:

# Load the logging package
library(futile.logger)

# Only FATAL messages go to the console
dummy <- flog.threshold(FATAL)

# Set up new log files
if ( file.exists(DEBUG_LOG) ) dummy <- file.remove(DEBUG_LOG)
dummy <- flog.logger("debug", DEBUG, appender.file(DEBUG_LOG))

if ( file.exists(INFO_LOG) ) dummy <- file.remove(INFO_LOG)
dummy <- flog.logger("info", INFO, appender.file(INFO_LOG))

if ( file.exists(ERROR_LOG) ) dummy <- file.remove(ERROR_LOG)
dummy <- flog.logger("error", ERROR, appender.file(ERROR_LOG))

# Silence other warning messages
options(warn=-1) # -1=ignore, 0=save/print, 1=print, 2=error

Then, inside the code, we created logging and error handling blocks like this:

if ( class(result)[1] == "try-error" ) {
    err_msg <- geterrmessage()
    flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='debug')
    flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='info')
    flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='error')
    stop(err_msg)
  }
  ...
  flog.info('Using rgdal::writeOGR to create geojson file %s', filename, name='debug')
  flog.info('Using rgdal::writeOGR to create geojson file %s', filename, name='info')

Yuck!

After writing just a few lines like this we tried a different approach and came up with a simple py_logging.R script that can be used to wrap futile.logger calls and give us the python style behavior we want. Setup is a cinch:

source('py_logging.R') # python style logging

# Set up logging
logger.setup(debugLog, infoLog, errorLog)

# Silence other warning messages
options(warn=-1) # -1=ignore, 0=save/print, 1=print, 2=error
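The wrapper itself is just a thin layer over futile.logger. A minimal version might look like the following sketch; this is our reconstruction of the idea, not the authors’ actual py_logging.R, and everything beyond `logger.setup()`, `logger.info()`, and `logger.error()` is an assumption:

```r
# py_logging.R -- sketch of a Python-style wrapper around futile.logger
library(futile.logger)

logger.setup <- function(debugLog, infoLog, errorLog) {
  # Only FATAL messages go to the console
  invisible(flog.threshold(FATAL))
  # Start each run with fresh log files
  for (f in c(debugLog, infoLog, errorLog)) {
    if (file.exists(f)) file.remove(f)
  }
  # One named logger per level, each with its own file appender
  invisible(flog.logger("debug", DEBUG, appender.file(debugLog)))
  invisible(flog.logger("info",  INFO,  appender.file(infoLog)))
  invisible(flog.logger("error", ERROR, appender.file(errorLog)))
}

# Each call fans out to every relevant named logger, mimicking
# Python's behavior where a message reaches all configured handlers
logger.debug <- function(msg, ...) {
  flog.debug(msg, ..., name = "debug")
}

logger.info <- function(msg, ...) {
  flog.info(msg, ..., name = "debug")
  flog.info(msg, ..., name = "info")
}

logger.error <- function(msg, ...) {
  flog.error(msg, ..., name = "debug")
  flog.error(msg, ..., name = "info")
  flog.error(msg, ..., name = "error")
}
```

The fan-out in `logger.error()` matches the repetitive blocks shown earlier: an ERROR message lands in all three files, while an INFO message lands in the DEBUG and INFO files only.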

And logging is much more natural:

if ( class(result)[1] == "try-error" ) {
    err_msg <- geterrmessage()
    logger.error('Error creating SpatialPointsDataFrame: %s', err_msg)
    stop(err_msg)
  }
  ...
  logger.info('Using rgdal::writeOGR to create geojson file %s', filename)

Output from our logger looks great!

INFO [2016-08-24 10:11:54] Running process_data.R version 0.1.5

INFO [2016-08-24 10:11:54] Working on /home/data/location/datafile_1.RData
INFO [2016-08-24 10:11:55] Found 1232 data columns
INFO [2016-08-24 10:12:03] Working on /home/data/location/datafile_2.RData
INFO [2016-08-24 10:12:03] Found 103 data columns
INFO [2016-08-24 10:12:03] Working on /home/data/location/datafile_3.RData
INFO [2016-08-24 10:12:03] Found 44 data columns

At the end of the day, we can only hope we have succeeded at one or another of:

  • Using painful emotions to cultivate wisdom, compassion, and courage
  • Communicating so as to encourage others to open up rather than shut down

Best wishes for excellent logging!
