chronos: A fast general purpose datetime parser

[This article was first published on schochastics - all things R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This post introduces the R package chronos, a fast general purpose date/time converter written in Rust with crates dateparser and chrono. This is the second outcome of my adventure of learning Rust.1

The package essentially does what anytime does, but it appears to do so a bit more efficiently.

Installation

You can install the development version of chronos like so:

remotes::install_github("schochastics/chronos")
#or
pak::pak("schochastics/chronos")
library(chronos)

Formats

chronos understands many different date(time) formats out of the box. A subset is included as a small benchmark dataset.

bench_date
 [1] "1511648546"                          "1620021848429"                      
 [3] "1620024872717915000"                 "2021-05-01T01:17:02.604456Z"        
 [5] "2017-11-25T22:34:50Z"                "Wed, 02 Jun 2021 06:31:39 GMT"      
 [7] "2019-11-29 08:08-08"                 "2019-11-29 08:08:05-08"             
 [9] "2021-05-02 23:31:36.0741-07"         "2021-05-02 23:31:39.12689-07"       
[11] "2019-11-29 08:15:47.624504-08"       "2017-07-19 03:21:51+00:00"          
[13] "2014-04-26 05:24:37 PM"              "2021-04-30 21:14"                   
[15] "2021-04-30 21:14:10"                 "2021-04-30 21:14:10.052282"         
[17] "2014-04-26 17:24:37.123"             "2014-04-26 17:24:37.3186369"        
[19] "2012-08-03 18:31:59.257000000"       "2017-11-25 13:31:15 PST"            
[21] "2017-11-25 13:31 PST"                "2014-12-16 06:20:00 UTC"            
[23] "2014-12-16 06:20:00 GMT"             "2014-04-26 13:13:43 +0800"          
[25] "2014-04-26 13:13:44 +09:00"          "2012-08-03 18:31:59.257000000 +0000"
[27] "2015-09-30 18:48:56.35272715 UTC"    "2021-02-21"                         
[29] "2021-02-21 PST"                      "2021-02-21 UTC"                     
[31] "2020-07-20+08:00"                    "01:06:06"                           
[33] "4:00pm"                              "6:00 AM"                            
[35] "01:06:06 PST"                        "4:00pm PST"                         
[37] "6:00 AM PST"                         "6:00pm UTC"                         
[39] "May 6 at 9:24 PM"                    "May 27 02:45:27"                    
[41] "May 8, 2009 5:57:51 PM"              "September 17, 2012 10:09am"         
[43] "September 17, 2012, 10:10:09"        "May 02, 2021 15:51:31 UTC"          
[45] "May 02, 2021 15:51 UTC"              "May 26, 2021, 12:49 AM PDT"         
[47] "September 17, 2012 at 10:09am PST"   "2021-Feb-21"                        
[49] "May 25, 2021"                        "oct 7, 1970"                        
[51] "oct 7, 70"                           "oct. 7, 1970"                       
[53] "oct. 7, 70"                          "October 7, 1970"                    
[55] "12 Feb 2006, 19:17"                  "12 Feb 2006 19:17"                  
[57] "14 May 2019 19:11:40.164"            "7 oct 70"                           
[59] "7 oct 1970"                          "03 February 2013"                   
[61] "1 July 2013"                         "4/8/2014 22:05"                     
[63] "04/08/2014 22:05"                    "4/8/14 22:05"                       
[65] "04/2/2014 03:00:51"                  "8/8/1965 12:00:00 AM"               
[67] "8/8/1965 01:00:01 PM"                "8/8/1965 01:00 PM"                  
[69] "8/8/1965 1:00 PM"                    "8/8/1965 12:00 AM"                  
[71] "4/02/2014 03:00:51"                  "03/19/2012 10:11:59"                
[73] "03/19/2012 10:11:59.3186369"         "3/31/2014"                          
[75] "03/31/2014"                          "08/21/71"                           
[77] "8/1/71"                              "2014/4/8 22:05"                     
[79] "2014/04/08 22:05"                    "2014/04/2 03:00:51"                 
[81] "2014/4/02 03:00:51"                  "2012/03/19 10:11:59"                
[83] "2012/03/19 10:11:59.3186369"         "2014/3/31"                          
[85] "2014/03/31"                          "3.31.2014"                          
[87] "03.31.2014"                          "08.21.71"                           
[89] "2014.03.30"                          "2014.03"                            
[91] "171113 14:14:20"                     "2014年04月08日11时25分18秒"         
[93] "2014年04月08日"                     

chronos() is the powerhouse of the package and tries as hard as possible to parse every input into either a date or a datetime, depending on out_format. The function can also return a raw character vector which can be fed into faster converters, such as fasttime.

chronos(bench_date, out_format = "datetime")
 [1] "2017-11-25 23:22:26 CET"  "2021-05-03 08:04:08 CEST"
 [3] "2021-05-03 08:54:32 CEST" "2021-05-01 03:17:02 CEST"
 [5] "2017-11-25 23:34:50 CET"  "2021-06-02 08:31:39 CEST"
 [7] "2019-11-29 17:08:00 CET"  "2019-11-29 17:08:05 CET" 
 [9] "2021-05-03 08:31:36 CEST" "2021-05-03 08:31:39 CEST"
[11] "2019-11-29 17:15:47 CET"  "2017-07-19 05:21:51 CEST"
[13] "2014-04-26 17:24:37 CEST" "2021-04-30 21:14:00 CEST"
[15] "2021-04-30 21:14:10 CEST" "2021-04-30 21:14:10 CEST"
[17] "2014-04-26 17:24:37 CEST" "2014-04-26 17:24:37 CEST"
[19] "2012-08-03 18:31:59 CEST" "2017-11-25 22:31:15 CET" 
[21] "2017-11-25 22:31:00 CET"  "2014-12-16 07:20:00 CET" 
[23] "2014-12-16 07:20:00 CET"  "2014-04-26 07:13:43 CEST"
[25] "2014-04-26 06:13:44 CEST" "2012-08-03 20:31:59 CEST"
[27] "2015-09-30 20:48:56 CEST" "2021-02-21 19:51:31 CET" 
[29] "2021-02-21 19:51:31 CET"  "2021-02-21 19:51:31 CET" 
[31] "2020-07-19 20:51:31 CEST" "2024-02-27 01:06:06 CET" 
[33] "2024-02-27 16:00:00 CET"  "2024-02-27 06:00:00 CET" 
[35] "2024-02-27 10:06:06 CET"  "2024-02-28 01:00:00 CET" 
[37] "2024-02-27 15:00:00 CET"  "2024-02-27 19:00:00 CET" 
[39] "2024-05-06 21:24:00 CEST" "2024-05-27 02:45:27 CEST"
[41] "2009-05-08 17:57:51 CEST" "2012-09-17 10:09:00 CEST"
[43] "2012-09-17 10:10:09 CEST" "2021-05-02 17:51:31 CEST"
[45] "2021-05-02 17:51:00 CEST" "2021-05-26 09:49:00 CEST"
[47] "2012-09-17 20:09:00 CEST" "2021-02-21 19:51:31 CET" 
[49] "2021-05-25 19:51:31 CEST" "1970-10-07 19:51:31 CET" 
[51] "1970-10-07 19:51:31 CET"  "1970-10-07 19:51:31 CET" 
[53] "1970-10-07 19:51:31 CET"  "1970-10-07 19:51:31 CET" 
[55] "2006-02-12 19:17:00 CET"  "2006-02-12 19:17:00 CET" 
[57] "2019-05-14 19:11:40 CEST" "1970-10-07 19:51:31 CET" 
[59] "1970-10-07 19:51:31 CET"  "2013-02-03 19:51:31 CET" 
[61] "2013-07-01 19:51:31 CEST" "2014-04-08 22:05:00 CEST"
[63] "2014-04-08 22:05:00 CEST" "2014-04-08 22:05:00 CEST"
[65] "2014-04-02 03:00:51 CEST" "1965-08-08 00:00:00 CET" 
[67] "1965-08-08 13:00:01 CET"  "1965-08-08 13:00:00 CET" 
[69] "1965-08-08 13:00:00 CET"  "1965-08-08 00:00:00 CET" 
[71] "2014-04-02 03:00:51 CEST" "2012-03-19 10:11:59 CET" 
[73] "2012-03-19 10:11:59 CET"  "2014-03-31 19:51:31 CEST"
[75] "2014-03-31 19:51:31 CEST" "1971-08-21 19:51:31 CET" 
[77] "1971-08-01 19:51:31 CET"  "2014-04-08 22:05:00 CEST"
[79] "2014-04-08 22:05:00 CEST" "2014-04-02 03:00:51 CEST"
[81] "2014-04-02 03:00:51 CEST" "2012-03-19 10:11:59 CET" 
[83] "2012-03-19 10:11:59 CET"  "2014-03-31 19:51:31 CEST"
[85] "2014-03-31 19:51:31 CEST" "2014-03-31 19:51:31 CEST"
[87] "2014-03-31 19:51:31 CEST" "1971-08-21 19:51:31 CET" 
[89] "2014-03-30 19:51:31 CEST" "2014-03-27 19:51:31 CET" 
[91] "2017-11-13 14:14:20 CET"  "2014-04-08 11:25:18 CEST"
[93] "2014-04-08 19:51:31 CEST"

Functions

Under the hood chronos() calls three functions which can also be used in isolation:

  • parse_datetime(): a fast datetime parser that tries several different formats until it can parse the input

  • parse_date(): a fast date parser that tries several different formats until it can parse the input

  • parse_epoch(): a fast epoch timestamp parser

anytime

library(anytime)

anytime is certainly the most accepted general purpose date(time) converter to date.

It does not recognize all accepted formats of chronos out of the box. However, the unrecognized formats can easily be added via anytime::addFormats().

dplyr::coalesce(
    anytime(bench_date),
    anydate(bench_date)
)
 [1] NA                         "1620-02-17 23:53:28 LMT" 
 [3] NA                         "2021-05-01 01:17:02 CEST"
 [5] "2017-11-25 22:34:50 CET"  "2021-06-02 06:31:39 CEST"
 [7] "2019-11-29 08:08:08 CET"  "2019-11-29 08:08:05 CET" 
 [9] "2021-05-02 23:31:36 CEST" "2021-05-02 23:31:39 CEST"
[11] "2019-11-29 08:15:47 CET"  "2017-07-19 03:21:51 CEST"
[13] "2014-04-26 05:24:37 CEST" "2021-04-30 21:14:00 CEST"
[15] "2021-04-30 21:14:10 CEST" "2021-04-30 21:14:10 CEST"
[17] "2014-04-26 17:24:37 CEST" "2014-04-26 17:24:37 CEST"
[19] "2012-08-03 18:31:59 CEST" "2017-11-25 13:31:15 CET" 
[21] "2017-11-25 00:00:00 CET"  "2014-12-16 06:20:00 CET" 
[23] "2014-12-16 06:20:00 CET"  "2014-04-26 13:13:43 CEST"
[25] "2014-04-26 13:13:44 CEST" "2012-08-03 18:31:59 CEST"
[27] "2015-09-30 18:48:56 CEST" "2021-02-21 00:00:00 CET" 
[29] "2021-02-21 00:00:00 CET"  "2021-02-21 00:00:00 CET" 
[31] "2020-07-20 08:00:00 CEST" NA                        
[33] NA                         NA                        
[35] NA                         NA                        
[37] NA                         NA                        
[39] NA                         NA                        
[41] "2009-05-08 00:00:00 CEST" "2012-09-17 00:00:00 CEST"
[43] "2012-09-17 00:00:00 CEST" "2021-05-02 00:00:00 CEST"
[45] "2021-05-02 00:00:00 CEST" "2021-05-26 00:00:00 CEST"
[47] "2012-09-17 00:00:00 CEST" "2021-02-21 00:00:00 CET" 
[49] "2021-05-25 00:00:00 CEST" "1970-10-07 00:00:00 CET" 
[51] NA                         "1970-10-07 00:00:00 CET" 
[53] NA                         "1970-10-07 00:00:00 CET" 
[55] "2006-02-12 00:00:00 CET"  "2006-02-12 19:17:00 CET" 
[57] "2019-05-14 19:11:40 CEST" NA                        
[59] "1970-10-07 00:00:00 CET"  "2013-02-03 00:00:00 CET" 
[61] "2013-07-01 00:00:00 CEST" "2014-04-08 22:05:00 CEST"
[63] "2014-04-08 22:05:00 CEST" NA                        
[65] "2014-04-02 03:00:51 CEST" "1965-08-08 00:00:00 CET" 
[67] "1965-08-08 00:00:00 CET"  "1965-08-08 00:00:00 CET" 
[69] "1965-08-08 00:00:00 CET"  "1965-08-08 00:00:00 CET" 
[71] "2014-04-02 03:00:51 CEST" "2012-03-19 10:11:59 CET" 
[73] "2012-03-19 10:11:59 CET"  "2014-03-31 00:00:00 CEST"
[75] "2014-03-31 00:00:00 CEST" NA                        
[77] NA                         "2014-04-08 22:05:00 CEST"
[79] "2014-04-08 22:05:00 CEST" "2014-04-02 03:00:51 CEST"
[81] "2014-04-02 03:00:51 CEST" "2012-03-19 10:11:59 CET" 
[83] "2012-03-19 10:11:59 CET"  "2014-03-31 00:00:00 CEST"
[85] "2014-03-31 00:00:00 CEST" "2014-03-31 00:00:00 CEST"
[87] "2014-03-31 00:00:00 CEST" NA                        
[89] "2014-03-30 00:00:00 CET"  "2014-03-01 00:00:00 CET" 
[91] "1711-03-14 14:13:28 LMT"  NA                        
[93] NA                        

The full list of formats supported can be retrieved with anytime::getFormats(). chronos implements all these formats natively too.

Benchmark

The benchmark is done with three datasets that contain a variety of different date(time) formats.

bench_datetimes <- readLines("datetime1000.txt")
head(bench_datetimes)
[1] "28 December 1979 12:54AM" "12/21/1991 08:07 AM"     
[3] "13-03-1979 19:51"         "2007-11-08 01:09:25"     
[5] "May 29, 1978 07:57"       "2015-05-04 21:55:07"     
bench_epochs <- readLines("epoch500.txt")
head(bench_epochs)
[1] "717700128" "115153946" "948771719" "586380132" "795097964" "211051179"
bench_dates <- readLines("dates500.txt")
head(bench_dates)
[1] "September 07, 2018"         "1991.02.14"                
[3] "12:00 AM December 26, 2000" "April 05, 1996"            
[5] "June 19, 2014"              "27-Jun-2016"               
bench <- c(bench_datetimes, bench_epochs, bench_dates)

Ability to parse

This benchmark just checks if something was parsed and does not say if the result is actually correct.

sum_na <- function(x) sum(is.na(x))
data.frame(
    type = c("datetimes", "epochs", "dates", "all"),
    chronos = c(
        sum_na(chronos(bench_datetimes)),
        sum_na(chronos(bench_epochs)),
        sum_na(chronos(bench_dates, out_format = "date")),
        sum_na(chronos(bench))
    ),
    anytime = c(
        sum_na(anytime(bench_datetimes)),
        sum_na(anytime(as.numeric(bench_epochs))),
        sum_na(anydate(bench_dates)),
        sum_na(anytime(bench))
    )
)
       type chronos anytime
1 datetimes       0     322
2    epochs       0       0
3     dates       0     138
4       all       0     949

When epoch times are encoded as characters (which happens when all data is put together in one vector), then anytime fails to parse most of the epoch times.

Runtime

The package fasttime can be used together with chronos to convert larger sets of datetimes by letting chronos return a character vector which is then parsed by fastPOSIXct or fastDate.

fast_chronos <- function(x, out_format = "datetime") {
    res <- chronos(x, out_format = "character")
    if (out_format == "datetime") {
        return(fasttime::fastPOSIXct(res))
    } else {
        return(fasttime::fastDate(res))
    }
}

Full data

mb <- microbenchmark::microbenchmark(
    chronos = chronos(bench),
    fast_chronos = fast_chronos(bench),
    anytime = anytime(bench),
    times = 100L
)
ggplot2::autoplot(mb)

datetime

mb <- microbenchmark::microbenchmark(
    chronos = chronos(bench_datetimes),
    fast_chronos = fast_chronos(bench_datetimes),
    anytime = anytime(bench_datetimes),
    times = 100L
)
ggplot2::autoplot(mb)

epoch

bench_epochs_num <- as.integer(bench_epochs)
mb <- microbenchmark::microbenchmark(
    chronos = chronos(bench_epochs_num),
    fast_chronos = fast_chronos(bench_epochs_num),
    anytime = anytime(bench_epochs_num),
    posix = as.POSIXct(bench_epochs_num),
    fastposix = fasttime::fastPOSIXct(bench_epochs_num),
    times = 100L
)
ggplot2::autoplot(mb)

When the input vector only consists of epoch timestamps, it is best to parse them directly with as.POSIXct.

date

mb <- microbenchmark::microbenchmark(
    chronos = chronos(bench_date, out_format = "date"),
    fast_chronos = fast_chronos(bench_date, out_format = "date"),
    anytime = anydate(bench_date),
    times = 100L
)
ggplot2::autoplot(mb)

Disclaimer

While it might seem that chronos has an edge over anytime, it is far less battle tested and mature (Date parsing can be as tricky as URL parsing). I am grateful for anyone who can take the package for a spin and report issues/make feature requests.

Footnotes

  1. I am now feeling more comfortable with the language and I am starting to really enjoy it. Pretty sure this will not be my last R package with Rust.↩︎

Reuse

Citation

BibTeX citation:
@online{schoch2024,
  author = {Schoch, David},
  title = {Chronos: {A} Fast General Purpose Datetime Parser},
  date = {2024-02-27},
  url = {http://blog.schochastics.net/posts/2024-02-27_chronos-fast-general-purpose-datetime-converter},
  langid = {en}
}
For attribution, please cite this work as:
Schoch, David. 2024. “Chronos: A Fast General Purpose Datetime Parser.” February 27, 2024. http://blog.schochastics.net/posts/2024-02-27_chronos-fast-general-purpose-datetime-converter.
To leave a comment for the author, please follow the link and comment on their blog: schochastics - all things R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)