The magic of the year 1901

[This article was first published on Pitfalls-R-Us, and kindly contributed to R-bloggers.]

The year 1901 is rather magical. Well, it is for R, provided you run it under Linux. Let me show you why. I have four data points: one from 1900, two from 1901, and one from 1902.

dates  <- c( "11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902" )
values <- c(  1,            2,            0.7,          0.1         )

I convert them in two different ways: as a Date and as a POSIXct. The same format string is used for both conversions.

date1 <- as.Date(    dates, format="%d/%m/%Y" )
date2 <- as.POSIXct( dates, format="%d/%m/%Y" )
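
As a rough illustration of why the two classes can disagree at all: a Date is stored as a count of days since 1970-01-01, while a POSIXct is stored as a count of seconds, so only the latter depends on a time zone. The sketch below pins the zone to UTC, which is an assumption made for reproducibility and not what the calls above do, and its variable names are illustrative.

```r
# One of the dates from the example, converted with both classes.
d <- as.Date( "01/01/1901", format = "%d/%m/%Y" )
# tz = "UTC" is fixed here only to make the result independent of the machine.
p <- as.POSIXct( "01/01/1901", format = "%d/%m/%Y", tz = "UTC" )

days_since_epoch    <- as.numeric( d )   # whole days, no time zone involved
seconds_since_epoch <- as.numeric( p )   # seconds, interpreted in a time zone

print( days_since_epoch )
print( seconds_since_epoch )
# In UTC the two agree once days are scaled to seconds.
print( days_since_epoch * 86400 == seconds_since_epoch )
```

With any other zone the seconds value moves by that zone's offset, which is where a wrong or unexpected offset can push a timestamp across midnight.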

I plot them like this:

plot( date1, values )
plot( date2, values )
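
The same comparison can be made without reading axis labels, by printing the year that each converted vector reports. This is a minimal sketch that repeats the conversions so it runs on its own; the data frame and its column names are illustrative, and what the POSIXct column shows depends on the platform and the local time zone.

```r
dates <- c( "11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902" )
date1 <- as.Date(    dates, format = "%d/%m/%Y" )
date2 <- as.POSIXct( dates, format = "%d/%m/%Y" )

# Year as seen by each class; a row where the two columns differ is affected.
years <- data.frame( input   = dates,
                     Date    = format( date1, "%Y" ),
                     POSIXct = format( date2, "%Y" ) )
print( years )
```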

Now try to spot the difference.

Both graphs have the same shape, but different breaks. In the first graph the maximum appears to be in 1901, in the second graph in 1900. This is caused by a bug in the conversion from a string to R's POSIXct class.

> as.POSIXct( "1901-01-01", format="%Y-%m-%d" ) 
[1] "1900-12-31 23:59:28 AMT"

Two things are wrong here.

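  1. We somehow shifted 32 seconds into the past, thereby moving from 1901 to 1900, which causes the difference between the two graphs.
  2. We also moved from the CET time zone, where I live, to AMT. With a system zone of Europe/Amsterdam this abbreviation most likely stands for Amsterdam Mean Time, the local time used there before 1937, rather than Amazon Time.

The conversion works fine for dates in the more recent past.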

> as.POSIXct("2012-01-01", format="%Y-%m-%d" )
[1] "2012-01-01 CET"

It even works properly for dates before the Unix epoch 1970-01-01.

> as.POSIXct("1957-01-01", format="%Y-%m-%d" )
[1] "1957-01-01 CET"

But around 1940 strange things happen to the time zone, and in December 1901 the 32-second time shift appears.

> as.POSIXct("1940-01-01", format="%Y-%m-%d" )
[1] "1940-01-01 NET"
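
One plausible reading of the December 1901 boundary, offered as an assumption rather than a confirmed diagnosis, is the range of a signed 32-bit time_t: the smallest value it can hold is -2^31 seconds from the Unix epoch. The sketch below prints that instant in UTC and checks that midnight on 1901-01-01 lies before it. The variable names are illustrative.

```r
# Smallest value of a signed 32-bit integer, read as seconds since 1970-01-01 UTC.
limit_seconds <- -2^31
limit      <- as.POSIXct( limit_seconds, origin = "1970-01-01", tz = "UTC" )
limit_text <- format( limit, "%Y-%m-%d %H:%M:%S", tz = "UTC" )
print( limit_text )

# 1901-01-01 00:00:00 UTC expressed as seconds since the epoch.
jan_1901 <- as.numeric( as.Date( "1901-01-01" ) ) * 86400
print( jan_1901 < limit_seconds )
```

The limit comes out as 1901-12-13 20:45:52 UTC, so any code path that relies on a 32-bit time_t has to treat earlier dates specially.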

How to fix this

Be explicit; don't leave R guessing which time zone to use. Set the environment variable TZ to a time zone of your liking before you start R.

$ export TZ=CET
$ R
> as.POSIXct( "1901-01-01", format="%Y-%m-%d" ) 
[1] "1901-01-01 CET"
> as.POSIXct( "1940-01-01", format="%Y-%m-%d" ) 
[1] "1940-01-01 CET"
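
If restarting R with a different environment is inconvenient, an alternative (not the approach shown above) is to name the zone in the call itself through the tz argument of as.POSIXct. The sketch below uses "UTC" purely as an example zone; whether a given named zone avoids the shift on an affected system is something to check locally.

```r
# Pass the zone explicitly so nothing is taken from the system settings.
x     <- as.POSIXct( "1901-01-01", format = "%Y-%m-%d", tz = "UTC" )
stamp <- format( x, "%Y-%m-%d %H:%M:%S %Z" )
print( stamp )
```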
