The magic of the year 1901

July 21, 2012
(This article was first published on Pitfalls-R-Us, and kindly contributed to R-bloggers)

The year 1901 is rather magical. Well, it is for R, provided you run it under Linux. Let me show you why. I have four data points: one from 1900, two from 1901, and one from 1902.


dates <- c("11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902")
values <- c( 1, 2, 0.7, 0.1 )

I convert them in two different ways: as a Date and as a POSIXct. The same format string is used for both conversions.


date1 <- as.Date( dates, format="%d/%m/%Y" )
date2 <- as.POSIXct( dates, format="%d/%m/%Y" )

I plot them like this:


plot( date1, values )
plot( date2, values )

Now you try and spot the difference.

Both graphs have the same shape, but different breaks. In the first graph the maximum appears to be in 1901, in the second graph in 1900. This is caused by a bug in the conversion from a string to R's POSIXct class.


> as.POSIXct( "1901-01-01", format="%Y-%m-%d" )
[1] "1900-12-31 23:59:28 AMT"

Two things are wrong here.

  1. We somehow shifted 32 seconds into the past (thereby moving from 1901 to 1900, which causes the difference between the two graphs).
  2. We also moved from the CET time zone, where I live, to the Amazonian time zone (AMT).
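Both symptoms can also be checked without a plot. The following is a minimal sketch, assuming the same four dates as above; the data frame `check` and its column names are my own invention for illustration. It puts the year reported by each class side by side, together with the zone abbreviation R attaches to the POSIXct values. What it prints depends on the time zone of your session, so no output is shown.

```r
dates <- c("11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902")

date1 <- as.Date(dates, format = "%d/%m/%Y")
date2 <- as.POSIXct(dates, format = "%d/%m/%Y")  # uses the session time zone

# Year according to each class, plus the zone label of the POSIXct value
check <- data.frame(
  input        = dates,
  year_Date    = format(date1, "%Y"),
  year_POSIXct = format(date2, "%Y"),
  zone         = format(date2, "%Z"),
  stringsAsFactors = FALSE
)
print(check)

# Rows where the two classes disagree about the year (empty if all is well)
check[which(check$year_Date != check$year_POSIXct), ]
```

On a system that shows the problem, the row for 01/01/1901 should be the one where the two year columns differ.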

The conversion works fine for dates in the more recent past.


> as.POSIXct("2012-01-01", format="%Y-%m-%d" )
[1] "2012-01-01 CET"

It even works properly for dates before the Unix epoch 1970-01-01.


> as.POSIXct("1957-01-01", format="%Y-%m-%d" )
[1] "1957-01-01 CET"

But around 1940 strange things happen to the time zone, and in December 1901 the 32-second time shift happens.


> as.POSIXct("1940-01-01", format="%Y-%m-%d" )
[1] "1940-01-01 NET"
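To find out where these changes happen on your own machine, you can scan a range of years. This is a rough sketch, not a definitive test: the range 1895 to 1945 and the names `scan`, `changed` and `not_midnight` are arbitrary choices of mine. It converts the first of January of each year using the session time zone, then keeps the years where the zone label differs from the previous year or where the converted time is no longer midnight.

```r
years  <- 1895:1945
stamps <- as.POSIXct(sprintf("%d-01-01", years), format = "%Y-%m-%d")

scan <- data.frame(
  year    = years,
  printed = format(stamps, "%Y-%m-%d %H:%M:%S"),
  zone    = format(stamps, "%Z"),
  stringsAsFactors = FALSE
)

# TRUE where the zone label differs from the year before (first row always kept)
changed <- c(TRUE, scan$zone[-1] != scan$zone[-nrow(scan)])

# TRUE where the converted time drifted away from midnight
not_midnight <- format(stamps, "%H:%M:%S") != "00:00:00"

scan[which(changed | not_midnight), ]
```

The result depends entirely on the session time zone and the time zone data installed on the system, so expect it to differ between machines.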


How to fix this

Be explicit, don’t leave R guessing what time zone to use. Set the environment variable TZ to a time zone of your liking before you start R.


$ export TZ=CET
$ R
> as.POSIXct( "1901-01-01", format="%Y-%m-%d" )
[1] "1901-01-01 CET"
> as.POSIXct( "1940-01-01", format="%Y-%m-%d" )
[1] "1940-01-01 CET"
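If you cannot control the environment R is started from, another way of being explicit is to pass the zone to the conversion itself through the `tz` argument of `as.POSIXct`. The sketch below uses UTC purely as an example zone, which is my choice and not something the problem requires.

```r
# Pass the time zone explicitly instead of relying on the session default
x <- as.POSIXct("1901-01-01", format = "%Y-%m-%d", tz = "UTC")

format(x, "%Y-%m-%d %H:%M:%S %Z")

# The zone is stored with the object, so later printing does not depend on TZ
attr(x, "tzone")
```

With the zone stated in the call, the result no longer depends on what R guesses for the session.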
