It is good to be explicit

August 6, 2012

(This article was first published on Pitfalls-R-Us, and kindly contributed to R-bloggers)

Being careful not to repeat the year 1901 mistake, I set the TZ variable before I run R. I have the same set of data that I convert as follows:


dates <- c("11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902")
values <- c( 1, 2, 0.7, 0.1 )

date1 <- as.Date( dates, "%d/%m/%Y" )
date2 <- as.POSIXct( dates, "%d/%m/%Y" )

and then plot


plot( date1, values )
plot( date2, values )

To my surprise I end up with the following two graphs.

A number of things conspired against me here:

  • positional parameters,
  • default parameters.

Well, the main cause is that I did not read the manual and assumed that the parameters of as.Date() and as.POSIXct() are similar. But ask yourself how many times you have used a function without consulting the manual because you thought you knew what the parameters were.

So let's look at the other causes. The help for as.Date() shows the following possible parameters.


as.Date(x, ...)
## S3 method for class 'character'
as.Date(x, format = "", ...)
## S3 method for class 'numeric'
as.Date(x, origin, ...)
## S3 method for class 'POSIXct'
as.Date(x, tz = "UTC", ...)

Depending on the type of object you are trying to convert, different parameter lists apply. Let's focus on character objects.


as.Date(x, format = "", ...)

Two parameters are expected:

  • x an object to convert,
  • format a format string that specifies how the dates are formatted.

You can provide values for these parameters positionally, that is, in the order they are listed, or by name.

An example of the former.


as.Date('1492-11-29', '%Y-%m-%d')
[1] "1492-11-29"

Notice that format also has a default value, "", which means we do not have to provide it. This indeed works.


as.Date('1492-11-29')
[1] "1492-11-29"

Well, sort of. If format equals "", R tries %Y-%m-%d and then %Y/%m/%d. It stops with an error when neither matches:


as.Date('29-Nov-1492')
Error in charToDate(x) :
character string is not in a standard unambiguous format

But it silently returns a wrong date for


as.Date('29-11-1492')
[1] "29-11-14"

29-11-1492 is interpreted as year 29, month 11 and day 14. The remaining string "92" is not used, and this is not reported.

Default parameters can save time but it is better to be explicit and say what you mean and specify the format of your data, so R does not have to guess, and you won't end up being surprised.
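As a small sketch of what being explicit buys you here, the same day-month-year string parses correctly once the format is stated, and a string that does not match the stated format becomes NA instead of a plausible-looking wrong date. The variable names d and bad are only for illustration.

```r
# State the format explicitly: day, month, four-digit year, separated by dashes.
d <- as.Date(x = "29-11-1492", format = "%d-%m-%Y")
format(d, "%Y-%m-%d")   # "1492-11-29"

# A string in a different layout does not match the stated format,
# so the result is NA rather than a silently misread date.
bad <- as.Date(x = "1492-11-29", format = "%d-%m-%Y")
is.na(bad)              # TRUE
```

An NA is easy to test for, whereas a date in the year 29 may go unnoticed until it shows up in a plot.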

Back to how to provide the values of the parameters. We have seen the positional method; the other one is by name. This would be:


as.Date(x='1492-11-29', format='%Y-%m-%d')
[1] "1492-11-29"

It even works the other way around now.


as.Date(format='%Y/%m/%d', x='1492/11/29')
[1] "1492-11-29"

It is more work to type this, and probably not worth it when you are just using R interactively. But if you are writing a script that is to be reused, this is the best way. It is very explicit, in a good way. You tell R exactly what you mean. In addition, you tell your future self what you meant when you wrote it. It helps others to understand what you are trying to say. This is good for reproducible research.

Now what went wrong in the original plot? My mistake was to assume that the parameters for as.POSIXct() appear in the same order as the ones for as.Date(). This is not the case, however, as can be seen from the help pages. (For character input, as.POSIXct() passes its arguments on to as.POSIXlt(), which is why that signature is the one shown.)
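For a reusable script, one possible way to put this into practice is a small wrapper that always requires a format and stops when any value fails to parse. The function name parse_strict is hypothetical and not part of R; it is only a sketch built on as.Date().

```r
# Hypothetical helper (not part of base R): parse dates with a mandatory,
# named format and fail loudly if any non-missing input cannot be parsed.
parse_strict <- function(x, format) {
  parsed <- as.Date(x = x, format = format)
  failed <- is.na(parsed) & !is.na(x)
  if (any(failed)) {
    stop("could not parse with format '", format, "': ",
         paste(x[failed], collapse = ", "))
  }
  parsed
}

dates  <- c("11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902")
parsed <- parse_strict(x = dates, format = "%d/%m/%Y")
format(parsed, "%Y")   # "1900" "1901" "1901" "1902"
```

Because format has no default in this sketch, forgetting it is an error rather than an invitation for R to guess.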


## S3 method for class 'character'
as.Date(x, format = "", ...)

## S3 method for class 'character'
as.POSIXlt(x, tz = "", format, ...)

Therefore the conversion


date2 <- as.POSIXct( dates, "%d/%m/%Y" )

ended up using "%d/%m/%Y" as the tz parameter. It did not find a value for the format parameter and therefore guessed one (here %Y/%m/%d, since the dates are separated by slashes). The day numbers got interpreted as years, and the graph therefore shows the years 1, 11 and 30: about 2000 years ago instead of the roughly 100 years ago given by the data.

I would have avoided the surprising graph had I used named parameters, even without reading the manual.
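A sketch of the original conversion with named parameters could look like this. Passing tz = "UTC" is an assumption made for the example; the original relied on the TZ environment variable instead.

```r
dates  <- c("11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902")
values <- c(1, 2, 0.7, 0.1)

# Name every argument, so a different argument order between functions cannot bite.
date1 <- as.Date(x = dates, format = "%d/%m/%Y")
date2 <- as.POSIXct(x = dates, format = "%d/%m/%Y", tz = "UTC")  # tz is an assumption

# Both conversions now describe the same calendar days.
identical(format(date1, "%Y-%m-%d"), format(date2, "%Y-%m-%d"))  # TRUE
```

With these values, plot( date1, values ) and plot( date2, values ) should both cover the years 1900 to 1902.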

Conclusion

Be explicit and beware of implicit defaults in R.
