Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have different sets of monthly data that I want to align and evaluate once a month. The different sources report the timestamp of the monthly data differently – one reports the date without the day, another as the last business day, and another as the last day of the month. For what I want to do, I’m content to align the data to the last day of the month. How do I do that in R?

The solution for this isn’t obvious in Excel or OpenOffice, either, but I thought it would be at least similarly simple in R. I’d looked around, and the best solution I’d found on r-help was something of a kludge:

index(x) = as.POSIXct(as.Date(as.yearmon(index(x)),frac=1), tz="UTC")


I said it wasn’t pretty, didn’t I? You are reading that right – convert the index first to yearmon, then to Date (with frac representing the fraction of a month with a number between 1 and 0), then finally to POSIXct.

That’s worked fine for me until I recently found strange, eight-second misalignments between my data sets that seemed to be caused by the transformation above. Timezone issues, perhaps? On Jeff Ryan’s advice to keep all my date transformations in POSIX, I finally found a better solution in r-help that had been posted by Whit Armstrong years and years ago. I’m not sure why I didn’t find it earlier, but his solution was somewhat more general than what I have been looking for. I’ve stripped his answer down a bit to the following utility function:

eom <- function(date) {
# date character string containing POSIXct date
date.lt <- as.POSIXlt(date) # add a month, then subtract a day:
mon <- date.lt$mon + 2 year <- date.lt$year
year <- year + as.integer(mon==13) # if month was December add a year
mon[mon==13] <- 1
iso = ISOdate(1900+year, mon, 1, hour=0, tz=attr(date,"tz"))
result = as.POSIXct(iso) - 86400 # subtract one day
result + (as.POSIXlt(iso)$isdst - as.POSIXlt(result)$isdst)*3600
}


This wasn’t as simple as I was hoping for, but once functionalized it becomes simple. The premise is the same as the spreadsheet solution – add a month then subtract a day.

# Whit's example
x <- seq(as.POSIXct("2001-01-10"),as.POSIXct("2005-12-10"),by="months")
data.frame(before=x,after=eom(x))
before      after
1  2001-01-10 2001-01-31
2  2001-02-10 2001-02-28
3  2001-03-10 2001-03-31
4  2001-04-10 2001-04-30
5  2001-05-10 2001-05-31
6  2001-06-10 2001-06-30
... snip ...


Ah, much better. Thanks, Whit!