# Date Wrangling with dplyr and lubridate

[This article was first published on **R-SquareD**, and kindly contributed to R-bloggers.]


When I’m working with financial time series data, I usually use xts objects and convert the data to a data.frame when I’m ready to plot. Sometimes I find that I want to adjust the periodicity after I’ve converted the data, and dplyr doesn’t have a built-in *to.period*-style function to handle this.

There are many ways to do this, but here is a simple method I created using lubridate.
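The code block from the original post did not survive extraction, so here is a sketch of the pipeline described below, assuming a data.frame `df` with a date column `dt` (as in the table) and randomly generated value columns — the values will not match the table, but the `dt` column of the result will:

```r
library(dplyr)
library(lubridate)

# Daily sample data; the values are illustrative and won't match the table
set.seed(42)
dates <- seq(as.Date("2016-09-01"), as.Date("2017-07-26"), by = "day")
df <- data.frame(dt = dates, x = runif(length(dates)),
                 y = runif(length(dates)), z = runif(length(dates)))

monthly <- df %>%
  # Tag each row with its month; change_on_boundary = TRUE rolls dates that
  # fall exactly on a boundary forward, so every day of October maps to
  # 2016-11-01. The unit could also be "second", "hour", "quarter", "year",
  # or a custom span like "5 days".
  mutate(Monthly = ceiling_date(dt, unit = "month",
                                change_on_boundary = TRUE)) %>%
  group_by(Monthly) %>%
  # Keep the row with the largest dt in each month (the last observation);
  # top_n(n = -1, dt) would keep the first observation instead
  top_n(n = 1, dt) %>%
  ungroup() %>%
  select(-Monthly)
```

Running this yields one row per month, dated at each month-end (with the final, partial month ending on 2017-07-26).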

dt | x | y | z |
---|---|---|---|
2016-09-30 | 0.5929255 | 0.9471731 | 0.3849695 |
2016-10-31 | 0.4608359 | 0.4818123 | 0.3505995 |
2016-11-30 | 0.0489204 | 0.1026600 | 0.5048000 |
2016-12-31 | 0.4354312 | 0.7474725 | 0.1319946 |
2017-01-31 | 0.1969235 | 0.9362995 | 0.0475031 |
2017-02-28 | 0.2869447 | 0.3290996 | 0.9325527 |
2017-03-31 | 0.7202285 | 0.1793812 | 0.4118351 |
2017-04-30 | 0.8789776 | 0.4084741 | 0.2664088 |
2017-05-31 | 0.7721041 | 0.0749422 | 0.2096817 |
2017-06-30 | 0.4888230 | 0.8979686 | 0.0206246 |
2017-07-26 | 0.9185885 | 0.4004092 | 0.5670242 |

This is a very simple method for creating a periodicity transformation using dplyr and lubridate. The first function creates a new column that distinctly identifies the month/year each record belongs to, using **ceiling_date**. The unit could be any interval of time, from seconds to hours to quarters to years, and I can also prefix the unit with an integer to create custom intervals (“5 days”).

Next, I **group_by** the new column so that the subsetting is performed within each month. The **top_n** function returns the rows with the largest values of the field passed (date) within each group, so **n = 1** keeps the last observation of each month. To return the first observation within each month instead, simply change the **n** parameter to **-1** (**top_n(n = -1, date)**).

The final two functions just clean up the data by ungrouping it and removing the Monthly column.

That is all nice, but if I’m going to be doing this more than once, I would rather have a function to handle it more generically.
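The function itself was lost in extraction; the following sketch is consistent with the description that follows, but the name `to_period` and the `side` parameter are my assumptions, not the author’s:

```r
library(dplyr)
library(lubridate)

# Collapse a data.frame to one row per period, keeping the last (or first)
# observation in each period. `field` is passed unquoted, dplyr-style, and
# deparse(substitute(...)) converts the expression to a column-name string.
# NOTE: the function name and the `side` argument are assumptions.
to_period <- function(df, field, period = "month", side = "last") {
  field_name <- deparse(substitute(field))
  n <- if (side == "last") 1 else -1
  df %>%
    mutate(.period = ceiling_date(.data[[field_name]], unit = period,
                                  change_on_boundary = TRUE)) %>%
    group_by(.period) %>%
    top_n(n = n, wt = .data[[field_name]]) %>%
    ungroup() %>%
    select(-.period)
}
```

Called as `to_period(df, dt, period = "quarter", side = "first")`, it keeps the first observation of each quarter, which is the shape of the table below.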

dt | x | y | z |
---|---|---|---|
2016-09-30 | 0.5929255 | 0.9471731 | 0.3849695 |
2016-10-01 | 0.5546463 | 0.5019259 | 0.2752113 |
2017-01-01 | 0.8042691 | 0.8392850 | 0.9429900 |
2017-04-01 | 0.8989569 | 0.0962568 | 0.2398289 |
2017-07-01 | 0.1345319 | 0.9558426 | 0.6833553 |

This function is obviously more complicated than the script above, but the **field** parameter adds some challenges. I want my data wrangling functions to follow the dplyr convention of passing expressions as field names, which keeps everything consistent. I use the **deparse** function to convert the expression to a string so that I can access the data by column name. I could use lazyeval for this (which is how dplyr works), but it creates many more complications and unattractive code. There are limitations to deparse, but they do not impact this function (see https://cran.r-project.org/web/packages/lazyeval/vignettes/lazyeval.html).
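The deparse trick in isolation, as a toy illustration (the function name `pull_field` is mine, not from the post):

```r
# substitute() captures the unevaluated expression passed to `field`;
# deparse() renders it as the string "dt", which can index the data.frame
pull_field <- function(df, field) {
  df[[deparse(substitute(field))]]
}

df <- data.frame(dt = as.Date(c("2017-01-01", "2017-01-02")), x = 1:2)
pull_field(df, dt)  # equivalent to df[["dt"]]
```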

It would be useful to pass an aggregation function (mean, median, max, etc.) so the values could be aggregated by period rather than subset. I will create that function in the future; for now I use xts to handle this.
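For comparison, the xts route looks roughly like this — `apply.quarterly` is a real xts function, though the sample series here is my own invention:

```r
library(xts)

set.seed(1)
# 300 days of fake daily data as an xts object
x <- xts(runif(300),
         order.by = seq(as.Date("2016-09-01"), by = "day", length.out = 300))

# Quarterly means; apply.monthly / apply.yearly work the same way,
# and to.period() handles OHLC-style aggregation
q_mean <- apply.quarterly(x, mean)
```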
