# Working with Date and Time in R

**Rsquared Academy Blog**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## Introduction

In this post, we will learn to work with date/time data in R using lubridate, an R package that makes it easy to work with dates and time.

Let us begin by installing and loading the pacakge.

## Libraries, Code & Data

We will use the following packages:

The data sets can be downloaded from here and the codes from here.

library(lubridate) library(dplyr) library(magrittr) library(readr)

## Quick Intro

#### Origin

Let us look at the origin for the numbering system used for date and time

calculations in R.

origin

## [1] "1970-01-01 UTC"

#### Current Date/Time

Next, let us check out the current date, time and whether it occurs in the am

or pm. `now()`

returns the date time as well as the time zone whereas `today()`

will return only the current date. `am()`

and `pm()`

return `TRUE`

or `FALSE`

.

now()

## [1] "2019-01-29 14:26:58 IST"

today()

## [1] "2019-01-29"

am(now())

## [1] FALSE

pm(now())

## [1] TRUE

## Case Study

### Data

transact <- read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/transact.csv')

## # A tibble: 2,466 x 3 ## Invoice Due Payment ## <date> <date> <date> ## 1 2013-01-02 2013-02-01 2013-01-15 ## 2 2013-01-26 2013-02-25 2013-03-03 ## 3 2013-07-03 2013-08-02 2013-07-08 ## 4 2013-02-10 2013-03-12 2013-03-17 ## 5 2012-10-25 2012-11-24 2012-11-28 ## 6 2012-01-27 2012-02-26 2012-02-22 ## 7 2013-08-13 2013-09-12 2013-09-09 ## 8 2012-12-16 2013-01-15 2013-01-12 ## 9 2012-05-14 2012-06-13 2012-07-01 ## 10 2013-07-01 2013-07-31 2013-07-26 ## # ... with 2,456 more rows

### Data Dictionary

The data set has 3 columns. All the dates are in the format (yyyy-mm-dd).

- Invoice: invoice date
- Due: due date
- Payment: payment date

We will use the functions in the lubridate package to answer a few

questions we have about the transact data.

- extract date, month and year from Due
- compute the number of days to settle invoice
- compute days over due
- check if due year is a leap year
- check when due day in february is 29, whether it is a leap year
- how many invoices were settled within due date
- how many invoices are due in each quarter
- what is the average duration between invoice date and payment date

## Extract Date, Month & Year from Due Date

The first thing we will learn is to extract the date, month and year.

this_day <- as_date('2017-03-23') day(this_day)

## [1] 23

month(this_day)

## [1] 3

year(this_day)

## [1] 2017

Let us now extract the date, month and year from the `Due`

column.

transact %>% mutate( due_day = day(Due), due_month = month(Due), due_year = year(Due) )

## # A tibble: 2,466 x 6 ## Invoice Due Payment due_day due_month due_year ## <date> <date> <date> <int> <dbl> <dbl> ## 1 2013-01-02 2013-02-01 2013-01-15 1 2 2013 ## 2 2013-01-26 2013-02-25 2013-03-03 25 2 2013 ## 3 2013-07-03 2013-08-02 2013-07-08 2 8 2013 ## 4 2013-02-10 2013-03-12 2013-03-17 12 3 2013 ## 5 2012-10-25 2012-11-24 2012-11-28 24 11 2012 ## 6 2012-01-27 2012-02-26 2012-02-22 26 2 2012 ## 7 2013-08-13 2013-09-12 2013-09-09 12 9 2013 ## 8 2012-12-16 2013-01-15 2013-01-12 15 1 2013 ## 9 2012-05-14 2012-06-13 2012-07-01 13 6 2012 ## 10 2013-07-01 2013-07-31 2013-07-26 31 7 2013 ## # ... with 2,456 more rows

## Compute days to settle invoice

Time to do some arithmetic with the dates. Let us calculate the duration of a

course by subtracting the course start date from the course end date.

course_start <- as_date('2017-04-12') course_end <- as_date('2017-04-21') course_duration <- course_end - course_start course_duration ## Time difference of 9 days

Let us estimate the number of days to settle the invoice by subtracting the

date of invoice from the date of payment.

transact %>% mutate( days_to_pay = Payment - Invoice )

## # A tibble: 2,466 x 4 ## Invoice Due Payment days_to_pay ## <date> <date> <date> <time> ## 1 2013-01-02 2013-02-01 2013-01-15 13 days ## 2 2013-01-26 2013-02-25 2013-03-03 36 days ## 3 2013-07-03 2013-08-02 2013-07-08 5 days ## 4 2013-02-10 2013-03-12 2013-03-17 35 days ## 5 2012-10-25 2012-11-24 2012-11-28 34 days ## 6 2012-01-27 2012-02-26 2012-02-22 26 days ## 7 2013-08-13 2013-09-12 2013-09-09 27 days ## 8 2012-12-16 2013-01-15 2013-01-12 27 days ## 9 2012-05-14 2012-06-13 2012-07-01 48 days ## 10 2013-07-01 2013-07-31 2013-07-26 25 days ## # ... with 2,456 more rows

## Compute days over due

How many of the invoices were settled post the due date? We can find this by:

- subtracting the due date from the payment date
- counting the number of rows where delay < 0

transact %>% mutate( delay = Due - Payment ) %>% filter(delay < 0) %>% tally()

## # A tibble: 1 x 1 ## n ## <int> ## 1 877

## Is due year a leap year?

Just for fun, let us check if the due year happens to be a leap year.

transact %>% mutate( is_leap = leap_year(Due) )

## # A tibble: 2,466 x 4 ## Invoice Due Payment is_leap ## <date> <date> <date> <lgl> ## 1 2013-01-02 2013-02-01 2013-01-15 FALSE ## 2 2013-01-26 2013-02-25 2013-03-03 FALSE ## 3 2013-07-03 2013-08-02 2013-07-08 FALSE ## 4 2013-02-10 2013-03-12 2013-03-17 FALSE ## 5 2012-10-25 2012-11-24 2012-11-28 TRUE ## 6 2012-01-27 2012-02-26 2012-02-22 TRUE ## 7 2013-08-13 2013-09-12 2013-09-09 FALSE ## 8 2012-12-16 2013-01-15 2013-01-12 FALSE ## 9 2012-05-14 2012-06-13 2012-07-01 TRUE ## 10 2013-07-01 2013-07-31 2013-07-26 FALSE ## # ... with 2,456 more rows

## If due day is February 29, is it a leap year?

Let us do some data sanitization. If the due day happens to be February 29,

let us ensure that the due year is a leap year. Below are the steps to check

if the due year is a leap year:

- we will extract the following from the due date:
- day
- month
- year

- we will then create a new column
`is_leap`

which will have be set to`TRUE`

if

the year is a leap year else it will be set to`FALSE`

- filter all the payments due on 29th Feb
- select the following columns:
`Due`

`is_leap`

transact %>% mutate( due_day = day(Due), due_month = month(Due), due_year = year(Due), is_leap = leap_year(due_year) ) %>% filter(due_month == 2 & due_day == 29) %>% select(Due, is_leap)

## # A tibble: 4 x 2 ## Due is_leap ## <date> <lgl> ## 1 2012-02-29 TRUE ## 2 2012-02-29 TRUE ## 3 2012-02-29 TRUE ## 4 2012-02-29 TRUE

## Shift Date

Time to shift some dates. We can shift a date by days, weeks or months. Let us

shift the course start date by:

- 2 days
- 3 weeks
- 1 year

course_start + days(2) ## [1] "2017-04-14" course_start + weeks(3) ## [1] "2017-05-03" course_start + years(1) ## [1] "2018-04-12"

## Interval

Let us calculate the duration of the course using `interval`

. If you observe

carefully, the result is not the duration in days but an object of class

`interval`

. Now let us learn how we can use intervals.

interval(course_start, course_end)

## [1] 2017-04-12 UTC--2017-04-21 UTC

## Intervals Overlap

Let us say you are planning a vacation and want to check if the vacation

dates overlap with the course dates. You can do this by:

- creating vacation and course intervals
- use
`int_overlaps()`

to check if two intervals overlap. It returns`TRUE`

if the intervals overlap else`FALSE`

.

Let us use the vacation start and end dates to create `vacation_interval`

and then check if it overlaps with `course_interval`

.

vacation_start <- as_date('2017-04-19') vacation_end <- as_date('2017-04-25') course_interval <- interval(course_start, course_end) vacation_interval <- interval(vacation_start, vacation_end) int_overlaps(course_interval, vacation_interval) ## [1] TRUE

## How many invoices were settled within due date?

Let us use intervals to count the number of invoices that were settled within

the due date. To do this, we will:

- create an interval for the invoice and due date
- create a new column
`due_next`

by incrementing the due date by 1 day - another interval for
`due_next`

and the payment date - if the intervals overlap, the payment was made within the due date

transact %>% mutate( inv_due_interval = interval(Invoice, Due), due_next = Due + days(1), due_pay_interval = interval(due_next, Payment), overlaps = int_overlaps(inv_due_interval, due_pay_interval) ) %>% select(Invoice, Due, Payment, overlaps)

## # A tibble: 2,466 x 4 ## Invoice Due Payment overlaps ## <date> <date> <date> <lgl> ## 1 2013-01-02 2013-02-01 2013-01-15 TRUE ## 2 2013-01-26 2013-02-25 2013-03-03 FALSE ## 3 2013-07-03 2013-08-02 2013-07-08 TRUE ## 4 2013-02-10 2013-03-12 2013-03-17 FALSE ## 5 2012-10-25 2012-11-24 2012-11-28 FALSE ## 6 2012-01-27 2012-02-26 2012-02-22 TRUE ## 7 2013-08-13 2013-09-12 2013-09-09 TRUE ## 8 2012-12-16 2013-01-15 2013-01-12 TRUE ## 9 2012-05-14 2012-06-13 2012-07-01 FALSE ## 10 2013-07-01 2013-07-31 2013-07-26 TRUE ## # ... with 2,456 more rows

Below we show another method to count the number of invoices paid within the

due date. Instead of using `days`

to change the due date, we use `int_shift`

to shift it by 1 day.

transact %>% mutate( inv_due_interval = interval(Invoice, Due), due_pay_interval = interval(Due, Payment), due_pay_next = int_shift(due_pay_interval, by = days(1)), overlaps = int_overlaps(inv_due_interval, due_pay_next) ) %>% select(Invoice, Due, Payment, overlaps)

## # A tibble: 2,466 x 4 ## Invoice Due Payment overlaps ## <date> <date> <date> <lgl> ## 1 2013-01-02 2013-02-01 2013-01-15 TRUE ## 2 2013-01-26 2013-02-25 2013-03-03 FALSE ## 3 2013-07-03 2013-08-02 2013-07-08 TRUE ## 4 2013-02-10 2013-03-12 2013-03-17 FALSE ## 5 2012-10-25 2012-11-24 2012-11-28 FALSE ## 6 2012-01-27 2012-02-26 2012-02-22 TRUE ## 7 2013-08-13 2013-09-12 2013-09-09 TRUE ## 8 2012-12-16 2013-01-15 2013-01-12 TRUE ## 9 2012-05-14 2012-06-13 2012-07-01 FALSE ## 10 2013-07-01 2013-07-31 2013-07-26 TRUE ## # ... with 2,456 more rows

You might be thinking why we incremented the due date by a day before creating

the interval between the due day and the payment day. If we do not increment,

both the intervals will share a common date i.e. the due date and they will

always overlap as shown below:

transact %>% mutate( inv_due_interval = interval(Invoice, Due), due_pay_interval = interval(Due, Payment), overlaps = int_overlaps(inv_due_interval, due_pay_interval) ) %>% select(Invoice, Due, Payment, overlaps)

## # A tibble: 2,466 x 4 ## Invoice Due Payment overlaps ## <date> <date> <date> <lgl> ## 1 2013-01-02 2013-02-01 2013-01-15 TRUE ## 2 2013-01-26 2013-02-25 2013-03-03 TRUE ## 3 2013-07-03 2013-08-02 2013-07-08 TRUE ## 4 2013-02-10 2013-03-12 2013-03-17 TRUE ## 5 2012-10-25 2012-11-24 2012-11-28 TRUE ## 6 2012-01-27 2012-02-26 2012-02-22 TRUE ## 7 2013-08-13 2013-09-12 2013-09-09 TRUE ## 8 2012-12-16 2013-01-15 2013-01-12 TRUE ## 9 2012-05-14 2012-06-13 2012-07-01 TRUE ## 10 2013-07-01 2013-07-31 2013-07-26 TRUE ## # ... with 2,456 more rows

## Shift Interval

Intervals can be shifted too. In the below example, we shift the course

interval by:

- 1 day
- 3 weeks
- 1 year

course_interval <- interval(course_start, course_end) int_shift(course_interval, by = days(1)) ## [1] 2017-04-13 UTC--2017-04-22 UTC int_shift(course_interval, by = weeks(3)) ## [1] 2017-05-03 UTC--2017-05-12 UTC int_shift(course_interval, by = years(1)) ## [1] 2018-04-12 UTC--2018-04-21 UTC

## Within

Let us assume that we have to attend a conference in April 2017. Does it occur

during the course duration? We can answer this using `%within%`

which will

return `TRUE`

if a date falls within an interval.

conference <- as_date('2017-04-15') conference %within% course_interval ## [1] TRUE

#### How many invoices were settled within due date?

Let us use `%within%`

to count the number of invoices that were settled within

the due date. We will do this by:

- creating an interval for the invoice and due date
- check if the payment date falls within the above interval

transact %>% mutate( inv_due_interval = interval(Invoice, Due), overlaps = Payment %within% inv_due_interval ) %>% select(Due, Payment, overlaps)

## # A tibble: 2,466 x 3 ## Due Payment overlaps ## <date> <date> <lgl> ## 1 2013-02-01 2013-01-15 TRUE ## 2 2013-02-25 2013-03-03 FALSE ## 3 2013-08-02 2013-07-08 TRUE ## 4 2013-03-12 2013-03-17 FALSE ## 5 2012-11-24 2012-11-28 FALSE ## 6 2012-02-26 2012-02-22 TRUE ## 7 2013-09-12 2013-09-09 TRUE ## 8 2013-01-15 2013-01-12 TRUE ## 9 2012-06-13 2012-07-01 FALSE ## 10 2013-07-31 2013-07-26 TRUE ## # ... with 2,456 more rows

## Quarter

Let us check the quarter and the semester in which the course starts.

course_start ## [1] "2017-04-12" quarter(course_start) ## [1] 2 quarter(course_start, with_year = TRUE) ## [1] 2017.2 semester(course_start) ## [1] 1

Let us count the invoices due for each quarter.

transact %>% mutate( quarter_due = quarter(Due) ) %>% count(quarter_due)

## # A tibble: 4 x 2 ## quarter_due n ## <int> <int> ## 1 1 521 ## 2 2 661 ## 3 3 618 ## 4 4 666

transact %>% mutate( Quarter = quarter(Due, with_year = TRUE) )

## # A tibble: 2,466 x 4 ## Invoice Due Payment Quarter ## <date> <date> <date> <dbl> ## 1 2013-01-02 2013-02-01 2013-01-15 2013. ## 2 2013-01-26 2013-02-25 2013-03-03 2013. ## 3 2013-07-03 2013-08-02 2013-07-08 2013. ## 4 2013-02-10 2013-03-12 2013-03-17 2013. ## 5 2012-10-25 2012-11-24 2012-11-28 2012. ## 6 2012-01-27 2012-02-26 2012-02-22 2012. ## 7 2013-08-13 2013-09-12 2013-09-09 2013. ## 8 2012-12-16 2013-01-15 2013-01-12 2013. ## 9 2012-05-14 2012-06-13 2012-07-01 2012. ## 10 2013-07-01 2013-07-31 2013-07-26 2013. ## # ... with 2,456 more rows

#### Case Study

Let us also get the course interval in different units.

course_interval / dseconds() ## [1] 777600 course_interval / dminutes() ## [1] 12960 course_interval / dhours() ## [1] 216 course_interval / dweeks() ## [1] 1.285714 course_interval / dyears() ## [1] 0.02465753

We can use `time_length()`

to get the course interval in different units.

time_length(course_interval, unit = "seconds") ## [1] 777600 time_length(course_interval, unit = "minutes") ## [1] 12960 time_length(course_interval, unit = "hours") ## [1] 216

`as.period()`

is another way to get the course interval in different units.

as.period(course_interval, unit = "seconds") ## [1] "777600S" as.period(course_interval, unit = "minutes") ## [1] "12960M 0S" as.period(course_interval, unit = "hours") ## [1] "216H 0M 0S"

**leave a comment**for the author, please follow the link and comment on their blog:

**Rsquared Academy Blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.