Here is the new padr

[This article was first published on That’s so Random, and kindly contributed to R-bloggers].

I am very happy to announce v0.3.0 of the padr package, which was introduced in January. As requested by many, you can now use intervals with a step size other than 1. In earlier versions, each of the eight interval values allowed only a single unit (e.g. year, day, hour). Now you can use any time period that is accepted by seq.Date or seq.POSIXt (e.g. 2 months, 6 hours, 5 minutes) in both thicken and pad. From now on, get_interval tests for both the unit of the interval and the number of units it spans.

library(padr)
library(tidyverse)
get_interval(as.Date(c("2017-01-01", "2017-03-01")))
## [1] "2 month"
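
The interval strings follow the by convention of seq.Date and seq.POSIXt. As a quick sanity check of what such a string means, here is a small base R sketch (the dates and the variable names two_monthly and six_hourly are just illustrative choices, not taken from the package):

```r
# Every second month from January through July 2017
two_monthly <- seq(as.Date("2017-01-01"), as.Date("2017-07-01"), by = "2 months")
two_monthly
## [1] "2017-01-01" "2017-03-01" "2017-05-01" "2017-07-01"

# Six-hour steps across a single day (UTC is used to avoid daylight saving surprises)
six_hourly <- seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
                  as.POSIXct("2017-01-02 00:00:00", tz = "UTC"),
                  by = "6 hours")
length(six_hourly)
## [1] 5
```

Any string that works as by here should also be usable as the interval in thicken and pad.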

With this new definition of the interval, the possibilities of both thicken and pad are expanded. The following analysis demonstrates the new functionality by aggregating to dayparts of six hours:

emergency %>% 
  filter(title == "EMS: CARDIAC EMERGENCY") %>% 
  thicken(interval = "6 hours", colname = "daypart") %>% 
  count(daypart) %>% 
  pad() %>% 
  fill_by_value(n) %>% 
  mutate(start_daypart = lubridate::hour(daypart) %>% as.factor()) %>% 
  ggplot(aes(n)) +
  geom_density(aes(fill = start_daypart)) +
  facet_wrap(~start_daypart)
## pad applied on the interval: 6 hour

Unfortunately, adding the unit specification to the interval made it impossible to make v0.3.0 fully backwards compatible. The two main functions are affected in the following way.

example_df <- data.frame(dt = as.Date(c("2017-01-01", "2017-03-01", "2017-07-01")),
                         y = 1:3)
pad(example_df)
## pad applied on the interval: 2 month
##           dt  y
## 1 2017-01-01  1
## 2 2017-03-01  2
## 3 2017-05-01 NA
## 4 2017-07-01  3

One should thus check a little more carefully that no coarser interval, a multiple of the unit you have in mind, explains the data as well. In the example above, pad settles on 2 months rather than 1 month. To reduce the risk of padding at the wrong interval, pad now always prints the interval at which the padding occurred.
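
If the detected interval is coarser than intended, one way around it is to pass the interval explicitly. The sketch below reuses the data frame from the example above and assumes the interval argument of pad is used for this; the name monthly is only illustrative.

```r
library(padr)

example_df <- data.frame(dt = as.Date(c("2017-01-01", "2017-03-01", "2017-07-01")),
                         y = 1:3)

# Override the detected "2 month" interval and pad at every month instead
monthly <- pad(example_df, interval = "month")
monthly
```

This should give one row per month from January through July, with NA in y for the months that were added.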

Reimplementation of pad

The second significant change in this version is the reimplementation of pad. Performance was poor when pad was applied to more than a handful of groups. By leveraging dplyr, this is now greatly improved.

In addition, the functionality is slightly adjusted:

x <- emergency %>% thicken("day", "d") %>% count(title, d)
x %>% pad(group = "title")
x %>% group_by(title) %>% pad()
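
To see the grouped form on something smaller than the emergency data, here is a minimal sketch with a made-up data frame (toy, grp, dt and n are hypothetical names). It assumes that pad pads each group within that group's own date range; the interval is given explicitly to keep the example unambiguous.

```r
library(padr)
library(dplyr)

# Two groups: "a" has a two-day gap, "b" has no gap
toy <- data.frame(
  grp = rep(c("a", "b"), each = 2),
  dt  = as.Date(c("2017-01-01", "2017-01-04", "2017-01-02", "2017-01-03")),
  n   = 1:4
)

# Group with dplyr first, then pad within each group
padded <- toy %>%
  group_by(grp) %>%
  pad(interval = "day")
padded
```

Under that assumption, group "a" gains rows for the two missing days and group "b" is left unchanged.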

Other changes

Bug fixes
