PowerQuery Puzzle solved with R

[This article was first published on Numbers around us - Medium, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# 141–142


PQ_141: content file
PQ_142: content file

Last time I noted that usually PQ Puzzles are focused on times and dates. And… it confirms this week. Lets count it.


Rolling averages and other rolling calculations are very useful concepts because it can show us main trend cleaned from lower time granulation anomalies. In R we can play with comparison of dates to dates with lagged measurements. It is not hard but can complex task. Every complex task has much bigger chance for mistakes. Thats why I use dedicated package for out today puzzle. We have measurements for months and all we need is to calculate mean average from last 3 and 5 months.

Load libraries and data


input = read_excel("Power Query/PQ_Challenge_141.xlsx", range = "A1:C35")
test  = read_excel("Power Query/PQ_Challenge_141.xlsx", range = "E1:I35")


result = input %>%
  group_by(Month) %>%
    `3 Year MV` = slide_dbl(Defects, mean,  
                            .after = -1, 
                            .before = 3, 
                            .complete = TRUE) %>% round(0),
    `5 Year MV` = slide_dbl(Defects, mean, 
                            .after = -1, 
                            .before = 5, 
                            .complete = TRUE) %>% round(0)
  ) %>% 


identical(result, test)
#> [1] TRUE


In this puzzle we calculate something that HR called “head count”, which means number of people hired or working in certain period of time. In our task we have to create head count for each quarter of hour from 9AM to 9PM, having start and end time for 3 people. It can be tricky in some ways but I will explain it later.

Load data and libraries


input = read_excel("Power Query/PQ_Challenge_142.xlsx", range = "A1:C4") 
  %>% janitor::clean_names()
test = read_excel("Power Query/PQ_Challenge_142.xlsx", range = "E1:F49")


input <- input %>%
  mutate(interval = interval(ymd_hms(start_time), ymd_hms(end_time)))

quarter_table <- tibble(
  interval = interval(
    seq(ymd_hms("1899-12-31 09:00:00"), 
        ymd_hms("1899-12-31 20:45:00"), 
        by = "15 mins"),
    seq(ymd_hms("1899-12-31 09:14:59"), 
        ymd_hms("1899-12-31 20:59:59"), 
        by = "15 mins")
head_count <- quarter_table %>%
    Count = map_dbl(interval, ~sum(int_overlaps(.x, input$interval))),
    Time = paste(format(int_start(interval), "%I:%M:%S %p"), 
                 format(int_end(interval), "%I:%M:%S %p"), 
                 sep = " - ")
  ) %>%
  select(Time, Count)


You can validate it by your eyes, because there is one issue. If we want to have disjoint time intervals we need to finish previous one second before the next started. But if we do it that way we have problem with person who ends its job at 19:00:00 because it would look like working only one second in next interval. That is why I cannot validate it with given results, but except one line values are exactly the same.

Thanks for your engagement, and let me know if you have any comments. Stay tuned, because this Thursday article about memoization will be online.

PowerQuery Puzzle solved with R was originally published in Numbers around us on Medium, where people are continuing the conversation by highlighting and responding to this story.

To leave a comment for the author, please follow the link and comment on their blog: Numbers around us - Medium.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)