PowerQuery Puzzle solved with R


# 133–134

This weekend brought us two stories, two puzzles that have two things in common: they are pretty hard, and in both cases the difficulty was linked to the time dimension.

Puzzles:

PQ_133: content file
PQ_134: content file

PQ_133

In the first puzzle we have data about electricity consumption measured at certain points in time over two days across three meters. It shouldn’t be hard, right? Well, it can be, especially when one of the measurement intervals crosses midnight. Let’s see what was difficult there.

Load data and libraries

library(tidyverse)
library(readxl)
library(lubridate)

input = read_excel(
  "PQ_Challenge_133.xlsx",
  range = "A1:E9",
  col_types = c("date", "date", "numeric", "numeric", "numeric")
)
test = read_excel("PQ_Challenge_133.xlsx", range = "G1:J3")

There is one issue with reading the time column: readxl imports a time-only cell as a datetime anchored on an arbitrary base date, and that is why I had to add “many” years to it as the first data manipulation.
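A quick sketch of what that shift does, assuming readxl’s usual behavior of anchoring time-only cells on the date 1899-12-31 (the target dates below are inferred from the ifelse in the next code block, not stated in the puzzle file):

# time-only cells come back as "1899-12-31 HH:MM:SS"; shifting by
# 124 years and 59/60 days lands on the puzzle's two dates
as.Date("1899-12-31") + years(124) - days(60)
#> [1] "2023-11-01"
as.Date("1899-12-31") + years(124) - days(59)
#> [1] "2023-11-02"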

Prepare functions and intermediate steps

df = input %>%
  # shift the misread times onto the actual dates (see the sketch above)
  mutate(Time = Time + years(124) - days(ifelse(day(Date) == 1, 60, 59))) %>%
  select(-Date) %>%
  pivot_longer(-Time, names_to = "Meter", values_to = "Value") %>%
  group_by(Meter) %>%
  arrange(Meter, Time) %>%
  mutate(end = lead(Time),                  # time of the next reading
         next_reading = lead(Value),
         cross_midnight = ifelse(day(Time) != day(end), 1, 0),
         duration = end - Time,
         consumption = next_reading - Value,
         time_to_midnight = difftime(ceiling_date(Time, "day"), Time, units = "hours"))

In this step I cleaned the data and created some auxiliary columns (a small illustration of the lead() pairing follows the list):
– the time of the next reading;
– a flag marking whether the measurement interval crosses midnight;
– the consumption based on two consecutive readings;
– and the time left until midnight.
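To make the pairing idea concrete, here is a minimal sketch (toy timestamps, not the puzzle data) of how lead() turns a column of readings into intervals by matching each timestamp with the next one:

x = as.POSIXct(c("2023-11-01 08:00", "2023-11-01 14:00", "2023-11-02 01:00"), tz = "UTC")
lead(x)  # each reading's "end": the following timestamp, NA for the last
#> [1] "2023-11-01 14:00:00 UTC" "2023-11-02 01:00:00 UTC" NA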

But this is where we need to think a little bit more. First of all, we need to split the data into intervals that cross midnight and those that don’t. Only for the crossing ones do we need to calculate what proportion of the interval falls before midnight, and then use that proportion to split the consumption into two parts: before and after.
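Here is a minimal worked example of that proration, with made-up numbers rather than the puzzle data: 40 units consumed between 22:00 and 02:00 the next day split into 2 h before midnight and 2 h after, so the consumption splits 50/50:

start = as.POSIXct("2023-11-01 22:00:00", tz = "UTC")
end = as.POSIXct("2023-11-02 02:00:00", tz = "UTC")
consumption = 40

time_to_midnight = difftime(ceiling_date(start, "day"), start, units = "hours")  # 2 hours
duration = difftime(end, start, units = "hours")                                 # 4 hours

before = consumption * as.numeric(time_to_midnight) / as.numeric(duration)
after = consumption - before
c(before = before, after = after)
#> before  after
#>     20     20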

df_cross = df %>%
  ungroup() %>%
  filter(cross_midnight == 1) %>%
  mutate(perc_of_dur = time_to_midnight / as.numeric(duration),
         cons_before_midnight = consumption * perc_of_dur,
         cons_after_midnight = consumption - cons_before_midnight) %>%
  select(Time, end, Meter, cons_before_midnight, cons_after_midnight) %>%
  pivot_longer(-c(Time, end, Meter), names_to = "Midnight", values_to = "Consumption") %>%
  mutate(Date_start = as.Date(Time),
         Date_end = as.Date(end),
         # the "before" part belongs to the start date, the "after" part to the end date
         Date = if_else(Midnight == "cons_before_midnight", Date_start, Date_end),
         Consumption = as.numeric(Consumption)) %>%
  select(Date, Meter, Consumption)

The second part needs just a simple reshape before bringing all the data back together.

df_else = df %>%
  ungroup() %>%
  filter(cross_midnight == 0) %>%
  select(Time, end, Meter, consumption) %>%
  pivot_longer(-c(Time, end, Meter), names_to = "Midnight", values_to = "Consumption") %>%
  mutate(Date = as.Date(Time)) %>%
  select(Date, Meter, Consumption)

Then we can bind everything back into one data frame and reshape it so it can be compared with the provided solution. And it matches.

result = bind_rows(df_cross, df_else) %>%
  arrange(Date, Meter) %>%
  group_by(Date, Meter) %>%
  summarize(Consumption = round(sum(Consumption))) %>%
  pivot_wider(names_from = Meter, values_from = Consumption) %>%
  ungroup() %>%
  mutate(Date = as.POSIXct(Date))

identical(result, test)
# [1] TRUE

PQ_134

The second puzzle was about stock levels. We are provided with two tables: one with the opening stock on the first day in three stores, and one with the Ins and Outs of stock over three consecutive days. What we need to find is the closing stock for each of those three days. I tried recursive calculations at first, but… I found out that there is another way. Let’s find out!

Load data and libraries

library(tidyverse)
library(readxl)

input1 = read_excel("PQ_Challenge_134.xlsx", range = "A1:D12")
input2 = read_excel("PQ_Challenge_134.xlsx", range = "F1:G4")

test = read_excel("PQ_Challenge_134.xlsx", range = "J1:N12")

input2 contains the opening stocks, but we don’t really know for which date. We can safely assume, though, that it is the earliest date in the movements table (input1).

opening = input2 %>% 
  mutate(Date = min(input1$Date)) %>%
  rename(open = `Open Stock`)

This is where the magic begins… we can slightly transform the opening data frame so it has exactly the same structure as the movements table. We have Date, open and Store; we need IN and OUT. Let’s rename open to IN and add an empty OUT column, and then bind both tables together. Finally, grouping the data by Store and arranging by Date, we can use cumulative sums instead of a recursion to find the daily stock moves and the closing stock, and compare the result with the provided solution.
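A toy illustration (made-up numbers, not the puzzle data) of why cumulative sums replace the day-by-day recursion: if the opening stock is treated as a day-zero “IN”, then the closing stock on day t is simply cumsum(IN)[t] - cumsum(OUT)[t]:

IN = c(100, 20, 30, 10)   # opening stock, then daily ins
OUT = c(0, 15, 25, 5)     # no outs on the opening row
cumsum(IN) - cumsum(OUT)  # running closing stock per day
#> [1] 100 105 110 115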

result <- opening %>%
  rename(IN = open) %>%
  mutate(OUT = NA_real_) %>%   # empty OUT column for the opening rows
  bind_rows(input1) %>%
  group_by(Store) %>%
  arrange(Store, Date) %>%
  mutate(OUT = ifelse(is.na(OUT), 0, OUT),
         closing_stock = cumsum(IN) - cumsum(OUT)) %>%
  ungroup() %>%
  filter(OUT != 0) %>%   # drop the synthetic opening rows again
  select(Store, Date, IN, OUT, `Closing Stock` = closing_stock)

identical(result, test)
#> [1] TRUE

The second puzzle was easier, but it was still tricky to find a good and surprisingly simple solution.

We’ve seen some nice puzzles involving the time dimension, and they called for a little out-of-the-box thinking. Feel free to ask, like and share.
Let’s be in touch for the next article, about functions.

