[This article was first published on Colin Fay, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Solving the puzzles of Advent of Code with R.

[Disclaimer] Obviously, this post contains a big spoiler about Advent of Code, as it gives solutions for solving day 1 and 2.

## Advent of Code

Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like. About Advent of Code

## Day three, part one

The first part of the challenge for day 3 starts well: we have to read in R this file:

readLines("input3.txt", n = 6)

## [1] "#1 @ 483,830: 24x18" "#2 @ 370,498: 21x17" "#3 @ 403,823: 25x21"
## [4] "#4 @ 619,976: 20x15" "#5 @ 123,385: 15x26" "#6 @ 484,592: 11x19"

It’s a description of “spots”, claimed by the elves, inside a big square piece of fabric (supposed to be for making Santa’s clothes (remember we’re coding with Christmas theme 😉 ).

"#1 @ 483,830: 24x18"
Id / distance from left / distance from top / width / height

In other words, the “claim” with ID one is for a spot that starts at 483 inches from the left of the fabric and at 830 inches from the top. This piece is 24 inches wide and 18 inches tall.

How many square inches of fabric are within two or more claims?

Our challenge is to determine how many squares of fabric are claimed more than once. In over words, if we take all the squares from all the elves, how many square inches are claimed by more than one elf?

Let’s start by reading the file:

library(tidyverse)
df <- read.delim("input3.txt", header = FALSE) %>%
as.tibble() %>%
print() %>%
# Clean the column
mutate(V1 = str_replace_all(V1, "[#@,:x]", " "),
V1 = str_replace_all(V1, " {2,}", " "),
V1 = str_replace_all(V1, "^ ", "")) %>%
# Separate the column
separate(V1, into = c("id", "left", "top", "wide", "tall")) %>%
# Make sure everything is numeric
mutate_all(as.numeric) %>%
# Create the right and down element
mutate(
right = left + wide,
down = top + tall
)

## # A tibble: 1,375 x 1
##    V1
##    <fct>
##  1 #1 @ 483,830: 24x18
##  2 #2 @ 370,498: 21x17
##  3 #3 @ 403,823: 25x21
##  4 #4 @ 619,976: 20x15
##  5 #5 @ 123,385: 15x26
##  6 #6 @ 484,592: 11x19
##  7 #7 @ 394,960: 28x14
##  8 #8 @ 730,592: 26x20
##  9 #9 @ 975,963: 16x26
## 10 #10 @ 452,496: 18x18
## # ... with 1,365 more rows

cat("Cleaned version:\n")

## Cleaned version:

df

## # A tibble: 1,375 x 7
##       id  left   top  wide  tall right  down
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1   483   830    24    18   507   848
##  2     2   370   498    21    17   391   515
##  3     3   403   823    25    21   428   844
##  4     4   619   976    20    15   639   991
##  5     5   123   385    15    26   138   411
##  6     6   484   592    11    19   495   611
##  7     7   394   960    28    14   422   974
##  8     8   730   592    26    20   756   612
##  9     9   975   963    16    26   991   989
## 10    10   452   496    18    18   470   514
## # ... with 1,365 more rows

So now that we have that, how can we get all the squares claimed more than once? To do this, we’ll go row-wise on this data.frame. With each row, we’ll start by create vectors which correspond to all the claimed squares. To do that, we’ll create two vectors: row and columns, use expand.grid to create all the possible combination (i.e all the locations on the fabric) for this claim.

For example, imagine a claim that goes from square 1 to 2 and from line 3 to 4, we’ll create a vector 1:2 and 3:4, and when combining that we’ll have the id for all claimed squares: 1×3, 1×4, 2×3 and 2×4. We’ll do this for all the claims, then count the number of occurrence for each newly created id.

res <- map_df(
1:nrow(df),
function(x){
res <- list(
col = (df$left[x] + 1):(df$left[x]+df$wide[x]), row = (df$top[x] + 1):(df$top[x]+df$tall[x])
)
expand.grid(res$col, res$row) %>%
mutate(claim_id = df$id[x]) %>% unite(col = "idx", c("Var1", "Var2"), sep = "x") } ) %>% as.tibble() res ## # A tibble: 509,129 x 2 ## idx claim_id ## <chr> <dbl> ## 1 484x831 1 ## 2 485x831 1 ## 3 486x831 1 ## 4 487x831 1 ## 5 488x831 1 ## 6 489x831 1 ## 7 490x831 1 ## 8 491x831 1 ## 9 492x831 1 ## 10 493x831 1 ## # ... with 509,119 more rows res %>% count(idx) %>% filter(n > 1) %>% count() ## # A tibble: 1 x 1 ## nn ## <int> ## 1 119551 ## Day three, part two What is the ID of the only claim that doesn’t overlap? The second part is a little bit different: we need to identify the claim which doesn’t overlap with any other claim. In other words, the claim that contains only square inches which are claimed once. So how to do that ? We’ll start from our previous table and count the number of square by claim: n_of_claims_by_id <- res %>% count(claim_id) n_of_claims_by_id ## # A tibble: 1,375 x 2 ## claim_id n ## <dbl> <int> ## 1 1 432 ## 2 2 357 ## 3 3 525 ## 4 4 300 ## 5 5 390 ## 6 6 209 ## 7 7 392 ## 8 8 520 ## 9 9 416 ## 10 10 324 ## # ... with 1,365 more rows Now we’ll count the number of time each square is claimed: n_of_claims_by_inch <- res %>% count(idx) n_of_claims_by_inch ## # A tibble: 350,158 x 2 ## idx n ## <chr> <int> ## 1 1000x793 1 ## 2 1000x794 1 ## 3 1000x795 1 ## 4 1000x796 1 ## 5 1000x797 1 ## 6 1000x798 1 ## 7 1000x799 1 ## 8 1000x800 1 ## 9 1000x801 1 ## 10 1000x802 1 ## # ... with 350,148 more rows Let’s add this info to our result object: res <- res %>% left_join(n_of_claims_by_inch, by = "idx") res ## # A tibble: 509,129 x 3 ## idx claim_id n ## <chr> <dbl> <int> ## 1 484x831 1 3 ## 2 485x831 1 3 ## 3 486x831 1 2 ## 4 487x831 1 2 ## 5 488x831 1 2 ## 6 489x831 1 2 ## 7 490x831 1 2 ## 8 491x831 1 2 ## 9 492x831 1 1 ## 10 493x831 1 1 ## # ... with 509,119 more rows Here, we can read that square at 484x831 has been claimed three times (n), and notably by the ID #1. So let’s filter on the square that has only been claimed once: res <- res %>% filter(n == 1) res ## # A tibble: 230,607 x 3 ## idx claim_id n ## <chr> <dbl> <int> ## 1 492x831 1 1 ## 2 493x831 1 1 ## 3 494x831 1 1 ## 4 495x831 1 1 ## 5 496x831 1 1 ## 6 497x831 1 1 ## 7 498x831 1 1 ## 8 499x831 1 1 ## 9 500x831 1 1 ## 10 501x831 1 1 ## # ... with 230,597 more rows Now for the final trick: a claim with no overlap would have, in our result, the same number of occurrence in the claim_id column from res and in the idx from n_of_claims_by_inch res %>% count(claim_id) %>% left_join(n_of_claims_by_id, by = "claim_id") %>% filter(nn == n) ## # A tibble: 1 x 3 ## claim_id nn n ## <dbl> <int> <int> ## 1 1124 299 299 So here it is: the claim_id 1124 is the one we are looking for! ## Day four, part one This one was also fun to do: we have another weird data file, that we need to parse. This data is said to record the date and time, all days between 00 and 00:59 at a manufacturing lab, when the guard is awake of asleep. library(lubridate) ## ## Attaching package: 'lubridate' ## The following object is masked from 'package:base': ## ## date day4 <- read.delim("input4.txt", header = FALSE) %>% as.tibble() day4 ## # A tibble: 1,129 x 1 ## V1 ## <fct> ## 1 [1518-08-21 00:39] wakes up ## 2 [1518-08-11 00:56] wakes up ## 3 [1518-10-10 23:52] Guard #2707 begins shift ## 4 [1518-10-14 00:55] wakes up ## 5 [1518-07-19 00:42] wakes up ## 6 [1518-06-16 00:41] falls asleep ## 7 [1518-03-24 00:57] wakes up ## 8 [1518-05-06 00:39] wakes up ## 9 [1518-08-27 00:00] Guard #3323 begins shift ## 10 [1518-03-22 23:50] Guard #2753 begins shift ## # ... with 1,119 more rows time_minute <- day4 %>% # Create a hour / event column separate(V1, into = c("hour", "event"), sep = "] ") %>% mutate(hour = gsub("\\[", "", hour), hour = ymd_hm(hour)) %>% # Order the hour column arrange(hour) %>% # Create a column with the guard ID id present mutate(guard_id = str_extract(event, "#([0-9]*)"), guard_id = str_replace(guard_id, "#", ""), no_id = is.na(guard_id), record = 1:n() ) %>% # Add the guard ID everywhere fill(guard_id) %>% # Filter to keep only the event rows filter(no_id) %>% select(-no_id) %>% # Spread for event and hour spread(event, hour) %>% # Fill the wakes up column so that there is no NA fill(wakes up, .direction = "up") %>% select(-record) %>% # Remove the line with NA in the fall asleep column drop_na(falls asleep) %>% # Get one minutes less, as the wakes up minute is considered to be an # awaken minute mutate(wakes up = wakes up - minutes(1), # Create a list column with the sequence of minutes the guard is asleep minutes_asleep = map2(falls asleep, wakes up, seq, by = "min"), # Compute how many time the guard is asleep sleepy_time = wakes up - falls asleep) time_minute ## # A tibble: 420 x 5 ## guard_id falls asleep wakes up minutes_asleep ## <chr> <dttm> <dttm> <list> ## 1 1213 1518-02-16 00:08:00 1518-02-16 00:22:00 <dttm [15]> ## 2 1213 1518-02-16 00:44:00 1518-02-16 00:50:00 <dttm [7]> ## 3 1213 1518-02-16 00:56:00 1518-02-16 00:58:00 <dttm [3]> ## 4 1213 1518-02-18 00:13:00 1518-02-18 00:24:00 <dttm [12]> ## 5 1213 1518-03-29 00:02:00 1518-03-29 00:10:00 <dttm [9]> ## 6 1213 1518-03-29 00:16:00 1518-03-29 00:47:00 <dttm [32]> ## 7 1213 1518-04-27 00:03:00 1518-04-27 00:19:00 <dttm [17]> ## 8 1213 1518-04-27 00:24:00 1518-04-27 00:41:00 <dttm [18]> ## 9 1213 1518-04-27 00:47:00 1518-04-27 00:58:00 <dttm [12]> ## 10 1213 1518-05-06 00:02:00 1518-05-06 00:38:00 <dttm [37]> ## # ... with 410 more rows, and 1 more variable: sleepy_time <time> Find the guard that has the most minutes asleep. What minute does that guard spend asleep the most? So, our first task is to know which guard is asleep the most and on which minute of the hour. Let’s first compute who this guard is: which_guard <- time_minute %>% group_by(guard_id) %>% summarise(s = sum(sleepy_time)) %>% top_n(1, s) Now we know who he is, let’s see at which minute of the hour he’s the most asleep: time_minute %>% filter(guard_id == which_guard$guard_id) %>%
unnest(minutes_asleep) %>%
print() %>%
# Extract the minute part
mutate(minutes_asleep = minute(minutes_asleep)) %>%
count(minutes_asleep, sort = TRUE)%>%
top_n(1)

## # A tibble: 502 x 5
##    guard_id falls asleep      wakes up          sleepy_time
##    <chr>    <dttm>              <dttm>              <time>
##  1 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
##  2 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
##  3 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
##  4 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
##  5 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
##  6 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
##  7 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
##  8 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
##  9 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
## 10 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs
## # ... with 492 more rows, and 1 more variable: minutes_asleep <dttm>

## Selecting by n

## # A tibble: 1 x 2
##   minutes_asleep     n
##            <int> <int>
## 1             28    14

## Day four, part one

The second part was quite straightforward once you’ve got the previous answer. > Of all guards, which guard is most frequently asleep on the same minute?

time_minute %>%
unnest(minutes_asleep) %>%
mutate(minutes_asleep = minute(minutes_asleep)) %>%
count(guard_id, minutes_asleep, sort = TRUE) %>%
top_n(1)

## Selecting by n

## # A tibble: 1 x 3
##   guard_id minutes_asleep     n
##   <chr>             <int> <int>
## 1 1213                 19    16