Solving #AdventOfCode day 3 and 4 with R

[This article was first published on Colin Fay, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Solving the puzzles of Advent of Code with R.

[Disclaimer] Obviously, this post contains a big spoiler about Advent of Code, as it gives solutions for solving day 1 and 2.

Advent of Code

Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like. About Advent of Code

Day three, part one

The first part of the challenge for day 3 starts well: we have to read in R this file:

readLines("input3.txt", n = 6)

## [1] "#1 @ 483,830: 24x18" "#2 @ 370,498: 21x17" "#3 @ 403,823: 25x21"
## [4] "#4 @ 619,976: 20x15" "#5 @ 123,385: 15x26" "#6 @ 484,592: 11x19"

It’s a description of “spots”, claimed by the elves, inside a big square piece of fabric (supposed to be for making Santa’s clothes (remember we’re coding with Christmas theme 😉 ).

It reads:

"#1 @ 483,830: 24x18" 
Id / distance from left / distance from top / width / height 

In other words, the “claim” with ID one is for a spot that starts at 483 inches from the left of the fabric and at 830 inches from the top. This piece is 24 inches wide and 18 inches tall.

How many square inches of fabric are within two or more claims?

Our challenge is to determine how many squares of fabric are claimed more than once. In over words, if we take all the squares from all the elves, how many square inches are claimed by more than one elf?

Let’s start by reading the file:

library(tidyverse)
df <- read.delim("input3.txt", header = FALSE) %>%
  as.tibble() %>%
  print() %>%
  # Clean the column 
  mutate(V1 = str_replace_all(V1, "[#@,:x]", " "),
         V1 = str_replace_all(V1, " {2,}", " "), 
         V1 = str_replace_all(V1, "^ ", "")) %>%
  # Separate the column
  separate(V1, into = c("id", "left", "top", "wide", "tall")) %>% 
  # Make sure everything is numeric
  mutate_all(as.numeric) %>%
  # Create the right and down element
  mutate(
    right = left + wide, 
    down = top + tall 
  ) 

## # A tibble: 1,375 x 1
##    V1                  
##    <fct>               
##  1 #1 @ 483,830: 24x18 
##  2 #2 @ 370,498: 21x17 
##  3 #3 @ 403,823: 25x21 
##  4 #4 @ 619,976: 20x15 
##  5 #5 @ 123,385: 15x26 
##  6 #6 @ 484,592: 11x19 
##  7 #7 @ 394,960: 28x14 
##  8 #8 @ 730,592: 26x20 
##  9 #9 @ 975,963: 16x26 
## 10 #10 @ 452,496: 18x18
## # ... with 1,365 more rows

cat("Cleaned version:\n")

## Cleaned version:

df

## # A tibble: 1,375 x 7
##       id  left   top  wide  tall right  down
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1   483   830    24    18   507   848
##  2     2   370   498    21    17   391   515
##  3     3   403   823    25    21   428   844
##  4     4   619   976    20    15   639   991
##  5     5   123   385    15    26   138   411
##  6     6   484   592    11    19   495   611
##  7     7   394   960    28    14   422   974
##  8     8   730   592    26    20   756   612
##  9     9   975   963    16    26   991   989
## 10    10   452   496    18    18   470   514
## # ... with 1,365 more rows

So now that we have that, how can we get all the squares claimed more than once? To do this, we’ll go row-wise on this data.frame. With each row, we’ll start by create vectors which correspond to all the claimed squares. To do that, we’ll create two vectors: row and columns, use expand.grid to create all the possible combination (i.e all the locations on the fabric) for this claim.

For example, imagine a claim that goes from square 1 to 2 and from line 3 to 4, we’ll create a vector 1:2 and 3:4, and when combining that we’ll have the id for all claimed squares: 1×3, 1×4, 2×3 and 2×4. We’ll do this for all the claims, then count the number of occurrence for each newly created id.

res <- map_df(
  1:nrow(df), 
  function(x){
    res <- list(
     col = (df$left[x] + 1):(df$left[x]+df$wide[x]),
     row = (df$top[x] + 1):(df$top[x]+df$tall[x])
    )
    expand.grid(res$col, res$row) %>%
      mutate(claim_id = df$id[x]) %>%
      unite(col = "idx", c("Var1", "Var2"), sep = "x")
  }
) %>% as.tibble()
res

## # A tibble: 509,129 x 2
##    idx     claim_id
##    <chr>      <dbl>
##  1 484x831        1
##  2 485x831        1
##  3 486x831        1
##  4 487x831        1
##  5 488x831        1
##  6 489x831        1
##  7 490x831        1
##  8 491x831        1
##  9 492x831        1
## 10 493x831        1
## # ... with 509,119 more rows

res %>% 
  count(idx) %>%
  filter(n > 1) %>%
  count()  

## # A tibble: 1 x 1
##       nn
##    <int>
## 1 119551

Day three, part two

What is the ID of the only claim that doesn’t overlap?

The second part is a little bit different: we need to identify the claim which doesn’t overlap with any other claim. In other words, the claim that contains only square inches which are claimed once.

So how to do that ?

We’ll start from our previous table and count the number of square by claim:

n_of_claims_by_id <- res %>% 
  count(claim_id)
n_of_claims_by_id

## # A tibble: 1,375 x 2
##    claim_id     n
##       <dbl> <int>
##  1        1   432
##  2        2   357
##  3        3   525
##  4        4   300
##  5        5   390
##  6        6   209
##  7        7   392
##  8        8   520
##  9        9   416
## 10       10   324
## # ... with 1,365 more rows

Now we’ll count the number of time each square is claimed:

n_of_claims_by_inch <- res %>%
  count(idx)
n_of_claims_by_inch

## # A tibble: 350,158 x 2
##    idx          n
##    <chr>    <int>
##  1 1000x793     1
##  2 1000x794     1
##  3 1000x795     1
##  4 1000x796     1
##  5 1000x797     1
##  6 1000x798     1
##  7 1000x799     1
##  8 1000x800     1
##  9 1000x801     1
## 10 1000x802     1
## # ... with 350,148 more rows

Let’s add this info to our result object:

res <- res %>%
  left_join(n_of_claims_by_inch, by = "idx") 
res

## # A tibble: 509,129 x 3
##    idx     claim_id     n
##    <chr>      <dbl> <int>
##  1 484x831        1     3
##  2 485x831        1     3
##  3 486x831        1     2
##  4 487x831        1     2
##  5 488x831        1     2
##  6 489x831        1     2
##  7 490x831        1     2
##  8 491x831        1     2
##  9 492x831        1     1
## 10 493x831        1     1
## # ... with 509,119 more rows

Here, we can read that square at 484x831 has been claimed three times (n), and notably by the ID #1. So let’s filter on the square that has only been claimed once:

res <- res %>%
  filter(n == 1)
res

## # A tibble: 230,607 x 3
##    idx     claim_id     n
##    <chr>      <dbl> <int>
##  1 492x831        1     1
##  2 493x831        1     1
##  3 494x831        1     1
##  4 495x831        1     1
##  5 496x831        1     1
##  6 497x831        1     1
##  7 498x831        1     1
##  8 499x831        1     1
##  9 500x831        1     1
## 10 501x831        1     1
## # ... with 230,597 more rows

Now for the final trick: a claim with no overlap would have, in our result, the same number of occurrence in the claim_id column from res and in the idx from n_of_claims_by_inch

res %>%
  count(claim_id) %>%
  left_join(n_of_claims_by_id, by = "claim_id") %>%
  filter(nn == n)

## # A tibble: 1 x 3
##   claim_id    nn     n
##      <dbl> <int> <int>
## 1     1124   299   299

So here it is: the claim_id 1124 is the one we are looking for!

Day four, part one

This one was also fun to do: we have another weird data file, that we need to parse. This data is said to record the date and time, all days between 00 and 00:59 at a manufacturing lab, when the guard is awake of asleep.

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

day4 <- read.delim("input4.txt", header = FALSE) %>% as.tibble()
day4

## # A tibble: 1,129 x 1
##    V1                                         
##    <fct>                                      
##  1 [1518-08-21 00:39] wakes up                
##  2 [1518-08-11 00:56] wakes up                
##  3 [1518-10-10 23:52] Guard #2707 begins shift
##  4 [1518-10-14 00:55] wakes up                
##  5 [1518-07-19 00:42] wakes up                
##  6 [1518-06-16 00:41] falls asleep            
##  7 [1518-03-24 00:57] wakes up                
##  8 [1518-05-06 00:39] wakes up                
##  9 [1518-08-27 00:00] Guard #3323 begins shift
## 10 [1518-03-22 23:50] Guard #2753 begins shift
## # ... with 1,119 more rows

time_minute <- day4 %>%
  # Create a hour / event column
  separate(V1, into = c("hour", "event"), sep = "] ") %>% 
  mutate(hour = gsub("\\[", "", hour), 
         hour = ymd_hm(hour)) %>%
  # Order the hour column
  arrange(hour) %>%
  # Create a column with the guard ID id present
  mutate(guard_id = str_extract(event, "#([0-9]*)"), 
         guard_id = str_replace(guard_id, "#", ""), 
         no_id = is.na(guard_id), 
         record = 1:n()
         ) %>%
  # Add the guard ID everywhere
  fill(guard_id) %>%
  # Filter to keep only the event rows
  filter(no_id) %>% 
  select(-no_id) %>%
  # Spread for event and hour
  spread(event, hour) %>%
  # Fill the wakes up column so that there is no NA
  fill(`wakes up`, .direction = "up") %>%
  select(-record) %>%
  # Remove the line with NA in the fall asleep column
  drop_na(`falls asleep`) %>%
  # Get one minutes less, as the wakes up minute is considered to be an
  # awaken minute
  mutate(`wakes up` = `wakes up` - minutes(1),
  # Create a list column with the sequence of minutes the guard is asleep
         minutes_asleep = map2(`falls asleep`, `wakes up`, 
                               seq, by = "min"),
  # Compute how many time the guard is asleep
         sleepy_time = `wakes up` - `falls asleep`) 
time_minute

## # A tibble: 420 x 5
##    guard_id `falls asleep`      `wakes up`          minutes_asleep
##    <chr>    <dttm>              <dttm>              <list>        
##  1 1213     1518-02-16 00:08:00 1518-02-16 00:22:00 <dttm [15]>   
##  2 1213     1518-02-16 00:44:00 1518-02-16 00:50:00 <dttm [7]>    
##  3 1213     1518-02-16 00:56:00 1518-02-16 00:58:00 <dttm [3]>    
##  4 1213     1518-02-18 00:13:00 1518-02-18 00:24:00 <dttm [12]>   
##  5 1213     1518-03-29 00:02:00 1518-03-29 00:10:00 <dttm [9]>    
##  6 1213     1518-03-29 00:16:00 1518-03-29 00:47:00 <dttm [32]>   
##  7 1213     1518-04-27 00:03:00 1518-04-27 00:19:00 <dttm [17]>   
##  8 1213     1518-04-27 00:24:00 1518-04-27 00:41:00 <dttm [18]>   
##  9 1213     1518-04-27 00:47:00 1518-04-27 00:58:00 <dttm [12]>   
## 10 1213     1518-05-06 00:02:00 1518-05-06 00:38:00 <dttm [37]>   
## # ... with 410 more rows, and 1 more variable: sleepy_time <time>

Find the guard that has the most minutes asleep. What minute does that guard spend asleep the most?

So, our first task is to know which guard is asleep the most and on which minute of the hour. Let’s first compute who this guard is:

which_guard <- time_minute %>%
  group_by(guard_id) %>%
  summarise(s = sum(sleepy_time)) %>%
  top_n(1, s) 

Now we know who he is, let’s see at which minute of the hour he’s the most asleep:

time_minute %>%
  filter(guard_id == which_guard$guard_id) %>%
  unnest(minutes_asleep) %>%
  print() %>%
  # Extract the minute part
  mutate(minutes_asleep = minute(minutes_asleep)) %>%
  count(minutes_asleep, sort = TRUE)%>%
  top_n(1)

## # A tibble: 502 x 5
##    guard_id `falls asleep`      `wakes up`          sleepy_time
##    <chr>    <dttm>              <dttm>              <time>     
##  1 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
##  2 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
##  3 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
##  4 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
##  5 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
##  6 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
##  7 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
##  8 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
##  9 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
## 10 2753     1518-02-17 00:13:00 1518-02-17 00:26:00 780 secs   
## # ... with 492 more rows, and 1 more variable: minutes_asleep <dttm>

## Selecting by n

## # A tibble: 1 x 2
##   minutes_asleep     n
##            <int> <int>
## 1             28    14

Day four, part one

The second part was quite straightforward once you’ve got the previous answer. > Of all guards, which guard is most frequently asleep on the same minute?

time_minute %>%
  unnest(minutes_asleep) %>%
  mutate(minutes_asleep = minute(minutes_asleep)) %>%
  count(guard_id, minutes_asleep, sort = TRUE) %>% 
  top_n(1)

## Selecting by n

## # A tibble: 1 x 3
##   guard_id minutes_asleep     n
##   <chr>             <int> <int>
## 1 1213                 19    16

To leave a comment for the author, please follow the link and comment on their blog: Colin Fay.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)