Advent of Code 2020

[This article was first published on R on Tea & Stats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Advent of Code is a series of programming puzzles you can tackle to hone your coding skills each day in the run-up to Christmas.

This year I am attempting it using R, which can make some challenges easier or harder depending on whether they are more ‘computer sciencey’ or more ‘data sciencey’. Generally it makes parsing datasets easier but low-level string manipulation more fiddly.

Here are my solutions so far. Where possible, I’ve tried to strike a balance between efficiency and readability, and to try avoid using the packages I might usually use (e.g. dplyr) if I think it makes the puzzle too easy.

The input data are different for each participant, so your numerical results may differ from mine.

  1. Report repair
  2. Password philosophy
  3. Toboggan trajectory
  4. Passport processing
  5. Binary boarding
  6. Custom customs
  7. Handy haversacks
  8. Handheld halting
  9. Encoding error
  10. Adapter array
  11. Seating system
  12. Rain risk
  13. Shuttle search
  14. Docking data
  15. Rambunctious recitation

Day 1 – Report repair

Two numbers

Find the two entries that sum to 2020, then multiply those two numbers together.

This can be a one-liner:

input <- as.integer(readLines('input01.txt'))
prod(input[(2020 - input) %in% input])

[1] 468051

Three numbers

Find the three entries that sum to 2020, then multiply them together.

It might be tempting to go for a naïve solution like this:

prod(combn(input, 3)[, combn(input, 3, sum) == 2020])

[1] 272611658

It gives the right answer but involves a fair amount of unnecessary computation. It takes more than a second to run. If we assume all the inputs are non-negative, we can take advantage of this to reduce the number of operations.

. <- expand.grid(input, input[(2020 - input) > min(input)])
. <- transform(., Var3 = 2020 - Var1 - Var2)
. <- subset(., Var3 > min(input))
prod(.[which.max(.$Var3 %in% input), ])

[1] 272611658

This is approximately 2000 times faster than the one-liner, and works by successively discarding values that could only add up to more than 2020. The . notation is just so I can write this without using dplyr/magrittr.

Day 2 - Password philosophy

Number of letters

How many passwords are valid according to the policies?

1-3 a: abcde
1-3 b: cdefg
2-9 c: ccccccccc

First read in the data. I like data frames and so should you.

input <- readLines('input02.txt')
passwords <- do.call(rbind, strsplit(input, '[- ]|\\: '))
passwords <- setNames(as.data.frame(passwords),
                      c('min', 'max', 'letter', 'password'))
passwords <- transform(passwords,
                       min = as.integer(min),
                       max = as.integer(max))
head(passwords)

  min max letter          password
1  14  15      v  vdvvvvvsvvvvvfpv
2   3  11      k  kkqkkfkkvkgfknkx
3   6  10      j        jjjjjjjjjj
4   5  10      s nskdmzwrmpmhsrzts
5  13  15      v   vvvvvvkvvvvjzvv
6  11  13      h    hhhhhbhhhhdhhh

String operations are a bit of a pain in base R so it’s easier just to use a package, like stringi or stringr for this.

with(passwords, {
     n <- stringr::str_count(password, letter)
     sum(n >= min & n <= max)
})

[1] 625

You could also split each password with strsplit and count the letters with an sapply-type loop.

Position of letters

Now the two digits describe two indices in the password, exactly one of which must match the given letter.

with(passwords,
     sum(xor(substr(password, min, min) == letter,
             substr(password, max, max) == letter))
)

[1] 391

Initially I got caught out here, by misreading the question as ‘at least one’ and then wondering why an inclusive or (|) was returning the incorrect answer.

Day 3 - Toboggan trajectory

The input looks a bit like this:

..##.........##.........##.........##.........##.........##.......  --->
#...#...#..#...#...#..#...#...#..#...#...#..#...#...#..#...#...#..
.#....#..#..#....#..#..#....#..#..#....#..#..#....#..#..#....#..#.
..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#
.#...##..#..#...##..#..#...##..#..#...##..#..#...##..#..#...##..#.
..#.##.......#.##.......#.##.......#.##.......#.##.......#.##.....  --->
.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#
.#........#.#........#.#........#.#........#.#........#.#........#
#.##...#...#.##...#...#.##...#...#.##...#...#.##...#...#.##...#...
#...##....##...##....##...##....##...##....##...##....##...##....#
.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#  --->

Encountering trees

Starting at the top-left corner of your map and following a slope of right 3 and down 1, how many trees would you encounter?

input <- readLines('input03.txt')

A complicated-sounding problem but the solution is mainly mathematical.

positions <- (3 * (seq_along(input) - 1)) %% nchar(input) + 1
sum(substr(input, positions, positions) == '#')

[1] 268

The sequence of positions goes 1, 4, 7, …, and when it reaches the edge of the map, loops back round to the beginning. Using the modulo operator we can use the sequence modulo the width of the input map, then add one because R indexes from one rather than from zero.

Different slopes

Simply wrap the above into a function.

trees <- function(right, down = 1) {
  vertical <- seq(0, length(input) - 1, by = down) + 1
  horizontal <- (right * (seq_along(input) - 1)) %% nchar(input) + 1
  horizontal <- head(horizontal, length(vertical))
  as.double(
    sum(substr(input[vertical], horizontal, horizontal) == '#')
  )
}
trees(1) * trees(3) * trees(5) * trees(7) * trees(1, 2)

[1] 3093068400

The as.double bit is necessary only because multiplying large integer outputs together can cause an overflow when the product is larger than 109.

Day 4 - Passport processing

The example input is in this ragged format, where keys and values are separated by colons and records are separated by double newlines. The first step is to parse this unusual data format.

ecl:gry pid:860033327 eyr:2020 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm

iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884
hcl:#cfa07d byr:1929

hcl:#ae17e1 iyr:2013
eyr:2024
ecl:brn pid:760753108 byr:1931
hgt:179cm

hcl:#cfa07d eyr:2025 pid:166559648
iyr:2011 ecl:brn hgt:59in

input <- strsplit(readLines('input04.txt'), ' ')
ids = cumsum(!lengths(input))
pairs <- lapply(strsplit(unlist(input), ':'), setNames, c('key', 'value'))
passports <- data.frame(id = rep(ids, lengths(input)),
                        do.call(rbind, pairs))

Missing fields

Now the data are in a standard format, this is a simple split-apply-combine operation. I am using the base aggregate but this could be done equally well using dplyr or data.table.

required <- c('byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid')
valid <- aggregate(key ~ id, passports,
                   function(x) !length(setdiff(required, x)))
head(valid, 10)

   id   key
1   0  TRUE
2   1  TRUE
3   2  TRUE
4   3  TRUE
5   4  TRUE
6   5  TRUE
7   6  TRUE
8   7 FALSE
9   8 FALSE
10  9  TRUE

Then the answer is simply

sum(valid$key)

[1] 190

Field validation

Thanks to the way we imported the data, this is quite straightforward. The rules are:

  • byr (Birth Year) - four digits; at least 1920 and at most 2002.
  • iyr (Issue Year) - four digits; at least 2010 and at most 2020.
  • eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
  • hgt (Height) - a number followed by either cm or in:
    • If cm, the number must be at least 150 and at most 193.
    • If in, the number must be at least 59 and at most 76.
  • hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
  • ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
  • pid (Passport ID) - a nine-digit number, including leading zeroes.
  • cid (Country ID) - ignored, missing or not.

The data are all different types (integer, double and categorical) so the first step will be to spread the table to a wider format, with one row per passport, and one column for each field.

Here is a dplyr + tidyr solution.

library(dplyr)
library(tidyr)
passports_wide <- passports %>%
  pivot_wider(names_from = key, values_from = value) %>%
  mutate(byr = as.integer(byr),
         iyr = as.integer(iyr),
         eyr = as.integer(eyr),
         hgt_value = as.numeric(gsub('cm|in$', '', hgt)),
         hgt_unit = gsub('\\d*', '', hgt))
head(passports_wide)

# A tibble: 6 x 11
     id   iyr cid   pid        eyr hcl     ecl      byr hgt   hgt_value hgt_unit
                         
1     0  1928 150   4761132~  2039 a5ac0f  #25f8~  2027 190         190 ""      
2     1  2013 169   9200769~  2026 #fffffd hzl     1929 168cm       168 "cm"    
3     2  2011   3284128~  2023 #6b5442 brn     1948 156cm       156 "cm"    
4     3  2019 279   6749079~  2020 #602927 amb     1950 189cm       189 "cm"    
5     4  2015   4736300~  2022 #341e13 hzl     1976 178cm       178 "cm"    
6     5  2020   6281139~  2023 #866857 blu     1984 163cm       163 "cm"    

From here, we can filter out the invalid entries, using filter or subset.

passports_wide %>%
  filter(byr >= 1920, byr <= 2002,
         iyr >= 2010, iyr <= 2020,
         eyr >= 2020, eyr <= 2030,
         hgt_value >= 150 & hgt_value <= 193 & hgt_unit == 'cm' |
           hgt_value >= 59 & hgt_value <= 76 & hgt_unit == 'in',
         grepl('^#[0-9a-f]{6}$', hcl),
         ecl %in% c('amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth'),
         grepl('^\\d{9}$', pid)) -> valid_passports
nrow(valid_passports)

[1] 121

You could also use a filtering join, though since most of the fields are ranges of integer values, you would want to use a data.table non-equi-join rather than a simple semi_join.

Day 5 - Binary boarding

Highest seat ID

This task is easy, as soon as you recognise that it is just converting numbers from binary to decimal, where F and L denote ones and B and R are zeros. The distinction between rows and columns is a red herring, because you can parse the whole sequence at once.

input <- readLines('input05.txt')
binary <- lapply(strsplit(input, ''), grepl, pattern = '[BR]')
seat_ids <- sapply(binary, function(x) sum(x * 2^(rev(seq_along(x)) - 1)))
max(seat_ids)

[1] 874

Finding an empty seat

Get the missing value, which isn’t the minimum or the maximum in the list.

setdiff(seq(min(seat_ids), max(seat_ids)),
        seat_ids)

[1] 594

Day 6 - Custom customs

Questions with any ‘yes’

Count the number of unique letters in each group, where a ‘group’ is series of strings separated from others by a blank line. This is a union set operation.

input <- readLines('input06.txt')
group <- cumsum(!nchar(input))

library(dplyr)
responses <- data.frame(group = group[nchar(input) > 0],
                        questions = input[nchar(input) > 0])
union <- aggregate(questions ~ group, responses,
                    function(x) length(unique(unlist(strsplit(x, '')))))
sum(union$questions)

[1] 6551

Questions with all ‘yes’

Similar, but now an intersection set operation.

intersection <- aggregate(questions ~ group, responses,
                          function(x) length(Reduce(intersect, strsplit(x, ''))))
sum(intersection$questions)

[1] 3358

The solution to the first part could have used Reduce(union, ...), which would achieve the same result as unique(unlist(...)).

Both of these could be made a bit more readable using dplyr or data.table instead. In particular, the base function aggregate doesn’t like list-columns as inputs, so the strsplit can’t be done before the aggregation. This is not a problem with dplyr::summarise or data.table:

library(dplyr)
responses %>%
  mutate(questions = strsplit(questions, '')) %>%
  group_by(group) %>%
  summarise(count = Reduce(intersect, questions) %>% length) %>%
  pull(count) %>% sum

[1] 3358

library(data.table)
setDT(responses)[, questions := strsplit(questions, '')]
responses[, .(count = length(Reduce(intersect, questions))),
          by = group][, sum(count)]

[1] 3358

Day 7 - Handy haversacks

Number of bag colours

Given an input list of rules, how many different colours of bags may contain at least one shiny gold bag?

The first step will be to parse the natural language input, which looks like this:

input <- readLines('input07.txt')
head(input)

[1] "mirrored silver bags contain 4 wavy gray bags."                                                                
[2] "clear tan bags contain 5 bright purple bags, 1 pale black bag, 5 muted lime bags."                             
[3] "dim crimson bags contain 5 vibrant salmon bags, 2 clear cyan bags, 2 striped lime bags, 5 vibrant violet bags."
[4] "mirrored beige bags contain 4 pale gold bags, 1 pale aqua bag."                                                
[5] "pale maroon bags contain 2 dotted orange bags."                                                                
[6] "dim tan bags contain no other bags."                                                                           

For this first exercise, the numbers of bags within each one are irrelevant (but we will need them later for part 2). For now, we just want to reduce it to which colours can contain which others.

To start, I tidied up the data into a flat data frame. This isn’t strictly necessary—a named list would work too—but it’s easy to keep track of everything in a flat data structure.

library(tidyr)
rules <- strsplit(input, ' contain ') %>%
  lapply(gsub, pattern = '\\.| bags?', replacement = '') %>%
  do.call(rbind, .) %>%
  as.data.frame %>%
  setNames(c('container', 'content')) %>%
  transform(content = strsplit(content, ', ')) %>%
  unnest_longer(content) %>%
  extract(content, c('number', 'content'), '(\\d+) (.+)') %>%
  transform(number = as.numeric(number)) %>%
  transform(number = replace(number, is.na(number), 0))
head(rules)

        container number        content
1 mirrored silver      4      wavy gray
2       clear tan      5  bright purple
3       clear tan      1     pale black
4       clear tan      5     muted lime
5     dim crimson      5 vibrant salmon
6     dim crimson      2     clear cyan

The algorithm is a queue, which works as follows.

  1. Look up which bags can directly contain shiny gold
  2. Look up which bags can directly contain the results of 1.
  3. Repeat until no more bags can contain the result.

Here’s the loop:

bag <- 'shiny gold'
containers <- NULL

repeat {
  contained_in <- subset(rules, content %in% bag)
  if ( !nrow(contained_in) )
    break
  bag <- setdiff(contained_in$container, containers)
  containers <- union(containers, bag)
}

length(containers)

[1] 259

Number of individual bags

We ignored the numbers of bags in part 1, but we need them, now. How many individual bags fit inside a single shiny gold bag?

To understand recursion, you must first understand recursion. My function, count_bag, calls itself. In more loop-friendly languages you might use a queue for this second part, but I can’t really think of a concise way to do it using R.

count_bag <- function(colour, factor = 1) {
  stopifnot(length(colour) == 1)
  
  rule <- subset(rules, container == colour)
  
  if (nrow(rule) == 1 & rule$number[1] == 0) {
    out <- 0
  } else {
    # need to work row-wise or you'll come unstuck:
    out <- mapply(count_bag, rule$content, rule$number)
  }
  
  factor * (1 + sum(out))
}

We remove 1 at the end so as not to include the shiny gold bag itself:

count_bag('shiny gold', 1) - 1

[1] 45018

Day 8 - Handheld halting

Infinite loop

Just a simple loop that keeps track of all the places it has been so far, and terminates the moment it visits a location for the second time.

input <- read.table('input08.txt', col.names = c('instr', 'value'))

acc <- input$visited <- 0
i <- 1

repeat {
  input$visited[i] <- input$visited[i] + 1
  
  if ( any(input$visited > 1) )
    break
  
  acc <- acc + input$value[i] * (input$instr[i] == 'acc')
  i <- i + (input$instr[i] == 'jmp') * (input$value[i]  - 1) + 1
}

acc

[1] 1600

Originally I wrote this with nested if statements, then changed it to binary multiplication, for fewer lines of code, at the expense of readability.

This puzzle is set up to catch you out. From seeing nop +0 in the example data you might be tempted to assume that adding the value on nop instructions won’t affect the accumulator. But the test input data have some non-zero nop values thrown in, that you will surely encounter:

head(subset(input, instr == 'nop' & value != 0 & visited > 0))

    instr value visited
2     nop   631       1
11    nop    83       2
71    nop   168       1
73    nop   151       1
96    nop   -25       1
123   nop    -9       1

Thus you must only jump or add to the accumulator on instructions that are explicitly jmp or acc, respectively.

Corrupted code

From part 1, we already have an algorithm for finding the first instruction that will lead into an infinite loop. Instead of terminating at this point, we can assume that last instruction was corrupted, swap it for the other type, then continue until we find another such corruption, all the way until the program is able to terminate on its own.

That was the idea, anyway. Then I got fed up and decided to brute force it, instead. Maybe there is a subtler way, but this appears to work quickly enough. One thing worth noting is that you only need to look at those indices already visited in part 1.

nops_and_jmps <- which(input$instr != 'acc' & input$visited)

brute_force <- function() {
  for (nj in nops_and_jmps) {
    modified <- input
    modified$instr[nj] <- setdiff(c('nop', 'jmp'), input$instr[nj])
    
    acc <- modified$visited <- 0
    i <- 1
    repeat {
      
      if (i == nrow(input) + 1)
        return(acc)
      
      modified$visited[i] <- modified$visited[i] + 1
      
      if ( any(modified$visited > 1) )
        break
      
      acc <- acc + modified$value[i] * (modified$instr[i] == 'acc')
      i <- i + (modified$instr[i] == 'jmp') * (modified$value[i]  - 1) + 1
    }
  }
}

brute_force()

[1] 1543

Day 9 - Encoding error

Adding pairs

Here the question is how to calculate the sums of pairs of values in a sliding window, ideally without redundantly computing the same sums more than once.

find_error <- function(series, N = 25) {
  preamble <- head(series, N)
  t(combn(preamble, 2)) ->.; addmargins(., 2) ->.; as.data.frame(.) -> pairs
  for (x in tail(series, -N)) {
    if (!x %in% pairs$Sum)
      return(x)

    pairs <- subset(pairs, V1 != preamble[1] & V2 != preamble[1])
    pairs <- rbind(pairs, data.frame(V1 = x,
                                     V2 = preamble[-1],
                                     Sum = x + preamble[-1]))
    preamble <- c(preamble[-1], x)
  }
}

John Mount recently pointed out that there is already a ‘pipe’ of sorts in base R, which you can construct using an operator of the form ->.;. I use it on the second line of this function just because converting combn output into a long data frame format is a bit verbose.

To check our working, run on the example dataset:

example <- c(35, 20, 15, 25, 47, 40, 62, 55, 65, 95, 102, 117, 150, 182, 127,
             219, 299, 277, 309, 576)
find_error(example, 5)

[1] 127

And now with the real input data. As in earlier exercises, we need floating point numbers, rather than integers, because the large numbers in the real input can cause an integer overflow.

input <- as.double(readLines('input09.txt'))
(invalid <- find_error(input))

[1] 542529149

Contiguous set

To find the longest contiguous set of numbers that add up to the value above, we first recognise that the values are non-negative, so we can immediately exclude any elements that are after our target invalid element.

My procedure will then go as follows. We first initialize an empty set. Then, iterating backwards through the series:

  1. Add up all the values in the current set.
  2. If the sum is greater than the target, delete the last element.
  3. If the sum is equal to the target, and the current set larger than our current best set (initially empty), save this as our best so far.
  4. Prepend the current set with the next element in the sequence.
  5. Repeat 1–4 until you reach the beginning of the series.

In R code form:

contiguous_set <- function(series, target) {
  series <- head(series, which.max(series == target) - 1)
  best_set <- set <- c()
  for (n in rev(series)) {
    if (sum(set) == target & length(set) >= length(best_set))
      best_set <- set
    
    if (sum(set) > target)
      set <- head(set, -1)
    
    set <- c(n, set)
  }
  
  sum(range(best_set))
}

On our example dataset we get:

contiguous_set(example, 127)

[1] 62

And on the test dataset, using the value stored from part 1:

contiguous_set(input, invalid)

[1] 75678618

Day 10 - Adapter array

Lagged differences

This is pretty trivial. Read in the data, append a zero, sort the numbers, compute the lagged differences (appending a 3), tabulate them and multiply the result.

input <- as.integer(readLines('input10.txt'))
jolts <- c(0, sort(input), max(input) + 3)
prod(table(diff(jolts)))

[1] 2170

I could have equally appended the 3 in the second line as max(input) + 3.

Counting combinations

Again we will be working on the lagged differences. Let’s look at a few values from this sequence.

head(diff(jolts), 20)

 [1] 1 1 1 3 3 1 1 1 3 1 1 1 1 3 3 1 1 1 1 3

Which adapters can we remove?

We are interested in the lengths of the sub-sequences of 1s in this series. The R function rle will give the run-length encoding, i.e. the lengths of subsequences of consecutive equal values in our vector.

For a length-n subsequence of differences equal to 1:

  • if n = 1 the adapter can’t be removed because the gap would then be 4:
    • ${0 \choose 0} = 1$
  • if n = 2 then you can only remove the first adapter (or not):
    • ${1 \choose 0} + {1 \choose 1} = 2$
  • if n = 3 you can keep them all, remove 1 or both of the first 2:
    • ${2 \choose 0} + {2 \choose 1} + {2 \choose 2} = 4$
  • if n = 4 you can keep them all, or remove up to 2 of the first 3:
    • ${3 \choose 0} + {3 \choose 1} + {3 \choose 2} = 7$
  • and so on (though actually there aren’t any sequences longer than 4)

Then multiply all these numbers of combinations together for every subsequence.

sequences <- as.data.frame(unclass(rle(diff(jolts))))
sequences <- subset(sequences, values == 1)
count_combos <- function(n) sum( choose(n-1, 0:2) )
sequences <- transform(sequences, combos = sapply(lengths, count_combos))
prod(sequences$combos)

[1] 2.480359e+13

We probably don’t want scientific notation, so reformat the result:

format(prod(sequences$combos), scientific = FALSE)

[1] "24803586664192"

Day 11 - Seating system

Convoluted solution

This puzzle is effectively applying a convolution matrix (the set of rules) to a 2-dimensional image (the seating plan).

We can import the data as a logical (binary) matrix where zero or FALSE means a seat is empty, and one or TRUE means it is occupied. Floor space is set to NA.

input <- do.call(rbind, strsplit(readLines('input11.txt'), ''))
input <- input != 'L'
input[input > 0] <- NA

The rules are:

  • If a seat is empty (L) and there are no occupied seats adjacent to it, the seat becomes occupied.
  • If a seat is occupied (#) and four or more seats adjacent to it are also occupied, the seat becomes empty.
  • Otherwise, the seat’s state does not change.

A convolution kernel matrix can therefore be of the form:

$$\begin{bmatrix}-1 & -1 & -1 \\-1 & 3 & -1 \\-1 & -1 & -1\end{bmatrix}$$

Which is followed by the filter:

  • If the result is zero or more, then occupy the seat (set equal to TRUE or 1)
  • Otherwise, empty the seat (set equal to FALSE or 0)
  • If a cell is meant to be floor space, reset to zero (because OpenImageR is not currently written to handle NAs).
kernel <- matrix(c(rep(-1, 4), 3, rep(-1, 4)), 3, 3)

convoluted_seating <- function(input, kernel) {
  seats <- replace(input, is.na(input), 0)
  
  for (i in 1:100) {
    convolved <- OpenImageR::convolution(seats, kernel, mode = 'same')
    new_seats <- replace(convolved >= 0, is.na(input), 0)
    
    if ( all(seats == new_seats) )
      return(sum(new_seats))
  
    seats <- new_seats
  }
  
  stop('Failed to converge after 100 iterations')
}

convoluted_seating(input, kernel)

[1] 2194

This took 95 iterations.

Line of sight

In the first part, floor space was just treated like a seat that nobody sits in. Now, we have to change our convolution matrix for each pixel such that, if there is no seat in one direction, we cast our gaze further and borrow the state of a more distant seat.

Unfortunately most image analysis packages only accept a constant matrix as the kernel argument, rather than a function, so we shall have to roll our own.

Firstly, we run an algorithm to determine which seats are visible. This only needs to be run once.

For each seat:

  1. Set radius equal to 1.
  2. Look in each of the eight directions for a seat. Is a seat visible?
  3. For any direction where this is not true, increase radius by 1.
  4. Repeat 2–3 until a visible seat is recorded for every direction.

Today I discovered that which() has an extra argument arr.ind that, if TRUE, returns matrix indices. Handy for quickly converting a matrix into a long (possibly sparse) representation.

seat_ids <- which(!is.na(input), arr.ind = TRUE)
dirs <- subset(expand.grid(down = -1:1, right = -1:1), down | right)

radial_search <- function(seat, directions, radius = 1) {
  if (!nrow(directions))
    return(NULL)
  
  i <- seat[1] + radius * directions[['down']]
  j <- seat[2] + radius * directions[['right']]
  
  in_bounds <- i > 0 & i <= nrow(input) & j > 0 & j <= ncol(input)
  ij <- cbind(i, j)[in_bounds, , drop = FALSE]
  
  seat_exists <- !is.na( input[ij] )
  remaining_dirs <- directions[in_bounds, ][!seat_exists, ]
  visible <- unname(ij[seat_exists, , drop = FALSE])
  
  rbind(visible, radial_search(seat, remaining_dirs, radius + 1))
}

line_of_sight <- apply(seat_ids, 1, radial_search, directions = dirs)

Next we run the seat changing algorithm itself. For each seat:

  1. Add up the number of occupied seats visible from this one.
  2. If sum is zero and seat is unoccupied, occupy the seat.
  3. Else if sum is ≥ 5 and seat is occupied, empty the seat.

Repeat until seating allocation does not change.

change_places <- function(visible, input) {
  seating_plan <- input
  floor <- is.na(seating_plan)
  
  for (iter in 1:100) {
    new_seating_plan <- seating_plan
    for (seat in seq_along(visible)) {
      current <- seating_plan[!floor][seat]
      neighbours <- sum(seating_plan[visible[[seat]]])
      if (current & neighbours >= 5) {
        new_seating_plan[!floor][seat] <- 0
      } else if (!current & !neighbours) {
        new_seating_plan[!floor][seat] <- 1
      }
    }
    if (all(seating_plan == new_seating_plan, na.rm = TRUE)) {
      return(sum(seating_plan, na.rm = TRUE))
    }
    seating_plan <- new_seating_plan
  }
  
  stop('Failed to converge after 100 iterations')
}

change_places(line_of_sight, input)

[1] 1944

This was pretty slow, which is to be expected in R. To speed it up, we can rewrite the guts in a lower-level programming language. There may also be some scope for vectorisation.

Day 12 - Rain risk

Complex directions

I was actually expecting this to be more complicated, with turns in arbitrary numbers of degrees. But it turns out that they are all multiples of 90°, so all F instructions can be simply converted into N, E, S or W without invoking trigonometry.

library(tidyr)
library(dplyr)
instructions <- tibble(input = readLines('input12.txt')) %>%
  extract(input, c('direction', 'value'), '(\\w)(\\d+)', convert = TRUE) %>%
  mutate(bearing = cumsum(- value * (direction == 'L')
                          + value * (direction == 'R')),
         bearing = (90 + bearing) %% 360,
         cardinal = ifelse(direction == 'F',
                           c('N', 'E', 'S', 'W')[1 + bearing / 90],
                           direction))

head(instructions)

# A tibble: 6 x 4
  direction value bearing cardinal
              
1 F             8      90 E       
2 N             2      90 N       
3 F            32      90 E       
4 F            17      90 E       
5 E             4      90 E       
6 N             4      90 N       

Now let’s work out where we are. No need to store the latitude and longitude in separate columns; we can add them up as complex numbers and then sum the real and imaginary parts.

instructions %>%
  mutate(east = (cardinal == 'E') - (cardinal == 'W'),
         north = (cardinal == 'N') - (cardinal == 'S')) %>%
  summarise(position = sum(value * (east + north * 1i)),
            distance = abs(Re(position)) + abs(Im(position)))

# A tibble: 1 x 2
  position  distance
          
1 -127-752i      879

Euler’s Bermuda Triangle

Now the N, E, S and W directions store up instructions, which are performed by the ship every time F is invoked.

The value of these instructions is rotated in the complex plane for every L or R turn. Euler’s formula states: $$e^{ix} = \cos x + i\sin x,$$ and we can use this to work out how to transform the relative coordinates of the waypoint to the ship every time there is a turn.

In the complex plane:

  • turning 90° to the right is equivalent to multiplying by $0-i$,
  • turning 90° to the left is equivalent to multiplying by $0+i$,
  • turning 180° is equivalent to multiplying by $-1+0i$,
  • turning 270° is equivalent to turning by -90° so use the rule above.
instructions %>%
  mutate(
    east = value * ((direction == 'E') - (direction == 'W')),
    north = value * ((direction == 'N') - (direction == 'S')),
    radians = value * 2 * pi / 360 * ((direction == 'L') - (direction == 'R')),
    rotate = exp(1i * radians),
    waypoint = (10 + 1i + cumsum((east + north * 1i) * cumprod(1 / rotate))),
  ) %>%
  summarise(position = sum(value * (direction == 'F') * waypoint * cumprod(rotate)),
            distance = abs(Re(position)) + abs(Im(position)))

# A tibble: 1 x 2
  position   distance
           
1 17936-171i   18107.

So that we can take advantage of cumsum and cumprod vectorisation, we rotate the ship rather than the waypoint, then reverse the rotation at the end to get the final position of the ship relative to its starting point.

Day 13 - Shuttle search

Earliest bus

A bus’s ID indicates the interval between departures, starting at time 0.

This is simple modular arithmetic: find the remainder when the timestamp is divided by each bus’s ID/interval, then multiply the smallest such remainder with the corresponding bus’s ID.

But we want the bus to arrive after we start waiting at the bus stop, not before. So we negate the timestamp.

input <- readLines('input13.txt')
timestamp <- as.integer(input[1])
buses <- as.integer(strsplit(input[2], ',')[[1]])

inservice <- buses[!is.na(buses)]
inservice[which.min(-timestamp %% inservice)] * min(-timestamp %% inservice)

[1] 222

Bus cluster

We seek a timepoint $t$ at which our first listed bus arrives, the second bus arrives at time $t+1$, and so on. Thus it must have the following properties:

  • $t \equiv 0 \mod b_0$
  • $t \equiv -1 \mod b_1$
  • $t \equiv -n \mod b_n$

First looking at the examples, we can take this naïve approach:

find_timetable <- function(buses, maxit = 1e5) {
  offsets <- seq_along(buses) - 1
  for (i in seq_len(maxit)) {
    t <- buses[1] * i
    if ( all((-t %% buses) == offsets, na.rm = TRUE) )
      return(t)
  }
  stop('Failed to find a valid t')
}

find_timetable(c(17, NA, 13, 19))

[1] 3417

find_timetable(c(67, 7, 59, 61))

[1] 754018

But this probably isn’t going to scale well. Time to dust off a bit of number theory. By the Chinese remainder theorem, for any $a$, $b$ and coprime $m$, $n$, there exists a unique $x (\mod mn)$ such that $x \equiv a \mod m$ and $x \equiv b \mod n$.

Here $a,b$ are offsets (the position of the bus in the list), $n$ represents a bus ID and $x$ is the solution we seek.

The algorithm will be as follows:

  1. Test values in the sequence $a_1, a_1 + n_1, a_1 + 2n_1, \dots$ to find the first time $x_1$ at which a bus arrives and the second bus arrives 1 minute later.
  2. Test values in the sequence $x_1, x_1 + n_1n_2, x_1 + 2n_1n_2, \dots$ to get a valid time for the first three buses.
  3. Repeat.
sieve <- function(a1, a2, n1, n2, maxit = 1e5) {
  x <- a1 + n1 * (0:maxit)
  x[which.max(x %% n2 == a2 %% n2)]
}

find_timetable2 <- function(buses) {
  offsets <- -(seq_along(buses) - 1)[!is.na(buses)] # a
  buses <- buses[!is.na(buses)]                     # n
  x <- offsets[1]
  for (i in 2:length(buses))
    x <- sieve(x, offsets[i], prod(head(buses, i-1)), buses[i])
  x
}

format(find_timetable2(buses), sci = FALSE)

[1] "408270049879073"

Day 14 - Docking data

Bitmask

The trickiest bit(!) in the first part is reading in the data. I wanted a data frame that ‘remembered’ the value of the last mask set. The other part is converting to and from binary. To help you along the way, R has a function called intToBits, but be careful because it converts to 32 bits and the puzzle is 36-bit.

library(dplyr)
library(tidyr)
intTo36Bits <- function(n) {
  bit32 <- rev(as.character(intToBits(n)))
  c(rep(0, 4), as.integer(bit32))
}

binaryToInt <- function(b) {
  b <- as.integer(strsplit(b, '')[[1]])
  sum(b * 2^rev(seq_along(b) - 1))
}

mask <- function(mask, x) {
  mask <- suppressWarnings(as.integer(strsplit(mask, '')[[1]]))
  x <- as.integer(strsplit(x, '')[[1]])
  x[!is.na(mask)] <- mask[!is.na(mask)]
  paste(x, collapse = '')
}

program <- read.table('input14.txt', sep = '=', strip.white = TRUE,
                      col.names = c('key', 'value')) %>%
  extract(key, c('dest', 'address'), '(mem|mask)\\[?(\\d*)\\]?',
          convert = TRUE) %>%
  mutate(mask = value[which(dest == 'mask')[cumsum(dest == 'mask')]]) %>%
  filter(dest == 'mem') %>%
  select(-dest) %>%
  mutate(value = as.integer(value),
         value_binary = lapply(value, intTo36Bits),
         value_binary = sapply(value_binary, paste, collapse = ''),
         value_masked = mapply(mask, mask, value_binary))

As an example, here is the first value 51331021 being masked to become 62069628301:

value:   000000000011000011110011111111001101
mask:    1110X1110XXX101X0011010X110X10X0110X
result:  111001110011101000110101110110001101

What is the sum of the values in memory? Well, since we are just setting values, the only value we care about is the last one for each address. Whatever values they took before the end are unimportant.

program %>%
  group_by(address) %>%
  summarise(last_integer = binaryToInt(last(value_masked))) %>%
  pull(last_integer) %>% sum %>% format(scientific = FALSE) # 14862056079561

[1] "14862056079561"

Memory address decoder

In part 2, the mask applies to the memory address, not to the value. Thus the same value gets applied to possibly many addresses.

It also helps to read the question properly. I got stuck on this for ages until I eventually noticed the part that says

If the bitmask bit is 0, the corresponding memory address bit is unchanged.

which meant my mask was doing the wrong thing, even before the floating bits.

decode <- function(mask, x) {
  mask <- suppressWarnings(as.integer(strsplit(mask, '')[[1]]))
  x <- as.integer(strsplit(x, '')[[1]])
  x[!is.na(mask) & mask] <- mask[!is.na(mask) & mask] # no change if mask is 0!
  n_floating <- sum(is.na(mask))
  decoded <- c()
  for (i in seq_len(2^n_floating) - 1) {
    x[is.na(mask)] <- tail(intTo36Bits(i), n_floating)
    decoded <- c(decoded, paste(x, collapse = ''))
  }
  decoded
}

program %>%
  select(-value_masked) %>%
  mutate(address_binary = sapply(lapply(address, intTo36Bits), paste, collapse = ''),
         address_decoded = mapply(decode, mask, address_binary)) %>%
  select(address_decoded, value_binary) %>%
  tidyr::unnest_longer(address_decoded) %>%
  group_by(address_decoded) %>%
  summarise(last_integer = binaryToInt(last(value_binary))) %>%
  pull(last_integer) %>% sum %>% format(scientific = FALSE)

[1] "3296185383161"

There are many ways I could have improved this solution. In particular, there wasn’t any actual reason why I needed to compress the binary digits into a string representation between operations—other than making the tables of values easier to read during debugging. I could have stored them as vectors or matrices in list-columns, instead.

Day 15 - Rambunctious recitation

Memory game

The first part is straightforward even with not particularly optimal code:

memory_game <- function(n, start) {
  nstart <- length(start)
  spoken <- integer(10)
  spoken[1:nstart] <- start
  for (i in nstart:(n-1)) {
    before <- which(spoken[1:(i-1)] == spoken[i])
    if (!length(before)) {
      spoken[i+1] <- 0
    } else spoken[i+1] <- i - tail(before, 1)
  }
  spoken[n]
}

memory_game(2020, c(7, 12, 1, 0, 16, 2))

[1] 410

Long-term memory

For the 30 millionth number spoken, it’s probably not very efficient to carry the whole vector with us. How can we make it more efficient? This is ten times faster:

memory_game2 <- function(n, start) {
  nstart <- length(start)
  spoken <- start[-nstart]
  when <- seq_along(spoken)
  current <- start[nstart]
  for (i in nstart:(n-1)) {
    if (!current %in% spoken) {
      next_number <- 0
      spoken <- c(spoken, current)
      when <- c(when, i)
    } else {
      next_number <- i - when[spoken == current]
      when[spoken == current] <- i
    }
    current <- next_number
  }
  current
}

memory_game(2020, c(7, 12, 1, 0, 16, 2))

[1] 410

Unfortunately, that still just isn’t fast enough for the problem we have, partly because it involves growing the size of a large vector instead of fixing its length in advance. We can use direct lookup from a vector instead. Treat the indices of a vector (-1, because R indexes from 1) as the possible spoken numbers, and the values at those indices as the last time that number was spoken, or zero if it has not been said so far.

This vector needs to be of length equal to the number of rounds in the game, i.e. 30 million elements long, which is not very big in the grand scheme of things. (When testing, you should also make sure it is at least as long as the size of the maximum value in the starting numbers.)

memory_game3 <- function(n, start) {
  nstart <- length(start)
  spoken <- numeric(max(n, start) + 1)
  spoken[start[-nstart] + 1] <- seq_len(nstart - 1)
  current <- start[nstart]
  for (i in nstart:(n-1)) {
    next_number <- (spoken[current + 1] > 0) * i - spoken[current + 1]
    spoken[current + 1] <- i
    current <- next_number
  }
  current
}

memory_game3(3e7, c(7, 12, 1, 0, 16, 2))

[1] 238

According to microbenchmark, the second implementation is about 5 times faster than the first, and my final implementation is 10 times faster than that. It takes just a second or two to run. A lower-level implementation in C++ might be even faster, but I already have the star now…


I will update this post as I complete future puzzles.

To leave a comment for the author, please follow the link and comment on their blog: R on Tea & Stats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)