Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Advent of Code is a series of programming puzzles you can tackle to hone your coding skills each day in the run-up to Christmas.

This year I am attempting it using R, which can make some challenges easier or harder depending on whether they are more ‘computer sciencey’ or more ‘data sciencey’. Generally it makes parsing datasets easier but low-level string manipulation more fiddly.

Here are my solutions so far. Where possible, I’ve tried to strike a balance between efficiency and readability, and to try avoid using the packages I might usually use (e.g. dplyr) if I think it makes the puzzle too easy.

The input data are different for each participant, so your numerical results may differ from mine.

# Day 1 – Report repair

## Two numbers

Find the two entries that sum to 2020, then multiply those two numbers together.

This can be a one-liner:

input <- as.integer(readLines('input01.txt'))
prod(input[(2020 - input) %in% input])

 468051


## Three numbers

Find the three entries that sum to 2020, then multiply them together.

It might be tempting to go for a naïve solution like this:

prod(combn(input, 3)[, combn(input, 3, sum) == 2020])

 272611658


It gives the right answer but involves a fair amount of unnecessary computation. It takes more than a second to run. If we assume all the inputs are non-negative, we can take advantage of this to reduce the number of operations.

. <- expand.grid(input, input[(2020 - input) > min(input)])
. <- transform(., Var3 = 2020 - Var1 - Var2)
. <- subset(., Var3 > min(input))

 190


## Field validation

Thanks to the way we imported the data, this is quite straightforward. The rules are:

• byr (Birth Year) - four digits; at least 1920 and at most 2002.
• iyr (Issue Year) - four digits; at least 2010 and at most 2020.
• eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
• hgt (Height) - a number followed by either cm or in:
• If cm, the number must be at least 150 and at most 193.
• If in, the number must be at least 59 and at most 76.
• hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
• ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
• pid (Passport ID) - a nine-digit number, including leading zeroes.
• cid (Country ID) - ignored, missing or not.

The data are all different types (integer, double and categorical) so the first step will be to spread the table to a wider format, with one row per passport, and one column for each field.

Here is a dplyr + tidyr solution.

library(dplyr)
library(tidyr)
passports_wide <- passports %>%
pivot_wider(names_from = key, values_from = value) %>%
mutate(byr = as.integer(byr),
iyr = as.integer(iyr),
eyr = as.integer(eyr),
hgt_value = as.numeric(gsub('cm|in$', '', hgt)), hgt_unit = gsub('\\d*', '', hgt)) head(passports_wide) # A tibble: 6 x 11 id iyr cid pid eyr hcl ecl byr hgt hgt_value hgt_unit <int> <int> <chr> <chr> <int> <chr> <chr> <int> <chr> <dbl> <chr> 1 0 1928 150 4761132~ 2039 a5ac0f #25f8~ 2027 190 190 "" 2 1 2013 169 9200769~ 2026 #fffffd hzl 1929 168cm 168 "cm" 3 2 2011 <NA> 3284128~ 2023 #6b5442 brn 1948 156cm 156 "cm" 4 3 2019 279 6749079~ 2020 #602927 amb 1950 189cm 189 "cm" 5 4 2015 <NA> 4736300~ 2022 #341e13 hzl 1976 178cm 178 "cm" 6 5 2020 <NA> 6281139~ 2023 #866857 blu 1984 163cm 163 "cm"  From here, we can filter out the invalid entries, using filter or subset. passports_wide %>% filter(byr >= 1920, byr <= 2002, iyr >= 2010, iyr <= 2020, eyr >= 2020, eyr <= 2030, hgt_value >= 150 & hgt_value <= 193 & hgt_unit == 'cm' | hgt_value >= 59 & hgt_value <= 76 & hgt_unit == 'in', grepl('^#[0-9a-f]{6}$', hcl),
ecl %in% c('amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth'),

 6551


## Questions with all ‘yes’

Similar, but now an intersection set operation.

intersection <- aggregate(questions ~ group, responses,
function(x) length(Reduce(intersect, strsplit(x, ''))))
sum(intersection$questions)  3358  The solution to the first part could have used Reduce(union, ...), which would achieve the same result as unique(unlist(...)). Both of these could be made a bit more readable using dplyr or data.table instead. In particular, the base function aggregate doesn’t like list-columns as inputs, so the strsplit can’t be done before the aggregation. This is not a problem with dplyr::summarise or data.table: library(dplyr) responses %>% mutate(questions = strsplit(questions, '')) %>% group_by(group) %>% summarise(count = Reduce(intersect, questions) %>% length) %>% pull(count) %>% sum  3358 library(data.table) setDT(responses)[, questions := strsplit(questions, '')] responses[, .(count = length(Reduce(intersect, questions))), by = group][, sum(count)]  3358  # Day 7 - Handy haversacks ## Number of bag colours Given an input list of rules, how many different colours of bags may contain at least one shiny gold bag? The first step will be to parse the natural language input, which looks like this: input <- readLines('input07.txt') head(input)  "mirrored silver bags contain 4 wavy gray bags."  "clear tan bags contain 5 bright purple bags, 1 pale black bag, 5 muted lime bags."  "dim crimson bags contain 5 vibrant salmon bags, 2 clear cyan bags, 2 striped lime bags, 5 vibrant violet bags."  "mirrored beige bags contain 4 pale gold bags, 1 pale aqua bag."  "pale maroon bags contain 2 dotted orange bags."  "dim tan bags contain no other bags."  For this first exercise, the numbers of bags within each one are irrelevant (but we will need them later for part 2). For now, we just want to reduce it to which colours can contain which others. To start, I tidied up the data into a flat data frame. This isn’t strictly necessary—a named list would work too—but it’s easy to keep track of everything in a flat data structure. library(tidyr) rules <- strsplit(input, ' contain ') %>% lapply(gsub, pattern = '\\.| bags?', replacement = '') %>% do.call(rbind, .) %>% as.data.frame %>% setNames(c('container', 'content')) %>% transform(content = strsplit(content, ', ')) %>% unnest_longer(content) %>% extract(content, c('number', 'content'), '(\\d+) (.+)') %>% transform(number = as.numeric(number)) %>% transform(number = replace(number, is.na(number), 0)) head(rules) container number content 1 mirrored silver 4 wavy gray 2 clear tan 5 bright purple 3 clear tan 1 pale black 4 clear tan 5 muted lime 5 dim crimson 5 vibrant salmon 6 dim crimson 2 clear cyan  The algorithm is a queue, which works as follows. 1. Look up which bags can directly contain shiny gold 2. Look up which bags can directly contain the results of 1. 3. Repeat until no more bags can contain the result. Here’s the loop: bag <- 'shiny gold' containers <- NULL repeat { contained_in <- subset(rules, content %in% bag) if ( !nrow(contained_in) ) break bag <- setdiff(contained_in$container, containers)
containers <- union(containers, bag)
}

length(containers)

 259


## Number of individual bags

We ignored the numbers of bags in part 1, but we need them, now. How many individual bags fit inside a single shiny gold bag?

To understand recursion, you must first understand recursion. My function, count_bag, calls itself. In more loop-friendly languages you might use a queue for this second part, but I can’t really think of a concise way to do it using R.

count_bag <- function(colour, factor = 1) {
stopifnot(length(colour) == 1)

rule <- subset(rules, container == colour)

if (nrow(rule) == 1 & rule$number == 0) { out <- 0 } else { # need to work row-wise or you'll come unstuck: out <- mapply(count_bag, rule$content, rule$number) } factor * (1 + sum(out)) }  We remove 1 at the end so as not to include the shiny gold bag itself: count_bag('shiny gold', 1) - 1  45018  # Day 8 - Handheld halting ## Infinite loop Just a simple loop that keeps track of all the places it has been so far, and terminates the moment it visits a location for the second time. input <- read.table('input08.txt', col.names = c('instr', 'value')) acc <- input$visited <- 0
i <- 1

repeat {
input$visited[i] <- input$visited[i] + 1

if ( any(input$visited > 1) ) break acc <- acc + input$value[i] * (input$instr[i] == 'acc') i <- i + (input$instr[i] == 'jmp') * (input$value[i] - 1) + 1 } acc  1600  Originally I wrote this with nested if statements, then changed it to binary multiplication, for fewer lines of code, at the expense of readability. This puzzle is set up to catch you out. From seeing nop +0 in the example data you might be tempted to assume that adding the value on nop instructions won’t affect the accumulator. But the test input data have some non-zero nop values thrown in, that you will surely encounter: head(subset(input, instr == 'nop' & value != 0 & visited > 0)) instr value visited 2 nop 631 1 11 nop 83 2 71 nop 168 1 73 nop 151 1 96 nop -25 1 123 nop -9 1  Thus you must only jump or add to the accumulator on instructions that are explicitly jmp or acc, respectively. ## Corrupted code From part 1, we already have an algorithm for finding the first instruction that will lead into an infinite loop. Instead of terminating at this point, we can assume that last instruction was corrupted, swap it for the other type, then continue until we find another such corruption, all the way until the program is able to terminate on its own. That was the idea, anyway. Then I got fed up and decided to brute force it, instead. Maybe there is a subtler way, but this appears to work quickly enough. One thing worth noting is that you only need to look at those indices already visited in part 1. nops_and_jmps <- which(input$instr != 'acc' & input$visited) brute_force <- function() { for (nj in nops_and_jmps) { modified <- input modified$instr[nj] <- setdiff(c('nop', 'jmp'), input$instr[nj]) acc <- modified$visited <- 0
i <- 1
repeat {

if (i == nrow(input) + 1)
return(acc)

modified$visited[i] <- modified$visited[i] + 1

if ( any(modified$visited > 1) ) break acc <- acc + modified$value[i] * (modified$instr[i] == 'acc') i <- i + (modified$instr[i] == 'jmp') * (modified$value[i] - 1) + 1 } } } brute_force()  1543  # Day 9 - Encoding error ## Adding pairs Here the question is how to calculate the sums of pairs of values in a sliding window, ideally without redundantly computing the same sums more than once. find_error <- function(series, N = 25) { preamble <- head(series, N) t(combn(preamble, 2)) ->.; addmargins(., 2) ->.; as.data.frame(.) -> pairs for (x in tail(series, -N)) { if (!x %in% pairs$Sum)
return(x)

pairs <- subset(pairs, V1 != preamble & V2 != preamble)
pairs <- rbind(pairs, data.frame(V1 = x,
V2 = preamble[-1],
Sum = x + preamble[-1]))
preamble <- c(preamble[-1], x)
}
}


John Mount recently pointed out that there is already a ‘pipe’ of sorts in base R, which you can construct using an operator of the form ->.;. I use it on the second line of this function just because converting combn output into a long data frame format is a bit verbose.

To check our working, run on the example dataset:

example <- c(35, 20, 15, 25, 47, 40, 62, 55, 65, 95, 102, 117, 150, 182, 127,
219, 299, 277, 309, 576)
find_error(example, 5)

 127


And now with the real input data. As in earlier exercises, we need floating point numbers, rather than integers, because the large numbers in the real input can cause an integer overflow.

input <- as.double(readLines('input09.txt'))
(invalid <- find_error(input))

 542529149


## Contiguous set

To find the longest contiguous set of numbers that add up to the value above, we first recognise that the values are non-negative, so we can immediately exclude any elements that are after our target invalid element.

My procedure will then go as follows. We first initialize an empty set. Then, iterating backwards through the series:

1. Add up all the values in the current set.
2. If the sum is greater than the target, delete the last element.
3. If the sum is equal to the target, and the current set larger than our current best set (initially empty), save this as our best so far.
4. Prepend the current set with the next element in the sequence.
5. Repeat 1–4 until you reach the beginning of the series.

In R code form:

contiguous_set <- function(series, target) {
series <- head(series, which.max(series == target) - 1)
best_set <- set <- c()
for (n in rev(series)) {
if (sum(set) == target & length(set) >= length(best_set))
best_set <- set

if (sum(set) > target)

set <- c(n, set)
}

sum(range(best_set))
}


On our example dataset we get:

contiguous_set(example, 127)

 62


And on the test dataset, using the value stored from part 1:

contiguous_set(input, invalid)

 75678618


# Day 10 - Adapter array

## Lagged differences

This is pretty trivial. Read in the data, append a zero, sort the numbers, compute the lagged differences (appending a 3), tabulate them and multiply the result.

input <- as.integer(readLines('input10.txt'))
jolts <- c(0, sort(input), max(input) + 3)
prod(table(diff(jolts)))

 2170


I could have equally appended the 3 in the second line as max(input) + 3.

## Counting combinations

Again we will be working on the lagged differences. Let’s look at a few values from this sequence.

head(diff(jolts), 20)

 1 1 1 3 3 1 1 1 3 1 1 1 1 3 3 1 1 1 1 3


We are interested in the lengths of the sub-sequences of 1s in this series. The R function rle will give the run-length encoding, i.e. the lengths of subsequences of consecutive equal values in our vector.

For a length-n subsequence of differences equal to 1:

• if n = 1 the adapter can’t be removed because the gap would then be 4:
• ${0 \choose 0} = 1$
• if n = 2 then you can only remove the first adapter (or not):
• ${1 \choose 0} + {1 \choose 1} = 2$
• if n = 3 you can keep them all, remove 1 or both of the first 2:
• ${2 \choose 0} + {2 \choose 1} + {2 \choose 2} = 4$
• if n = 4 you can keep them all, or remove up to 2 of the first 3:
• ${3 \choose 0} + {3 \choose 1} + {3 \choose 2} = 7$
• and so on (though actually there aren’t any sequences longer than 4)

Then multiply all these numbers of combinations together for every subsequence.

sequences <- as.data.frame(unclass(rle(diff(jolts))))
sequences <- subset(sequences, values == 1)
count_combos <- function(n) sum( choose(n-1, 0:2) )
sequences <- transform(sequences, combos = sapply(lengths, count_combos))
prod(sequences$combos)  2.480359e+13  We probably don’t want scientific notation, so reformat the result: format(prod(sequences$combos), scientific = FALSE)

 "24803586664192"


# Day 11 - Seating system

## Convoluted solution

This puzzle is effectively applying a convolution matrix (the set of rules) to a 2-dimensional image (the seating plan).

We can import the data as a logical (binary) matrix where zero or FALSE means a seat is empty, and one or TRUE means it is occupied. Floor space is set to NA.

input <- do.call(rbind, strsplit(readLines('input11.txt'), ''))
input <- input != 'L'
input[input > 0] <- NA


The rules are:

• If a seat is empty (L) and there are no occupied seats adjacent to it, the seat becomes occupied.
• If a seat is occupied (#) and four or more seats adjacent to it are also occupied, the seat becomes empty.
• Otherwise, the seat’s state does not change.

A convolution kernel matrix can therefore be of the form:

$$\begin{bmatrix}-1 & -1 & -1 \\-1 & 3 & -1 \\-1 & -1 & -1\end{bmatrix}$$

Which is followed by the filter:

• If the result is zero or more, then occupy the seat (set equal to TRUE or 1)
• Otherwise, empty the seat (set equal to FALSE or 0)
• If a cell is meant to be floor space, reset to zero (because OpenImageR is not currently written to handle NAs).
kernel <- matrix(c(rep(-1, 4), 3, rep(-1, 4)), 3, 3)

convoluted_seating <- function(input, kernel) {
seats <- replace(input, is.na(input), 0)

for (i in 1:100) {
convolved <- OpenImageR::convolution(seats, kernel, mode = 'same')
new_seats <- replace(convolved >= 0, is.na(input), 0)

if ( all(seats == new_seats) )
return(sum(new_seats))

seats <- new_seats
}

stop('Failed to converge after 100 iterations')
}

convoluted_seating(input, kernel)

 2194


This took 95 iterations.

## Line of sight

In the first part, floor space was just treated like a seat that nobody sits in. Now, we have to change our convolution matrix for each pixel such that, if there is no seat in one direction, we cast our gaze further and borrow the state of a more distant seat.

Unfortunately most image analysis packages only accept a constant matrix as the kernel argument, rather than a function, so we shall have to roll our own.

Firstly, we run an algorithm to determine which seats are visible. This only needs to be run once.

For each seat:

1. Set radius equal to 1.
2. Look in each of the eight directions for a seat. Is a seat visible?
3. For any direction where this is not true, increase radius by 1.
4. Repeat 2–3 until a visible seat is recorded for every direction.

Today I discovered that which() has an extra argument arr.ind that, if TRUE, returns matrix indices. Handy for quickly converting a matrix into a long (possibly sparse) representation.

seat_ids <- which(!is.na(input), arr.ind = TRUE)
dirs <- subset(expand.grid(down = -1:1, right = -1:1), down | right)

if (!nrow(directions))
return(NULL)

i <- seat + radius * directions[['down']]
j <- seat + radius * directions[['right']]

in_bounds <- i > 0 & i <= nrow(input) & j > 0 & j <= ncol(input)
ij <- cbind(i, j)[in_bounds, , drop = FALSE]

seat_exists <- !is.na( input[ij] )
remaining_dirs <- directions[in_bounds, ][!seat_exists, ]
visible <- unname(ij[seat_exists, , drop = FALSE])

}

line_of_sight <- apply(seat_ids, 1, radial_search, directions = dirs)


Next we run the seat changing algorithm itself. For each seat:

1. Add up the number of occupied seats visible from this one.
2. If sum is zero and seat is unoccupied, occupy the seat.
3. Else if sum is ≥ 5 and seat is occupied, empty the seat.

Repeat until seating allocation does not change.

change_places <- function(visible, input) {
seating_plan <- input
floor <- is.na(seating_plan)

for (iter in 1:100) {
new_seating_plan <- seating_plan
for (seat in seq_along(visible)) {
current <- seating_plan[!floor][seat]
neighbours <- sum(seating_plan[visible[[seat]]])
if (current & neighbours >= 5) {
new_seating_plan[!floor][seat] <- 0
} else if (!current & !neighbours) {
new_seating_plan[!floor][seat] <- 1
}
}
if (all(seating_plan == new_seating_plan, na.rm = TRUE)) {
return(sum(seating_plan, na.rm = TRUE))
}
seating_plan <- new_seating_plan
}

stop('Failed to converge after 100 iterations')
}

change_places(line_of_sight, input)

 1944


This was pretty slow, which is to be expected in R. To speed it up, we can rewrite the guts in a lower-level programming language. There may also be some scope for vectorisation.

# Day 12 - Rain risk

## Complex directions

I was actually expecting this to be more complicated, with turns in arbitrary numbers of degrees. But it turns out that they are all multiples of 90°, so all F instructions can be simply converted into N, E, S or W without invoking trigonometry.

library(tidyr)
library(dplyr)
instructions <- tibble(input = readLines('input12.txt')) %>%
extract(input, c('direction', 'value'), '(\\w)(\\d+)', convert = TRUE) %>%
mutate(bearing = cumsum(- value * (direction == 'L')
+ value * (direction == 'R')),
bearing = (90 + bearing) %% 360,
cardinal = ifelse(direction == 'F',
c('N', 'E', 'S', 'W')[1 + bearing / 90],
direction))

# A tibble: 6 x 4
direction value bearing cardinal
<chr>     <int>   <dbl> <chr>
1 F             8      90 E
2 N             2      90 N
3 F            32      90 E
4 F            17      90 E
5 E             4      90 E
6 N             4      90 N


Now let’s work out where we are. No need to store the latitude and longitude in separate columns; we can add them up as complex numbers and then sum the real and imaginary parts.

instructions %>%
mutate(east = (cardinal == 'E') - (cardinal == 'W'),
north = (cardinal == 'N') - (cardinal == 'S')) %>%
summarise(position = sum(value * (east + north * 1i)),
distance = abs(Re(position)) + abs(Im(position)))

# A tibble: 1 x 2
position  distance
<cpl>        <dbl>
1 -127-752i      879


## Euler’s Bermuda Triangle

Now the N, E, S and W directions store up instructions, which are performed by the ship every time F is invoked.

The value of these instructions is rotated in the complex plane for every L or R turn. Euler’s formula states: $$e^{ix} = \cos x + i\sin x,$$ and we can use this to work out how to transform the relative coordinates of the waypoint to the ship every time there is a turn.

In the complex plane:

• turning 90° to the right is equivalent to multiplying by $0-i$,
• turning 90° to the left is equivalent to multiplying by $0+i$,
• turning 180° is equivalent to multiplying by $-1+0i$,
• turning 270° is equivalent to turning by -90° so use the rule above.
instructions %>%
mutate(
east = value * ((direction == 'E') - (direction == 'W')),
north = value * ((direction == 'N') - (direction == 'S')),
radians = value * 2 * pi / 360 * ((direction == 'L') - (direction == 'R')),
waypoint = (10 + 1i + cumsum((east + north * 1i) * cumprod(1 / rotate))),
) %>%
summarise(position = sum(value * (direction == 'F') * waypoint * cumprod(rotate)),
distance = abs(Re(position)) + abs(Im(position)))

# A tibble: 1 x 2
position   distance
<cpl>         <dbl>
1 17936-171i   18107.


So that we can take advantage of cumsum and cumprod vectorisation, we rotate the ship rather than the waypoint, then reverse the rotation at the end to get the final position of the ship relative to its starting point.

# Day 13 - Shuttle search

## Earliest bus

A bus’s ID indicates the interval between departures, starting at time 0.

This is simple modular arithmetic: find the remainder when the timestamp is divided by each bus’s ID/interval, then multiply the smallest such remainder with the corresponding bus’s ID.

But we want the bus to arrive after we start waiting at the bus stop, not before. So we negate the timestamp.

input <- readLines('input13.txt')
timestamp <- as.integer(input)
buses <- as.integer(strsplit(input, ',')[])

inservice <- buses[!is.na(buses)]
inservice[which.min(-timestamp %% inservice)] * min(-timestamp %% inservice)

 222


## Bus cluster

We seek a timepoint $t$ at which our first listed bus arrives, the second bus arrives at time $t+1$, and so on. Thus it must have the following properties:

• $t \equiv 0 \mod b_0$
• $t \equiv -1 \mod b_1$
• $t \equiv -n \mod b_n$

First looking at the examples, we can take this naïve approach:

find_timetable <- function(buses, maxit = 1e5) {
offsets <- seq_along(buses) - 1
for (i in seq_len(maxit)) {
t <- buses * i
if ( all((-t %% buses) == offsets, na.rm = TRUE) )
return(t)
}
stop('Failed to find a valid t')
}

find_timetable(c(17, NA, 13, 19))

 3417

find_timetable(c(67, 7, 59, 61))

 754018


But this probably isn’t going to scale well. Time to dust off a bit of number theory. By the Chinese remainder theorem, for any $a$, $b$ and coprime $m$, $n$, there exists a unique $x (\mod mn)$ such that $x \equiv a \mod m$ and $x \equiv b \mod n$.

Here $a,b$ are offsets (the position of the bus in the list), $n$ represents a bus ID and $x$ is the solution we seek.

The algorithm will be as follows:

1. Test values in the sequence $a_1, a_1 + n_1, a_1 + 2n_1, \dots$ to find the first time $x_1$ at which a bus arrives and the second bus arrives 1 minute later.
2. Test values in the sequence $x_1, x_1 + n_1n_2, x_1 + 2n_1n_2, \dots$ to get a valid time for the first three buses.
3. Repeat.
sieve <- function(a1, a2, n1, n2, maxit = 1e5) {
x <- a1 + n1 * (0:maxit)
x[which.max(x %% n2 == a2 %% n2)]
}

find_timetable2 <- function(buses) {
offsets <- -(seq_along(buses) - 1)[!is.na(buses)] # a
buses <- buses[!is.na(buses)]                     # n
x <- offsets
for (i in 2:length(buses))
x <- sieve(x, offsets[i], prod(head(buses, i-1)), buses[i])
x
}

format(find_timetable2(buses), sci = FALSE)

 "408270049879073"


# Day 14 - Docking data

The trickiest bit(!) in the first part is reading in the data. I wanted a data frame that ‘remembered’ the value of the last mask set. The other part is converting to and from binary. To help you along the way, R has a function called intToBits, but be careful because it converts to 32 bits and the puzzle is 36-bit.

library(dplyr)
library(tidyr)
intTo36Bits <- function(n) {
bit32 <- rev(as.character(intToBits(n)))
c(rep(0, 4), as.integer(bit32))
}

binaryToInt <- function(b) {
b <- as.integer(strsplit(b, '')[])
sum(b * 2^rev(seq_along(b) - 1))
}

x <- as.integer(strsplit(x, '')[])
paste(x, collapse = '')
}

program <- read.table('input14.txt', sep = '=', strip.white = TRUE,
col.names = c('key', 'value')) %>%
extract(key, c('dest', 'address'), '(mem|mask)\$?(\\d*)\$?',
convert = TRUE) %>%
filter(dest == 'mem') %>%
select(-dest) %>%
mutate(value = as.integer(value),
value_binary = lapply(value, intTo36Bits),
value_binary = sapply(value_binary, paste, collapse = ''),


As an example, here is the first value 51331021 being masked to become 62069628301:

value:   000000000011000011110011111111001101
result:  111001110011101000110101110110001101


What is the sum of the values in memory? Well, since we are just setting values, the only value we care about is the last one for each address. Whatever values they took before the end are unimportant.

program %>%
pull(last_integer) %>% sum %>% format(scientific = FALSE) # 14862056079561

 "14862056079561"


In part 2, the mask applies to the memory address, not to the value. Thus the same value gets applied to possibly many addresses.

It also helps to read the question properly. I got stuck on this for ages until I eventually noticed the part that says

If the bitmask bit is 0, the corresponding memory address bit is unchanged.

which meant my mask was doing the wrong thing, even before the floating bits.

decode <- function(mask, x) {
x <- as.integer(strsplit(x, '')[])
decoded <- c()
for (i in seq_len(2^n_floating) - 1) {
decoded <- c(decoded, paste(x, collapse = ''))
}
decoded
}

program %>%
summarise(last_integer = binaryToInt(last(value_binary))) %>%
pull(last_integer) %>% sum %>% format(scientific = FALSE)

 "3296185383161"


There are many ways I could have improved this solution. In particular, there wasn’t any actual reason why I needed to compress the binary digits into a string representation between operations—other than making the tables of values easier to read during debugging. I could have stored them as vectors or matrices in list-columns, instead.

# Day 15 - Rambunctious recitation

## Memory game

The first part is straightforward even with not particularly optimal code:

memory_game <- function(n, start) {
nstart <- length(start)
spoken <- integer(10)
spoken[1:nstart] <- start
for (i in nstart:(n-1)) {
before <- which(spoken[1:(i-1)] == spoken[i])
if (!length(before)) {
spoken[i+1] <- 0
} else spoken[i+1] <- i - tail(before, 1)
}
spoken[n]
}

memory_game(2020, c(7, 12, 1, 0, 16, 2))

 410


## Long-term memory

For the 30 millionth number spoken, it’s probably not very efficient to carry the whole vector with us. How can we make it more efficient? This is ten times faster:

memory_game2 <- function(n, start) {
nstart <- length(start)
spoken <- start[-nstart]
when <- seq_along(spoken)
current <- start[nstart]
for (i in nstart:(n-1)) {
if (!current %in% spoken) {
next_number <- 0
spoken <- c(spoken, current)
when <- c(when, i)
} else {
next_number <- i - when[spoken == current]
when[spoken == current] <- i
}
current <- next_number
}
current
}

memory_game(2020, c(7, 12, 1, 0, 16, 2))

 410


Unfortunately, that still just isn’t fast enough for the problem we have, partly because it involves growing the size of a large vector instead of fixing its length in advance. We can use direct lookup from a vector instead. Treat the indices of a vector (-1, because R indexes from 1) as the possible spoken numbers, and the values at those indices as the last time that number was spoken, or zero if it has not been said so far.

This vector needs to be of length equal to the number of rounds in the game, i.e. 30 million elements long, which is not very big in the grand scheme of things. (When testing, you should also make sure it is at least as long as the size of the maximum value in the starting numbers.)

memory_game3 <- function(n, start) {
nstart <- length(start)
spoken <- numeric(max(n, start) + 1)
spoken[start[-nstart] + 1] <- seq_len(nstart - 1)
current <- start[nstart]
for (i in nstart:(n-1)) {
next_number <- (spoken[current + 1] > 0) * i - spoken[current + 1]
spoken[current + 1] <- i
current <- next_number
}
current
}

memory_game3(3e7, c(7, 12, 1, 0, 16, 2))

 238


According to microbenchmark, the second implementation is about 5 times faster than the first, and my final implementation is 10 times faster than that. It takes just a second or two to run. A lower-level implementation in C++ might be even faster, but I already have the star now…

I will update this post as I complete future puzzles.