# Advent of Code 2020

**R on Tea & Stats**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Advent of Code is a series of programming puzzles you can tackle to hone your coding skills each day in the run-up to Christmas.

This year I am attempting it using R, which can make some challenges easier or harder depending on whether they are more ‘computer sciencey’ or more ‘data sciencey’. Generally it makes parsing datasets easier but low-level string manipulation more fiddly.

Here are my solutions so far. Where possible, I’ve tried to strike a
balance between efficiency and readability, and to avoid using the
packages I might usually use (e.g. `dplyr`

) if I think it makes the
puzzle too easy.

The input data are different for each participant, so your numerical results may differ from mine.

# Day 1 – Report repair

## Two numbers

Find the two entries that sum to 2020, then multiply those two numbers together.

This can be a one-liner:

input <- as.integer(readLines('input01.txt')) prod(input[(2020 - input) %in% input]) [1] 468051

## Three numbers

Find the three entries that sum to 2020, then multiply them together.

It might be tempting to go for a naïve solution like this:

prod(combn(input, 3)[, combn(input, 3, sum) == 2020]) [1] 272611658

It gives the right answer but involves a fair amount of unnecessary computation. It takes more than a second to run. If we assume all the inputs are non-negative, we can take advantage of this to reduce the number of operations.

. <- expand.grid(input, input[(2020 - input) > min(input)]) . <- transform(., Var3 = 2020 - Var1 - Var2) . <- subset(., Var3 > min(input)) prod(.[which.max(.$Var3 %in% input), ]) [1] 272611658

This is approximately 2000 times faster than the one-liner, and works by
successively discarding values that could only add up to more than 2020.
The `.`

notation is just so I can write this without using `dplyr`

.

# Day 2 - Password philosophy

## Number of letters

How many passwords are valid according to the policies?

1-3 a: abcde 1-3 b: cdefg 2-9 c: ccccccccc

First read in the data. I like data frames and so should you.

input <- readLines('input02.txt') passwords <- do.call(rbind, strsplit(input, '[- ]|\\: ')) passwords <- setNames(as.data.frame(passwords), c('min', 'max', 'letter', 'password')) passwords <- transform(passwords, min = as.integer(min), max = as.integer(max)) head(passwords) min max letter password 1 14 15 v vdvvvvvsvvvvvfpv 2 3 11 k kkqkkfkkvkgfknkx 3 6 10 j jjjjjjjjjj 4 5 10 s nskdmzwrmpmhsrzts 5 13 15 v vvvvvvkvvvvjzvv 6 11 13 h hhhhhbhhhhdhhh

String operations are a bit of a pain in base R so it’s easier just to
use a package, like `stringi`

or `stringr`

for this.

with(passwords, { n <- stringr::str_count(password, letter) sum(n >= min & n <= max) }) [1] 625

You could also split each password with `strsplit`

and count the letters
with an `sapply`

-type loop.

## Position of letters

Now the two digits describe two indices in the password, *exactly one*
of which must match the given letter.

with(passwords, sum(xor(substr(password, min, min) == letter, substr(password, max, max) == letter)) ) [1] 391

Initially I got caught out here, by misreading the question as ‘at least
one’ and then wondering why an inclusive or (`|`

) was returning the
incorrect answer.

# Day 3 - Toboggan trajectory

The input looks a bit like this:

..##.........##.........##.........##.........##.........##....... ---> #...#...#..#...#...#..#...#...#..#...#...#..#...#...#..#...#...#.. .#....#..#..#....#..#..#....#..#..#....#..#..#....#..#..#....#..#. ..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.# .#...##..#..#...##..#..#...##..#..#...##..#..#...##..#..#...##..#. ..#.##.......#.##.......#.##.......#.##.......#.##.......#.##..... ---> .#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....# .#........#.#........#.#........#.#........#.#........#.#........# #.##...#...#.##...#...#.##...#...#.##...#...#.##...#...#.##...#... #...##....##...##....##...##....##...##....##...##....##...##....# .#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.# --->

## Encountering trees

Starting at the top-left corner of your map and following a slope of right 3 and down 1, how many trees would you encounter?

input <- readLines('input03.txt')

A complicated-sounding problem but the solution is mainly mathematical.

positions <- (3 * (seq_along(input) - 1)) %% nchar(input) + 1 sum(substr(input, positions, positions) == '#') [1] 268

The sequence of positions goes 1, 4, 7, …, and when it reaches the edge of the map, loops back round to the beginning. Using the modulo operator we can use the sequence modulo the width of the input map, then add one because R indexes from one rather than from zero.

## Different slopes

Simply wrap the above into a function.

trees <- function(right, down = 1) { vertical <- seq(0, length(input) - 1, by = down) + 1 horizontal <- (right * (seq_along(input) - 1)) %% nchar(input) + 1 horizontal <- head(horizontal, length(vertical)) as.double( sum(substr(input[vertical], horizontal, horizontal) == '#') ) } trees(1) * trees(3) * trees(5) * trees(7) * trees(1, 2) [1] 3093068400

The `as.double`

bit is necessary only because multiplying large integer
outputs together can cause an overflow when the product is larger than
10^{9}.

# Day 4 - Passport processing

The example input is in this ragged format, where keys and values are separated by colons and records are separated by double newlines. The first step is to parse this unusual data format.

ecl:gry pid:860033327 eyr:2020 hcl:#fffffd byr:1937 iyr:2017 cid:147 hgt:183cm iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884 hcl:#cfa07d byr:1929 hcl:#ae17e1 iyr:2013 eyr:2024 ecl:brn pid:760753108 byr:1931 hgt:179cm hcl:#cfa07d eyr:2025 pid:166559648 iyr:2011 ecl:brn hgt:59in input <- strsplit(readLines('input04.txt'), ' ') ids = cumsum(!lengths(input)) pairs <- lapply(strsplit(unlist(input), ':'), setNames, c('key', 'value')) passports <- data.frame(id = rep(ids, lengths(input)), do.call(rbind, pairs))

## Missing fields

Now the data are in a standard format, this is a simple
split-apply-combine operation. I am using the base `aggregate`

but this
could be done equally well using `dplyr`

or `data.table`

.

required <- c('byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid') valid <- aggregate(key ~ id, passports, function(x) !length(setdiff(required, x))) head(valid, 10) id key 1 0 TRUE 2 1 TRUE 3 2 TRUE 4 3 TRUE 5 4 TRUE 6 5 TRUE 7 6 TRUE 8 7 FALSE 9 8 FALSE 10 9 TRUE

Then the answer is simply

sum(valid$key) [1] 190

## Field validation

Thanks to the way we imported the data, this is quite straightforward. The rules are:

`byr`

(Birth Year) - four digits; at least 1920 and at most 2002.`iyr`

(Issue Year) - four digits; at least 2010 and at most 2020.`eyr`

(Expiration Year) - four digits; at least 2020 and at most 2030.`hgt`

(Height) - a number followed by either cm or in:- If
`cm`

, the number must be at least 150 and at most 193. - If
`in`

, the number must be at least 59 and at most 76.

- If
`hcl`

(Hair Color) - a # followed by exactly six characters`0-9`

or`a-f`

.`ecl`

(Eye Color) - exactly one of:`amb`

`blu`

`brn`

`gry`

`grn`

`hzl`

`oth`

.`pid`

(Passport ID) - a nine-digit number, including leading zeroes.`cid`

(Country ID) - ignored, missing or not.

The data are all different types (integer, double and categorical) so the first step will be to spread the table to a wider format, with one row per passport, and one column for each field.

Here is a `dplyr`

+ `tidyr`

solution.

library(dplyr) library(tidyr) passports_wide <- passports %>% pivot_wider(names_from = key, values_from = value) %>% mutate(byr = as.integer(byr), iyr = as.integer(iyr), eyr = as.integer(eyr), hgt_value = as.numeric(gsub('cm|in$', '', hgt)), hgt_unit = gsub('\\d*', '', hgt)) head(passports_wide) # A tibble: 6 x 11 id iyr cid pid eyr hcl ecl byr hgt hgt_value hgt_unit <int> <int> <chr> <chr> <int> <chr> <chr> <int> <chr> <dbl> <chr> 1 0 1928 150 4761132~ 2039 a5ac0f #25f8~ 2027 190 190 "" 2 1 2013 169 9200769~ 2026 #fffffd hzl 1929 168cm 168 "cm" 3 2 2011 <NA> 3284128~ 2023 #6b5442 brn 1948 156cm 156 "cm" 4 3 2019 279 6749079~ 2020 #602927 amb 1950 189cm 189 "cm" 5 4 2015 <NA> 4736300~ 2022 #341e13 hzl 1976 178cm 178 "cm" 6 5 2020 <NA> 6281139~ 2023 #866857 blu 1984 163cm 163 "cm"

From here, we can filter out the invalid entries, using `filter`

or
`subset`

.

passports_wide %>% filter(byr >= 1920, byr <= 2002, iyr >= 2010, iyr <= 2020, eyr >= 2020, eyr <= 2030, hgt_value >= 150 & hgt_value <= 193 & hgt_unit == 'cm' | hgt_value >= 59 & hgt_value <= 76 & hgt_unit == 'in', grepl('^#[0-9a-f]{6}$', hcl), ecl %in% c('amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth'), grepl('^\\d{9}$', pid)) -> valid_passports nrow(valid_passports) [1] 121

You could also use a filtering join, though since most of the fields are
ranges of integer values, you would want to use a `data.table`

*non-equi-join* rather than a simple `semi_join`

.

# Day 5 - Binary boarding

## Highest seat ID

This task is easy, as soon as you recognise that it is just converting
numbers from binary to decimal, where `F`

and `L`

denote ones and `B`

and `R`

are zeros. The distinction between rows and columns is a red
herring, because you can parse the whole sequence at once.

input <- readLines('input05.txt') binary <- lapply(strsplit(input, ''), grepl, pattern = '[BR]') seat_ids <- sapply(binary, function(x) sum(x * 2^(rev(seq_along(x)) - 1))) max(seat_ids) [1] 874

## Finding an empty seat

Get the missing value, which isn’t the minimum or the maximum in the list.

setdiff(seq(min(seat_ids), max(seat_ids)), seat_ids) [1] 594

# Day 6 - Custom customs

## Questions with any ‘yes’

Count the number of unique letters in each group, where a ‘group’ is
series of strings separated from others by a blank line. This is a
*union* set operation.

input <- readLines('input06.txt') group <- cumsum(!nchar(input)) library(dplyr) responses <- data.frame(group = group[nchar(input) > 0], questions = input[nchar(input) > 0]) union <- aggregate(questions ~ group, responses, function(x) length(unique(unlist(strsplit(x, ''))))) sum(union$questions) [1] 6551

## Questions with all ‘yes’

Similar, but now an *intersection* set operation.

intersection <- aggregate(questions ~ group, responses, function(x) length(Reduce(intersect, strsplit(x, '')))) sum(intersection$questions) [1] 3358

The solution to the first part could have used `Reduce(union, ...)`

,
which would achieve the same result as `unique(unlist(...))`

.

Both of these could be made a bit more readable using `dplyr`

or
`data.table`

instead. In particular, the base function `aggregate`

doesn’t like list-columns as inputs, so the `strsplit`

can’t be done
before the aggregation. This is not a problem with `dplyr::summarise`

or
`data.table`

:

library(dplyr) responses %>% mutate(questions = strsplit(questions, '')) %>% group_by(group) %>% summarise(count = Reduce(intersect, questions) %>% length) %>% pull(count) %>% sum [1] 3358 library(data.table) setDT(responses)[, questions := strsplit(questions, '')] responses[, .(count = length(Reduce(intersect, questions))), by = group][, sum(count)] [1] 3358

I will update this post as I complete future puzzles.

**leave a comment**for the author, please follow the link and comment on their blog:

**R on Tea & Stats**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.