Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Advent of Code is a series of programming puzzles you can tackle to hone your coding skills each day in the run-up to Christmas.

This year I am attempting it using R, which can make some challenges easier or harder depending on whether they are more ‘computer sciencey’ or more ‘data sciencey’. Generally it makes parsing datasets easier but low-level string manipulation more fiddly.

Here are my solutions so far. Where possible, I’ve tried to strike a balance between efficiency and readability, and to avoid using the packages I might usually use (e.g. dplyr) if I think it makes the puzzle too easy.

The input data are different for each participant, so your numerical results may differ from mine.

# Day 1 – Report repair

## Two numbers

Find the two entries that sum to 2020, then multiply those two numbers together.

This can be a one-liner:

input <- as.integer(readLines('input01.txt'))
prod(input[(2020 - input) %in% input])

[1] 468051


## Three numbers

Find the three entries that sum to 2020, then multiply them together.

It might be tempting to go for a naïve solution like this:

prod(combn(input, 3)[, combn(input, 3, sum) == 2020])

[1] 272611658


It gives the right answer but involves a fair amount of unnecessary computation. It takes more than a second to run. If we assume all the inputs are non-negative, we can take advantage of this to reduce the number of operations.

. <- expand.grid(input, input[(2020 - input) > min(input)])
. <- transform(., Var3 = 2020 - Var1 - Var2)
. <- subset(., Var3 > min(input))

[1] 190


## Field validation

Thanks to the way we imported the data, this is quite straightforward. The rules are:

• byr (Birth Year) - four digits; at least 1920 and at most 2002.
• iyr (Issue Year) - four digits; at least 2010 and at most 2020.
• eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
• hgt (Height) - a number followed by either cm or in:
• If cm, the number must be at least 150 and at most 193.
• If in, the number must be at least 59 and at most 76.
• hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
• ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
• pid (Passport ID) - a nine-digit number, including leading zeroes.
• cid (Country ID) - ignored, missing or not.

The data are all different types (integer, double and categorical) so the first step will be to spread the table to a wider format, with one row per passport, and one column for each field.

Here is a dplyr + tidyr solution.

library(dplyr)
library(tidyr)
passports_wide <- passports %>%
pivot_wider(names_from = key, values_from = value) %>%
mutate(byr = as.integer(byr),
iyr = as.integer(iyr),
eyr = as.integer(eyr),
hgt_value = as.numeric(gsub('cm|in$', '', hgt)), hgt_unit = gsub('\\d*', '', hgt)) head(passports_wide) # A tibble: 6 x 11 id iyr cid pid eyr hcl ecl byr hgt hgt_value hgt_unit <int> <int> <chr> <chr> <int> <chr> <chr> <int> <chr> <dbl> <chr> 1 0 1928 150 4761132~ 2039 a5ac0f #25f8~ 2027 190 190 "" 2 1 2013 169 9200769~ 2026 #fffffd hzl 1929 168cm 168 "cm" 3 2 2011 <NA> 3284128~ 2023 #6b5442 brn 1948 156cm 156 "cm" 4 3 2019 279 6749079~ 2020 #602927 amb 1950 189cm 189 "cm" 5 4 2015 <NA> 4736300~ 2022 #341e13 hzl 1976 178cm 178 "cm" 6 5 2020 <NA> 6281139~ 2023 #866857 blu 1984 163cm 163 "cm"  From here, we can filter out the invalid entries, using filter or subset. passports_wide %>% filter(byr >= 1920, byr <= 2002, iyr >= 2010, iyr <= 2020, eyr >= 2020, eyr <= 2030, hgt_value >= 150 & hgt_value <= 193 & hgt_unit == 'cm' | hgt_value >= 59 & hgt_value <= 76 & hgt_unit == 'in', grepl('^#[0-9a-f]{6}$', hcl),
ecl %in% c('amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth'),

[1] 6551


## Questions with all ‘yes’

Similar, but now an intersection set operation.

intersection <- aggregate(questions ~ group, responses,
function(x) length(Reduce(intersect, strsplit(x, ''))))
sum(intersection\$questions)

[1] 3358


The solution to the first part could have used Reduce(union, ...), which would achieve the same result as unique(unlist(...)).

Both of these could be made a bit more readable using dplyr or data.table instead. In particular, the base function aggregate doesn’t like list-columns as inputs, so the strsplit can’t be done before the aggregation. This is not a problem with dplyr::summarise or data.table:

library(dplyr)
responses %>%
mutate(questions = strsplit(questions, '')) %>%
group_by(group) %>%
summarise(count = Reduce(intersect, questions) %>% length) %>%
pull(count) %>% sum

[1] 3358

library(data.table)
setDT(responses)[, questions := strsplit(questions, '')]
responses[, .(count = length(Reduce(intersect, questions))),
by = group][, sum(count)]

[1] 3358


I will update this post as I complete future puzzles.