My R take on Advent of Code – Day 2

[This article was first published on r-tastic, and kindly contributed to R-bloggers.]

This is the second blog post in my series My R take on Advent of Code. If you’d like to know more about Advent of Code, check out the first post in the series or simply go to their website. Below you’ll find the challenge from Day 2 and the solution that worked for me. As always, feel free to leave comments if you have different ideas on how this could have been solved!

Day 2 Puzzle

(…) you scan the likely candidate boxes again, counting the number that have an ID containing exactly two of any letter and then separately counting those with exactly three of any letter. You can multiply those two counts together to get a rudimentary checksum and compare it to what your device predicts. For example, if you see the following box IDs:

abcdef contains no letters that appear exactly two or three times.
bababc contains two a and three b, so it counts for both.
abbcde contains two b, but no letter appears exactly three times.
abcccd contains three c, but no letter appears exactly two times.
aabcdd contains two a and two d, but it only counts once.
abcdee contains two e.
ababab contains three a and three b, but it only counts once.

Of these box IDs, four of them contain a letter which appears exactly twice, and three of them contain a letter which appears exactly three times. Multiplying these together produces a checksum of 4 * 3 = 12. What is the checksum for your list of box IDs?

So what is it all about? As complicated as it may sound, essentially we need to:

  • identify which strings contain a letter that appears exactly 2 times
  • identify which strings contain a letter that appears exactly 3 times
  • count the number of strings of each type
  • multiply the two counts together
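
Before touching the real input, these four steps can be tried out on the puzzle's own example. The following is only a minimal base R sketch, not the approach used in the rest of this post; `has_count()` is a hypothetical helper name introduced for illustration.

```r
# Box IDs from the puzzle's worked example
ids <- c("abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab")

# has_count() is a hypothetical helper: TRUE if any letter of `id`
# occurs exactly `n` times
has_count <- function(id, n) {
  letter_counts <- table(strsplit(id, "")[[1]])  # occurrences of each letter
  any(letter_counts == n)
}

twos   <- sum(vapply(ids, has_count, logical(1), n = 2))  # IDs with a doubled letter
threes <- sum(vapply(ids, has_count, logical(1), n = 3))  # IDs with a tripled letter

checksum <- twos * threes
checksum  # 12, matching the puzzle text (4 * 3)
```

Using `any()` rather than a count is what makes an ID such as aabcdd count only once, as the puzzle requires.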

Doesn’t sound so bad anymore, ey? This is how we can go about it:

First load your key packages…

library(dplyr)
library(stringr)
library(tibble)
library(purrr)

… and have a look at what the raw input looks like.

# check raw input
glimpse(input)
##  chr "xrecqmdonskvzupalfkwhjctdb\nxrlgqmavnskvzupalfiwhjctdb\nxregqmyonskvzupalfiwhjpmdj\nareyqmyonskvzupalfiwhjcidb"| __truncated__
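
The post does not show how `input` was created. One possible way, assuming the puzzle input had been saved to a local text file, is to read the whole file into a single string. In the sketch below a temporary file holding the first two IDs stands in for the real download; the file and its path are purely illustrative.

```r
# Hypothetical setup: a temporary file stands in for the downloaded puzzle
# input (the real file name is not given in the post)
path <- tempfile(fileext = ".txt")
writeLines(c("xrecqmdonskvzupalfkwhjctdb", "xrlgqmavnskvzupalfiwhjctdb"), path)

# Read the whole file into one string, newlines included
input <- readChar(path, nchars = file.info(path)$size)
str(input)
```

Reading the file as one string reproduces the single newline-separated character value seen in the `glimpse()` output above.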

Right, Advent of Code will never give you nice and clean data to work with, that’s for sure. But it doesn’t look like things are too bad this time – let’s just split it on the newline character and keep it as a vector for now. Does it look reasonably good?

# clean it
clean_input <- strsplit(input, '\n') %>% unlist()   # split by newline
glimpse(clean_input)
##  chr [1:250] "xrecqmdonskvzupalfkwhjctdb" "xrlgqmavnskvzupalfiwhjctdb" ...

Much better! Now, let’s put it all in a data frame; we’ll need it very soon.

# put it in the data.frame
df2 <- tibble(input = str_trim(clean_input))
head(df2)
## # A tibble: 6 x 1
##   input                     
##   <chr>                     
## 1 xrecqmdonskvzupalfkwhjctdb
## 2 xrlgqmavnskvzupalfiwhjctdb
## 3 xregqmyonskvzupalfiwhjpmdj
## 4 areyqmyonskvzupalfiwhjcidb
## 5 xregqpyonskvzuaalfiwhjctdy
## 6 xwegumyonskvzuphlfiwhjctdb

Now, the way I approached this was to split each word into letters and then count how many times each letter occurred. Then, to identify words with a letter occurring exactly twice, I kept only the letters with a count of 2; if the resulting table has any rows, the word counts as a yes. Take the first example:

strsplit(input, '\n') %>% unlist() %>% .[[1]] # get the first example
## [1] "xrecqmdonskvzupalfkwhjctdb"

Let’s split it into letters, put them in a tibble and count the occurrences of each letter:

strsplit(input, '\n') %>% unlist() %>% .[[1]] %>% # get the first example
  strsplit('') %>% # split letters
  unlist() %>% # get a vector
  tibble(letters = .) %>% # put the vector in a tibble column named letters
  count(letters) # count letter occurrences
## # A tibble: 23 x 2
##    letters     n
##    <chr>   <int>
##  1 a           1
##  2 b           1
##  3 c           2
##  4 d           2
##  5 e           1
##  6 f           1
##  7 h           1
##  8 j           1
##  9 k           2
## 10 l           1
## # ... with 13 more rows
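
For comparison, the same per-letter tally can be produced in base R with `table()`. This is only an alternative sketch for readers who prefer to avoid the tibble step; it is not the code used in the rest of the post.

```r
id <- "xrecqmdonskvzupalfkwhjctdb"              # first ID from the input
letter_counts <- table(strsplit(id, "")[[1]])   # named counts, one per distinct letter

length(letter_counts)                # 23 distinct letters, as in the tibble above
letter_counts[c("a", "c", "d", "k")] # a occurs once; c, d and k occur twice
```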

Now, do we have any double occurrences there?

# test: counting double letter occurrences
strsplit(input, '\n') %>% unlist() %>% .[[1]] %>% # get the first example
  strsplit('') %>% # split letters
  unlist() %>% # get a vector
  tibble(letters = .) %>% # put the vector in a tibble column named letters
  count(letters) %>% # count letter occurrences
  filter(n == 2) %>% # keep only letters that occur exactly twice
  nrow() # how many are there?
## [1] 3

Definitely yes. Let’s repeat the process for triple occurrences:

# test: counting triple letter occurrences
strsplit(input, '\n') %>% unlist() %>% .[[1]] %>% # get the first example
  strsplit('') %>% # split letters
  unlist() %>% # get a vector
  tibble(letters = .) %>% # put the vector in a tibble column named letters
  count(letters) %>% # count letter occurrences
  filter(n == 3) %>% # keep only letters that occur exactly three times
  nrow() # how many are there?
## [1] 0

Not much luck with those in this case. To make our life easier, let’s wrap both calculations in functions…

### wrap-up in functions
# count letters that occur exactly twice
count2 <- function(x) {
  result2 <- as.character(x) %>%
    strsplit('') %>% # split by letters
    unlist() %>% # get a vector
    tibble(letters = .) %>% # put the vector in a tibble column named letters
    count(letters) %>% # count letter occurrences
    filter(n == 2) %>% # keep only letters that occur exactly twice
    nrow()
  return(result2)
}


# count letters that occur exactly three times
count3 <- function(x) {
  result3 <- as.character(x) %>%
    strsplit('') %>% # split by letters
    unlist() %>% # get a vector
    tibble(letters = .) %>% # put the vector in a tibble column named letters
    count(letters) %>% # count letter occurrences
    filter(n == 3) %>% # keep only letters that occur exactly three times
    nrow()
  return(result3)
}

…and apply them to the whole dataset:

### apply functions to input
occurs2 <- map_int(df2$input, count2)
occurs3 <- map_int(df2$input, count3)
str(occurs2)
##  int [1:250] 3 3 3 3 2 3 3 2 2 2 ...
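
Since count2() and count3() differ only in the number they filter on, one possible refactoring is a single helper that takes that number as an argument. The sketch below uses base R's `table()` instead of the tibble pipeline, and `count_exact()` is a hypothetical name introduced here, not part of the original solution.

```r
# count_exact() is a hypothetical helper: it returns how many distinct
# letters occur exactly n times in the string x
count_exact <- function(x, n) {
  letter_counts <- table(strsplit(as.character(x), "")[[1]])
  sum(letter_counts == n)
}

# First three IDs from the input shown earlier
ids <- c("xrecqmdonskvzupalfkwhjctdb",
         "xrlgqmavnskvzupalfiwhjctdb",
         "xregqmyonskvzupalfiwhjpmdj")

# One integer per ID, analogous to occurs2 and occurs3
occurs2_alt <- vapply(ids, count_exact, integer(1), n = 2, USE.NAMES = FALSE)
occurs3_alt <- vapply(ids, count_exact, integer(1), n = 3, USE.NAMES = FALSE)

occurs2_alt  # 3 3 3, in line with the first three values of occurs2
```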

Now, all we need to do is count the non-zero elements in each vector and multiply the two counts together:

#solution
length(occurs2[occurs2 != 0]) * length(occurs3[occurs3 != 0])
## [1] 5976

Voila!
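
As a side note, the non-zero entries can also be counted by summing a logical vector, because TRUE counts as 1. Here is a small sketch with made-up numbers; `toy2` and `toy3` are illustrative stand-ins, not the real puzzle data.

```r
toy2 <- c(3L, 0L, 2L, 1L, 0L)  # made-up stand-in for occurs2
toy3 <- c(0L, 1L, 0L, 0L, 2L)  # made-up stand-in for occurs3

# The approach used above: subset the non-zero entries, then take the length
by_length <- length(toy2[toy2 != 0]) * length(toy3[toy3 != 0])

# Equivalent: sum the logical vectors directly
by_sum <- sum(toy2 != 0) * sum(toy3 != 0)

identical(by_length, by_sum)  # TRUE: both give 3 * 2 = 6
```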
