survivoR v1.0 is now on CRAN

[This article was first published on R Archives - Dan Oehm | Gradient Descending, and kindly contributed to R-bloggers.]

I’m happy to announce that survivoR v1.0 is now on CRAN. The package now contains all the features intended for the first major release. A big thank you to Carly Levitz for helping collate and test the data.

This post details the major updates since v0.9.12. For a complete list of the package's tables and features, please visit the GitHub page.

To jump right into it you can install the package with

install.packages("survivoR")

Or from GitHub with

devtools::install_github("doehm/survivoR")

If you find any issues, please raise them on GitHub and I’ll correct them as soon as possible. For updates, feel free to follow Carly and me on Twitter.

News

This release features new datasets, additional fields on existing tables, and the removal of unused or redundant features.

New tables:

  • advantage_details – Details of each advantage found and used across all seasons
  • advantage_movement – Details the movement of each advantage, including hidden immunity idols, and when each one is played
  • boot_mapping – A mapping table for the stage of the game referenced by the number of boots there have been

New features:

  • vote_history
    • tribe – The name of the tribe that attended Tribal Council.
    • vote_event – Identifies other events that can occur at Tribal Council, e.g. a castaway played the Shot-in-the-Dark.
    • split_vote – If a split vote was orchestrated to flush an idol, this identifies who the votes were split across and who was involved in the strategy.
    • tie – A logical field to identify if the vote resulted in a tie.
  • challenge_results
    • order – The boot order, i.e. how many boots there have been in the game so far. This maps to the boot_mapping table.
  • viewers
    • imdb_rating – The IMDb rating for the episode. Because these are user ratings, they may change over time. The ratings will be updated with each new release; however, only minor changes are expected for the most recent season.

Removed features:

  • castaways
    • swapped_tribe
    • swapped_tribe_2
    • merged_tribe
    • total_votes_received
    • immunity_idols_won

Updates:

  • confessionals – Double episodes have been collapsed to align with the episodes on all other tables. This changes mean confessionals per episode calculations, but the structure is more consistent and convenient. Recap episodes are also accounted for.
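
To see why collapsing double episodes shifts per-episode means, here is a small worked sketch. The counts are hypothetical numbers chosen for illustration, not values from the confessionals table, and the object names are illustrative only.

```r
# Hypothetical confessional counts for one castaway (illustrative, not package data).
# Before the change: a double episode is stored as two rows (entries 3 and 4).
counts_split <- c(3, 2, 4, 1)

# After the change: the two halves of the double episode are summed into one episode.
counts_collapsed <- c(counts_split[1:2], sum(counts_split[3:4]))

# The total is unchanged, but there are fewer episodes, so the mean rises.
total_split     <- sum(counts_split)
total_collapsed <- sum(counts_collapsed)
mean_split      <- mean(counts_split)
mean_collapsed  <- mean(counts_collapsed)

cat("total:", total_split, "vs", total_collapsed, "\n")
cat("mean per episode:", mean_split, "vs", round(mean_collapsed, 2), "\n")
```

Totals per castaway are unaffected; only statistics that divide by the number of episodes move.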

Advantages

All advantages and hidden immunity idols found across all seasons are captured in these two tables. The tables map to each other by advantage_id and detail the life of each advantage in tidy format.

Advantage Details

This dataset lists the hidden immunity idols and advantages in the game for all seasons. It details the type of each advantage, whether there was a clue to it, where it was found, and any conditions attached to it. This maps to the advantage_movement table.

> advantage_details |> 
+     filter(season == 41)
# A tibble: 9 x 9
  version version_season season_name  season advantage_id advantage_type  clue_details  location_found conditions            
  <chr>   <chr>          <chr>         <dbl> <chr>        <chr>           <chr>         <chr>          <chr>                 
1 US      US41           Survivor: 41     41 USEV4101     Extra vote      No clue exis~ Shipwheel Isl~ NA                    
2 US      US41           Survivor: 41     41 USEV4102     Extra vote      No clue exis~ Shipwheel Isl~ NA                    
3 US      US41           Survivor: 41     41 USEV4103     Extra vote      No clue exis~ Shipwheel Isl~ NA                    
4 US      US41           Survivor: 41     41 USHI4101     Hidden immunit~ Found withou~ Found around ~ Beware advantage: all~
5 US      US41           Survivor: 41     41 USHI4102     Hidden immunit~ Found withou~ Found around ~ Beware advantage: all~
6 US      US41           Survivor: 41     41 USHI4103     Hidden immunit~ Found withou~ Found around ~ Beware advantage: all~
7 US      US41           Survivor: 41     41 USHI4104     Hidden immunit~ Found withou~ Found around ~ Beware advantage: all~
8 US      US41           Survivor: 41     41 USKP4101     Knowledge is p~ No clue exis~ Found around ~ NA                    
9 US      US41           Survivor: 41     41 USVS4101     Steal a vote    No clue exis~ Shipwheel Isl~ NA  

Advantage Movement

The advantage_movement table tracks who found the advantage, who they may have handed it to, and who they played it for. Each step is considered an event, and sequence_id records the order of those events. For example, in season 41, JD found an extra vote advantage. JD gave it to Shan in good faith, who then voted him out and kept the extra vote. Shan gave it to Ricard in good faith, who eventually gave it back before Shan played it for Naseer. That movement is recorded in this table.

The table also records who the advantage was eventually played for and whether the play was successful or not needed. In the unfortunate situations where someone is blindsided and voted out while holding an advantage, that is recorded here too.

> advantage_movement |> 
+     filter(advantage_id == "USEV4102")
# A tibble: 5 x 15
  version version_season season_name  season castaway castaway_id advantage_id sequence_id   day episode event    played_for played_for_id success votes_nullified
  <chr>   <chr>          <chr>         <dbl> <chr>    <chr>       <chr>              <dbl> <dbl>   <dbl> <chr>    <chr>      <chr>         <chr>             <dbl>
1 US      US41           Survivor: 41     41 JD       US0603      USEV4102               1     2       1 Found    NA         NA            NA                   NA
2 US      US41           Survivor: 41     41 Shan     US0606      USEV4102               2     9       4 Received NA         NA            NA                   NA
3 US      US41           Survivor: 41     41 Ricard   US0596      USEV4102               3     9       4 Received NA         NA            NA                   NA
4 US      US41           Survivor: 41     41 Shan     US0606      USEV4102               4    11       5 Received NA         NA            NA                   NA
5 US      US41           Survivor: 41     41 Shan     US0606      USEV4102               5    17       9 Played   Naseer     US0600        Yes                  NA
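
As a rough sketch of how the two tables fit together, the snippet below rebuilds a few of the rows printed above as plain data frames (a subset of columns, typed in by hand), joins them on advantage_id with base R's merge(), and summarises the life of the advantage. The object names details, movement and life are illustrative; with the package loaded, the same steps should apply to advantage_details and advantage_movement directly.

```r
# Rows copied by hand from the printed output above (subset of columns).
details <- data.frame(
  advantage_id   = c("USEV4101", "USEV4102", "USEV4103"),
  advantage_type = "Extra vote"
)

movement <- data.frame(
  advantage_id = "USEV4102",
  sequence_id  = 1:5,
  castaway     = c("JD", "Shan", "Ricard", "Shan", "Shan"),
  day          = c(2, 9, 9, 11, 17),
  event        = c("Found", "Received", "Received", "Received", "Played"),
  played_for   = c(NA, NA, NA, NA, "Naseer")
)

# Join on the shared key; merge() keeps only advantages present in both tables.
life <- merge(movement, details, by = "advantage_id")

# merge() does not guarantee row order, so sort by sequence_id explicitly.
life <- life[order(life$sequence_id), ]

# Chain of holders, one entry per event.
holder_chain <- paste(life$castaway, collapse = " -> ")

# The last event shows how the advantage left the game.
last_event   <- life[nrow(life), ]
days_in_play <- last_event$day - life$day[1]

print(life[, c("sequence_id", "castaway", "event", "advantage_type")])
cat(holder_chain, "\n")
cat("Played for", last_event$played_for, days_in_play, "days after it was found\n")
```

Sorting by sequence_id rather than day matters here because two events (Shan and Ricard receiving the vote) share the same day.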

Boot Mapping

The boot_mapping table makes it easy to filter to the set of castaways still in the game after a specified number of boots. It differs from the tribe mapping in that it is keyed on the boot rather than the episode, which is often more useful: the number of boots and who is left in the game is often a better indicator of the stage of the game than the episode or day. When someone quits the game or is medically evacuated, it is considered a boot. This table tracks multiple boots per episode.

In the case of double tribal councils there is an order in which castaways have their torch snuffed. This is also captured, even though it means there is a set of players still remaining for literally minutes before the next leaves the game.

If you need to determine who is left in the game of season 41 after 12 boots (12 people have either been voted off or left the game), you can use the following code.

> boot_mapping |> 
+     filter(
+         season == 41,
+         order == 12
+     )
# A tibble: 6 x 11
  version version_season season_name  season episode order castaway castaway_id tribe    tribe_status in_the_game
  <chr>   <chr>          <chr>         <dbl>   <dbl> <dbl> <chr>    <chr>       <chr>    <chr>        <lgl>      
1 US      US41           Survivor: 41     41      12    12 Heather  US0593      Via Kana Merged       TRUE       
2 US      US41           Survivor: 41     41      12    12 Erika    US0594      Via Kana Merged       TRUE       
3 US      US41           Survivor: 41     41      12    12 Ricard   US0596      Via Kana Merged       TRUE       
4 US      US41           Survivor: 41     41      12    12 Xander   US0597      Via Kana Merged       TRUE       
5 US      US41           Survivor: 41     41      12    12 Danny    US0599      Via Kana Merged       TRUE       
6 US      US41           Survivor: 41     41      12    12 Deshawn  US0601      Via Kana Merged       TRUE     
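
A small sketch of the same lookup wrapped in a helper: the data frame below is rebuilt by hand from the six rows printed above (a subset of columns), and remaining_castaways() is a hypothetical helper written for this example, not a function from the package.

```r
# Rows copied by hand from the printed output above (subset of columns).
boots <- data.frame(
  season      = 41,
  order       = 12,
  castaway    = c("Heather", "Erika", "Ricard", "Xander", "Danny", "Deshawn"),
  tribe       = "Via Kana",
  in_the_game = TRUE
)

# Hypothetical helper: names of castaways still in the game after `n_boots` boots.
remaining_castaways <- function(data, season_number, n_boots) {
  keep <- data$season == season_number &
    data$order == n_boots &
    data$in_the_game
  data$castaway[keep]
}

final_six <- remaining_castaways(boots, season_number = 41, n_boots = 12)

print(final_six)
print(length(final_six))
```

With the package loaded, passing boot_mapping in place of boots should work the same way, assuming the column names shown in the output above.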

As an example, once mapped to challenge_results, the boot_mapping table can be used to calculate how many people, and who, participated in certain challenges.

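# Requires survivoR plus the tidyverse (dplyr for the verbs, tidyr for unnest())
library(survivoR)
library(tidyverse)

# Number of winners per challenge in season 41 after 4 boots
df_challenges <- challenge_results |> 
  unnest(winners) |> 
  filter(
    season == 41,
    order == 4,
    outcome_status == "Winner"
  ) |> 
  count(season, episode, order, challenge_type, name = "n_winners")

# Number of castaways still in the game at that stage, joined to the winner counts
boot_mapping |> 
  filter(
    season == 41,
    order == 4
  ) |>
  count(season, episode, order, name = "n_challengers") |> 
  left_join(df_challenges, by = c("season", "episode", "order"))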

This table comes in handy for many types of analysis. Please see the documentation for detailed descriptions of the fields.

ggplot2 scales

As a reminder, the package also includes ggplot2 fill and colour scales based on the season logos and tribe colours. Season 42 logo and tribe colour palettes have been added. To use the colours from a particular season, use scale_*_survivor(<season number>) or scale_*_tribes(<season number>).

library(survivoR)
library(tidyverse)

df_results <- castaways |> 
  mutate(
    result = case_when(
      str_detect(result, "Sole") ~ "Sole Survivor",
      str_detect(result, "unner") ~ "Finalist",
      str_detect(jury_status, "jury") ~ "Jury",
      TRUE ~ "Other"
    ),
    result = factor(result, levels = c("Sole Survivor", "Finalist", "Jury", "Other"))
  ) |> 
  distinct(version_season, castaway_id, result)

vote_history |> 
  filter(!is.na(vote_id)) |> 
  left_join(df_results, by = c("version_season", "vote_id" = "castaway_id")) |> 
  count(order, result) |> 
  ggplot(aes(order, n, fill = result)) +
  geom_col() +
  scale_x_continuous(breaks = 1:20, labels = 1:20) +
  labs(
    title = "Total number of votes across 42 seasons",
    subtitle = "Distribution of votes by boot order and result",
    x = "Boot order",
    y = "Number of votes received",
    fill = "Result"
  ) +
  scale_fill_survivor(42) +
  theme_minimal()

confessionals |> 
  left_join(df_results, by = c("version_season", "castaway_id")) |> 
  group_by(episode, result) |> 
  summarise(n = sum(confessional_count)) |> 
  ggplot(aes(episode, n, fill = result)) +
  geom_col() +
  scale_x_continuous(breaks = 1:16, labels = 1:16) +
  labs(
    title = "Total number of confessionals across 42 seasons",
    subtitle = "Distribution of confessionals by episode and result",
    x = "Episode",
    y = "Number of confessionals",
    fill = "Result"
  ) +
  scale_fill_tribes(16, reverse = TRUE) +
  theme_minimal() 
