Alone R package: Datasets from the survival TV series

[This article was first published on R Archives - Dan Oehm | Gradient Descending, and kindly contributed to R-bloggers].

I have been watching the survival TV series ‘Alone,’ where 10 survivalists are dropped in an extremely remote area and must fend for themselves. I am super impressed by their skills, endurance, and mental fortitude. To last 100 days in the Arctic winter living off the land is truly impressive.

True to form, I’ve collected the data and I am sharing it here in the {alone} R package.

It is a collection of datasets about the TV series in a tidy format. The package includes 4 datasets:

  • survivalists
  • loadouts
  • episodes
  • seasons

For non-R users, here is the link to the Google Sheets doc.

Installation

Install from CRAN:

install.packages("alone")

Install from GitHub:

devtools::install_github("doehm/alone")

Datasets

survivalists

A data frame of survivalists across all 9 seasons detailing name and demographics, location and profession, result, days lasted, reasons for tapping out (detailed and categorised), and page URL.

Dataset features:

  • season: The season number
  • name: Name of the survivalist
  • age: Age of survivalist
  • gender: Gender
  • city: City
  • state: State
  • country: Country
  • result: Place the survivalist finished in the season
  • days_lasted: The number of days lasted in the game before tapping out or winning
  • medically_evacuated: Logical. If the survivalist was medically evacuated from the game
  • reason_tapped_out: The reason the survivalist tapped out of the game. NA means they were the winner, since technically a winner never tapped out.
  • reason_category: A simplified category of the reason for tapping out
  • team: The team they were associated with (only for season 4)
  • day_linked_up: Day the team members linked up (only for season 4)
  • profession: Profession
  • url: URL of cast page on the History Channel website. Prefix URL with https://www.history.com/shows/alone/cast
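
The url column holds only the tail of each cast page address, so it has to be joined to the prefix before it can be opened. Here is a minimal sketch of that step. The helper name full_cast_url and the toy rows are made up for illustration, and it assumes the stored value is a simple path fragment, so check the real column before relying on it.

```r
# Hypothetical helper: joins the History Channel prefix to a stored url tail.
full_cast_url <- function(url_tail,
                          prefix = "https://www.history.com/shows/alone/cast") {
  # Drop any leading slash so the result never contains a double slash
  url_tail <- sub("^/+", "", url_tail)
  # Keep NA as NA rather than pasting the string "NA" onto the prefix
  ifelse(is.na(url_tail), NA_character_, paste0(prefix, "/", url_tail))
}

# Toy values for illustration only, not taken from the package
toy <- data.frame(
  name = c("Survivalist A", "Survivalist B", "Survivalist C"),
  url  = c("survivalist-a", "/survivalist-b", NA),
  stringsAsFactors = FALSE
)
toy$full_url <- full_cast_url(toy$url)
print(toy)

# With the package installed, the same helper would apply to the real column:
# full_cast_url(alone::survivalists$url)
```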

As an example, use this dataset to compare the number of days survived for both men and women.

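library(tidyverse)
library(alone)

# one row per day and gender, so the curve is defined on every day
df <- expand_grid(
  days_lasted = 0:max(survivalists$days_lasted),
  gender = unique(survivalists$gender)
) |> 
  # number of survivalists whose game ended on each day, by gender
  left_join(
    survivalists |> 
      count(days_lasted, gender),
    by = c("days_lasted", "gender")
  ) |> 
  # total number of survivalists, by gender
  left_join(
    survivalists |> 
      count(gender, name = "N"),
    by = "gender"
  ) |> 
  group_by(gender) |> 
  mutate(
    n = replace_na(n, 0),
    n_lasted = N-cumsum(n),  # still in the game after this day
    p = n_lasted/N           # proportion still in the game
  ) |> 
  ungroup()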

# Kaplan-Meier survival curves
# code is simplified and plot won't match below
df |> 
  ggplot(aes(days_lasted, p, colour = gender)) +
  geom_line() 

# boxplots
# gender goes on the x axis so that geom_jitter has both x and y
survivalists |> 
  ggplot(aes(gender, days_lasted, fill = gender)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, pch = 1, size = 3) +
  theme_minimal()

While there is yet to be a female winner, there is some evidence to suggest that, on average, women survive longer than men. However, this needs further investigation, since the first season had a lot of early taps and no women.
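
One way to start that investigation is to repeat the comparison with the first season left out. The sketch below is only illustrative: days_by_gender is a hypothetical helper, not part of {alone}, and the toy rows are made-up numbers rather than real results.

```r
library(dplyr)

# Hypothetical helper: summarise days lasted by gender,
# optionally leaving out some seasons.
days_by_gender <- function(data, exclude_seasons = integer(0)) {
  data |>
    filter(!season %in% exclude_seasons) |>
    group_by(gender) |>
    summarise(
      n = n(),
      mean_days = mean(days_lasted),
      median_days = median(days_lasted),
      .groups = "drop"
    )
}

# Toy rows for illustration only; these are not real results
toy <- tibble(
  season = c(1, 1, 2, 2, 2, 3),
  gender = c("Male", "Male", "Male", "Female", "Male", "Female"),
  days_lasted = c(1, 5, 40, 50, 60, 70)
)

days_by_gender(toy)                       # all seasons
days_by_gender(toy, exclude_seasons = 1)  # season 1 removed

# With the package installed, the same call would run on the real data:
# days_by_gender(alone::survivalists, exclude_seasons = 1)
```

If the gap between the groups shrinks once season 1 is removed, the early taps in that season explain part of the difference.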

loadouts

The rules allow each survivalist to take 10 items with them. This dataset includes information on each survivalist’s loadout. It has detailed item descriptions and a simplified version for easier aggregation and analysis.

Dataset features:

  • version: Country code for the version of the show
  • season: The season number
  • name: Name of the survivalist
  • item_number: Item number
  • item_detailed: Detailed loadout item description
  • item: Loadout item. Simplified for aggregation
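
For example, count how often each simplified item was packed:

library(forcats)

# number of times each simplified item appears across all loadouts
loadouts |>
  count(item) |>
  mutate(item = fct_reorder(item, n, max)) |>
  ggplot(aes(item, n)) +
  geom_col() +
  # place the count label just past the end of each bar
  geom_text(aes(item, n + 3, label = n)) +
  coord_flip()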
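
A raw count of rows can count one survivalist twice when two detailed items collapse to the same simplified item. Here is a rough sketch of an alternative that counts each person once per item and reports the share of survivalists who packed it. The helper name item_share and the toy rows are made up, and identifying a survivalist by version, season and name is an assumption.

```r
library(dplyr)

# Hypothetical helper: share of survivalists who packed each simplified item.
item_share <- function(loadouts) {
  # A survivalist is identified by version + season + name (an assumption)
  total_people <- nrow(distinct(loadouts, version, season, name))

  loadouts |>
    distinct(version, season, name, item) |>   # one row per person and item
    count(item, name = "n_carriers") |>
    mutate(share = n_carriers / total_people) |>
    arrange(desc(share))
}

# Toy rows for illustration only, not taken from the package
toy <- tibble(
  version = "US",
  season = c(1, 1, 1, 1, 1, 1),
  name = c("A", "A", "A", "B", "B", "B"),
  item_number = c(1, 2, 3, 1, 2, 3),
  item = c("Axe", "Rope", "Rope", "Axe", "Saw", "Pot")
)

item_share(toy)

# With the package installed: item_share(alone::loadouts)
```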
    

episodes

This dataset contains details of each episode including the title, number of viewers, beginning quote, and IMDb rating. New episodes are added at the end of future seasons.

Dataset features:

  • version: Country code for the version of the show
  • season: The season number
  • episode_number_overall: Episode number across seasons
  • episode: Episode number
  • title: Episode title
  • air_date: Date the episode originally aired
  • viewers: Number of viewers in the US (millions)
  • quote: The beginning quote
  • author: Author of the beginning quote
  • imdb_rating: IMDb rating of the episode
  • n_ratings: Number of ratings given for the episode
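
As a rough illustration of how these columns combine, the sketch below summarises viewers and ratings per season, weighting each episode's IMDb rating by the number of ratings it received. The helper name season_ratings and the toy numbers are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: per-season viewer and rating summary.
season_ratings <- function(episodes) {
  episodes |>
    group_by(season) |>
    summarise(
      n_episodes = n(),
      mean_viewers = mean(viewers, na.rm = TRUE),
      # na.rm drops episodes with a missing rating;
      # a missing n_ratings would still give NA for that season
      weighted_rating = weighted.mean(imdb_rating, n_ratings, na.rm = TRUE),
      .groups = "drop"
    )
}

# Toy rows for illustration only; these are not real figures
toy <- tibble(
  season = c(1, 1, 2, 2),
  viewers = c(1.0, 2.0, 3.0, NA),
  imdb_rating = c(8, 9, 7, 8),
  n_ratings = c(100, 300, 50, 50)
)

season_ratings(toy)

# With the package installed: season_ratings(alone::episodes)
```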

seasons

The season summary dataset includes location, latitude and longitude, and other season-level information. It includes the date of drop-off where the information exists.

Dataset features:

  • version: Country code for the version of the show
  • season: The season number
  • location: Location
  • country: Country
  • n_survivors: Number of survivalists in the season. In season 4 there were 7 teams of 2.
  • lat: Latitude
  • lon: Longitude
  • date_drop_off: The date the survivalists were dropped off
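
Because seasons has one row per season, it can be joined to a per-season summary of survivalists to compare locations. This is a minimal sketch under stated assumptions: days_by_location is a hypothetical helper, the toy rows are placeholders, and the join uses season only. If both tables carry a version column, it should be added to the join key.

```r
library(dplyr)

# Hypothetical helper: days lasted per season, with the season's location.
days_by_location <- function(survivalists, seasons) {
  survivalists |>
    group_by(season) |>
    summarise(
      mean_days = mean(days_lasted),
      max_days = max(days_lasted),
      .groups = "drop"
    ) |>
    # attach season-level details; assumes one row per season in `seasons`
    left_join(
      select(seasons, season, location, country, lat, lon),
      by = "season"
    ) |>
    arrange(desc(mean_days))
}

# Toy rows for illustration only; locations and numbers are placeholders
toy_survivalists <- tibble(
  season = c(1, 1, 2, 2),
  days_lasted = c(10, 20, 30, 50)
)
toy_seasons <- tibble(
  season = c(1, 2),
  location = c("Location X", "Location Y"),
  country = "Country Z",
  lat = c(50, 60),
  lon = c(-127, -110)
)

days_by_location(toy_survivalists, toy_seasons)

# With the package installed:
# days_by_location(alone::survivalists, alone::seasons)
```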

References

If there is any data you would like to include, please get in touch.

1. History: https://www.history.com/shows/alone/cast
2. Wikipedia: https://en.wikipedia.org/wiki/Alone_(TV_series)
3. Wikipedia (episodes): https://en.wikipedia.org/wiki/List_of_Alone_episodes#Season_1_(2015)_-_Vancouver_Island
