Season 48 is now in 📦{survivoR} + new datasets and data updates

[This article was first published on R Archives - Dan Oehm | Gradient Descending, and kindly contributed to R-bloggers.]

Survivor 48 has wrapped up, and the data has been added to the package and is now available on CRAN. There are a few cool updates to the data, which I’ll go over:

  • Complete US48 data added
  • Huge update to castaway_scores
  • New boot_order data set
  • castaways now lists each castaway only once per season, even if they were booted more than once
  • season_name has been removed from all tables other than season_summary
  • Season 50 cast added

If you want to skip that and just get the Survivor Season 48 data, head to GitHub for more details and installation instructions.

castaway_scores

I’ve developed a metric to score and rank castaways’ performance within a season and across multiple seasons. Ultimately, I want a measure where achieving 100% across all factors (e.g. challenges, voting performance, strategy, final placing, and jury votes) is considered a ‘perfect game’.

I’ve realised it’s really, really hard to quantify the perfect game and the best player, but I’ve come up with a few measures and an overarching methodology. The castaway_scores table is now quite extensive and may not make a lot of sense at first glance, so it’s worth reading the documentation in the package. You can read up on the survivor score methodology here.

Here’s a look at the scores for the final 4 of S48. You can browse the score cards for all 875 players across 48 seasons at survivorstatsdb.com.

boot_order and castaways

This is a change I’ve been wanting to make for a while. The issue with the castaways table was that in seasons like Redemption Island, where players can come back into the game, a returning player was listed in the table twice. It was pretty annoying and not very clean.

I’ve now reduced it so the castaways table includes each castaway only once per season, with their final result. The new boot_order table retains the order of each boot for when that is needed. This shouldn’t disrupt anything but should make the data easier to use.
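
To illustrate the difference between the two tables, here is a minimal sketch using made-up data. The castaway names, the season code "XX01", and the column names castaway and boot_number are assumptions for illustration only and may not match the package's actual schema. It shows how a boot log with a repeated player collapses to one row per castaway holding their final result.

```r
# Hypothetical boot log (not real data): one row per boot,
# so a player who returns and is booted again appears twice.
boot_log <- data.frame(
  version_season = "XX01",
  castaway = c("Alex", "Blake", "Casey", "Blake"),
  boot_number = 1:4,
  stringsAsFactors = FALSE
)

# Sort by boot order, then keep only the last boot for each
# castaway within a season, i.e. their final result.
ordered_log <- boot_log[order(boot_log$boot_number), ]
key <- paste(ordered_log$version_season, ordered_log$castaway)
final_result <- ordered_log[!duplicated(key, fromLast = TRUE), ]

print(final_result)
```

In this toy example, boot_log plays the role of boot_order (every boot is kept) and final_result plays the role of castaways (Blake appears once, with the later boot).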

season_name

I’ve removed the season_name field from all data sets other than season_summary. That field wasn’t used very often, and keeping it on every data set added a lot of redundancy. Instead, I’ll just keep it on season_summary and you can join it on when needed. Easy.
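
Joining the name back on might look something like the sketch below. It uses small made-up stand-ins for the package tables so that it runs without the package installed. The join key version_season and the toy values are assumptions for illustration, so check the package documentation for the actual column names.

```r
# Toy stand-in for season_summary (made-up values).
season_summary_toy <- data.frame(
  version_season = c("XX01", "XX02"),
  season_name = c("Season One", "Season Two"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a table that no longer carries season_name.
castaways_toy <- data.frame(
  version_season = c("XX01", "XX01", "XX02"),
  castaway = c("Alex", "Blake", "Casey"),
  stringsAsFactors = FALSE
)

# Left join: keep every castaway row and attach the season name.
with_names <- merge(
  castaways_toy,
  season_summary_toy[, c("version_season", "season_name")],
  by = "version_season",
  all.x = TRUE
)

print(with_names)
```

The same pattern should carry over to the real tables, and a dplyr left_join() on the same key would do the equivalent job if you prefer the tidyverse.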

Season 50

Wow. Season 50 is almost here! I have included the list of castaways in the castaways table, which should make it easy to do some analysis before the season kicks off.
