Video Games and Sliced

[This article was first published on Ronan's #TidyTuesday blog, and kindly contributed to R-bloggers].

Setup

Loading the R libraries and data set.

# Loading libraries
library(tidyverse)

# Reading in the raw data from GitHub (I would use "tt_load", but I hit an API
# rate limit)
games <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-16/games.csv')

Plotting Peak vs. Average number of players using 200 observations

For this plot, the 200 observations with the highest average number of players online at the same time are selected using slice_max(order_by = avg, n = 200). The peak and average number of players for these observations are plotted on a scatter plot, with the colour of each point indicating its game. Because colour is mapped to the game, geom_smooth() fits a separate trend line for each game, so the trends shown correspond to the three featured games. The game “Cyberpunk 2077” was filtered out before creating this plot, as it occurred in only one of the 200 observations. The filter is applied before slice_max(), so 200 observations from the remaining games are still kept.

games %>%
  select(gamename, avg, peak) %>%
  # Filtering out Cyberpunk 2077 as it has only a single observation
  filter(gamename != "Cyberpunk 2077") %>%
  slice_max(order_by = avg, n = 200) %>%
  ggplot(aes(x = avg, y = peak, colour = gamename)) +
  geom_point() +
  geom_smooth(formula = y ~ x) +
  theme_bw() +
  theme(legend.position = "bottom") +
  labs(title = "Peak vs. Average number of players online simultaneously",
       subtitle = "Top 200 observations for Average used",
       y = "Highest number of players at the same time",
       x = "Average number of players at the same time",
       colour = "Game")
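The decision to drop a game depends on how many of the top rows it contributes. A minimal sketch on a small hypothetical tibble (toy, not the real games data) shows how to check this with count(). It also shows that slice_max() keeps ties by default, so n = 200 can return slightly more than 200 rows unless with_ties = FALSE is set.

```r
library(dplyr)

# Hypothetical toy data standing in for the games table (not real values)
toy <- tibble(
  gamename = c("A", "A", "B", "B", "C"),
  avg      = c(500, 400, 400, 300, 350)
)

# slice_max() keeps ties by default: asking for n = 2 returns 3 rows here,
# because the 2nd-highest value (400) is shared by two rows
top_ties <- toy %>% slice_max(order_by = avg, n = 2)

# with_ties = FALSE returns exactly n rows
top_exact <- toy %>% slice_max(order_by = avg, n = 2, with_ties = FALSE)

# Counting rows per game in the top n shows which games appear only once
top_counts <- toy %>%
  slice_max(order_by = avg, n = 4) %>%
  count(gamename, sort = TRUE)
top_counts
```

On the real data, the same count() step applied after slice_max(order_by = avg, n = 200) is one way to spot a game that, like Cyberpunk 2077 here, contributes only a single row.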

Combining “year” and “month” into a new variable

Combining the year and month variables makes it easier to track when each observation was recorded. Creating the new year_month variable with the as_date() function from {lubridate} ensures that it is stored as a date, so it sorts chronologically and works on a continuous time axis.

# Creating a "year_month" variable with the year and month of each observation
# using the lubridate "as_date()" function, by pasting together...
games$year_month <- lubridate::as_date(paste(
  # ...the "year" variable...
  games$year,
  # ...the number of each month, obtained by matching the month names to the
  # "month.name" built-in constant...
  match(games$month, month.name),
  # ...the number "1", as a dummy "day" value...
  1,
  # ...separated by a "-".
  sep = "-"))

# Printing the start of the "games" object with the new variable...
games
# A tibble: 83,631 x 8
   gamename  year month    avg    gain   peak avg_peak_perc year_month
   <chr>    <dbl> <chr>  <dbl>   <dbl>  <dbl> <chr>         <date>
 1 Counter…  2021 Febr… 7.41e5  -2196. 1.12e6 65.9567%      2021-02-01
 2 Dota 2    2021 Febr… 4.05e5 -27840. 6.52e5 62.1275%      2021-02-01
 3 PLAYERU…  2021 Febr… 1.99e5  -2290. 4.47e5 44.4707%      2021-02-01
 4 Apex Le…  2021 Febr… 1.21e5  49216. 1.97e5 61.4752%      2021-02-01
 5 Rust      2021 Febr… 1.18e5 -24375. 2.24e5 52.4988%      2021-02-01
 6 Team Fo…  2021 Febr… 1.01e5  18083. 1.34e5 75.7603%      2021-02-01
 7 Grand T…  2021 Febr… 9.06e4 -10603. 1.46e5 61.9017%      2021-02-01
 8 Tom Cla…  2021 Febr… 7.24e4  -5335. 1.13e5 63.8645%      2021-02-01
 9 Rocket …  2021 Febr… 5.37e4  -5726. 1.03e5 51.9419%      2021-02-01
10 Path of…  2021 Febr… 4.69e4   -766. 9.05e4 51.8229%      2021-02-01
# … with 83,621 more rows
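The same year_month column can also be built without pasting strings together. A minimal sketch, assuming {lubridate}'s make_date() as an alternative to the paste() and as_date() approach above, applied to a few hypothetical year and month values rather than the real data:

```r
library(lubridate)

# Hypothetical values in the same shape as the games "year" and "month" columns
year_vals  <- c(2021, 2021, 2020)
month_vals <- c("February", "January", "April")

# match() against the built-in month.name constant gives month numbers 1 to 12
month_num <- match(month_vals, month.name)

# make_date() assembles a Date from numeric parts, with the day fixed at 1
year_month <- make_date(year_vals, month_num, 1L)
year_month
```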

This plot uses the year_month variable on the x-axis and the gain variable on the y-axis to show month-to-month gains and losses in the average number of players. The three games shown account for most of the highest avg (average number of simultaneous players) values in the data set. The plot has one facet per game, and because scales = "free" gives each facet its own axes, dashed lines at plus and minus 100,000 players are added to every facet as a common reference.

games %>%
  # Selecting the variables
  select(gamename, year_month, gain) %>%
  # Filtering the data set for three of the most popular games
  filter(gamename %in% c("Dota 2",
                         "PLAYERUNKNOWN'S BATTLEGROUNDS",
                         "Counter-Strike: Global Offensive")) %>%
  ggplot(aes(year_month, gain, fill = gamename)) +
  geom_col() +
  theme_classic() +
  theme(legend.position = "none") +
  # Adding dashed lines to put facets into perspective
  geom_hline(yintercept = 100000, linetype = "dashed") +
  geom_hline(yintercept = -100000, linetype = "dashed") +
  # Faceting the plot for each game
  facet_wrap(~gamename, scales = "free") +
  labs(
    title = "Gains/Losses in average number of players online for three games",
    subtitle = "Dashed lines added at +/-100,000 players for each game",
    x = "Time",
    y = "Gains/Losses in average number of players"
  )
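The gain variable can be sanity-checked against avg. A minimal sketch, assuming (as the TidyTuesday data dictionary for this data set describes) that gain is the difference in avg from the previous month, recomputed with dplyr's lag() on a hypothetical toy tibble. The first month of each game has no previous month, so its gain is NA, and geom_col() drops such rows with a warning.

```r
library(dplyr)

# Hypothetical monthly averages for two games (not real values)
toy <- tibble(
  gamename   = c("Game A", "Game A", "Game A", "Game B", "Game B"),
  year_month = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01",
                         "2020-01-01", "2020-02-01")),
  avg        = c(1000, 1300, 1100, 500, 450)
)

# Assumption: gain = avg minus the previous month's avg for the same game
toy <- toy %>%
  group_by(gamename) %>%
  arrange(year_month, .by_group = TRUE) %>%
  mutate(gain_recomputed = avg - lag(avg)) %>%
  ungroup()

# The first month of each game has no previous value, so its gain is NA
toy
```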

Discussion

The two plots in this post show the peak number of players online, the average number of players online, and month-to-month changes in that average for three games. Of these games, PLAYERUNKNOWN’S BATTLEGROUNDS (PUBG) has by far the highest peak and average player counts. However, these highs were not sustained: PUBG saw large month-to-month losses in average players during 2018. By contrast, Dota 2 has maintained a relatively steady player base, without dramatic gains or losses. Counter-Strike: Global Offensive’s spike in popularity around April 2020 coincides with the introduction of COVID-19 lockdown measures.
