[This article was first published on Ronan's #TidyTuesday blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## Setup

Loading the `R` libraries and data set.

```# Loading libraries
library(tidyverse)

# Reading in the raw data from GitHub (I would use "tt_load", but I hit an API
# rate limt)

## Plotting Peak vs. Average number of players using 200 observations

For this plot, the top 200 observations for average number of players at the same time are selected using `slice_max(order_by = avg, n = 200)`. The peak and average number of players for these observations are plotted on a scatter plot. The colour of the points indicates the game used for each observation. Models are fit to illustrate trends in the data; these trends follow the featured three games. The game “Cyberpunk 2077” was filtered out before creating this plot, as it occurred in only one of the 200 observations.

```games %>%
select(gamename, avg, peak) %>%
# Filtering out Cyberpunk 2077 as it has only a single observation
filter(gamename != "Cyberpunk 2077") %>%
slice_max(order_by = avg, n = 200) %>%
ggplot(aes(x = avg, y = peak, colour = gamename)) +
geom_point() +
geom_smooth(formula = y ~ x) +
theme_bw() +
theme(legend.position = "bottom") +
labs(title = "Peak vs. Average number of players online simultaneously",
subtitle = "Top 200 observations for Average used",
y = "Highest number of players at the same time",
x = "Average number of players at the same time",
colour = "Game")```

## Combining “year” and “month” into a new variable

Combining the `year` and `month` variables makes it easier to track when each observation was recorded. Creating this new `year_month` variable using the `as.date()` function from `{lubridate}` ensures that it will be interpreted as a date.

```# Creating a "year_month" variable with the year and month of each observation
# using the lubridate "as_date()" function, by pasting together...
games\$year_month <- lubridate::as_date(paste(
# ...the "year" variable...
games\$year,
# ...the number of each month, obtained by matching the month names to the
# "month.name" built-in constant...
match(games\$month, month.name),
# ...the number "1", as a dummy "day" value...
1,
# ...separated by a "-".
sep = "-"))

# Printing the start of the "games" object with the new variable...
games
# A tibble: 83,631 x 8
gamename  year month    avg    gain   peak avg_peak_perc year_month
<chr>    <dbl> <chr>  <dbl>   <dbl>  <dbl> <chr>         <date>
1 Counter…  2021 Febr… 7.41e5  -2196. 1.12e6 65.9567%      2021-02-01
2 Dota 2    2021 Febr… 4.05e5 -27840. 6.52e5 62.1275%      2021-02-01
3 PLAYERU…  2021 Febr… 1.99e5  -2290. 4.47e5 44.4707%      2021-02-01
4 Apex Le…  2021 Febr… 1.21e5  49216. 1.97e5 61.4752%      2021-02-01
5 Rust      2021 Febr… 1.18e5 -24375. 2.24e5 52.4988%      2021-02-01
6 Team Fo…  2021 Febr… 1.01e5  18083. 1.34e5 75.7603%      2021-02-01
7 Grand T…  2021 Febr… 9.06e4 -10603. 1.46e5 61.9017%      2021-02-01
8 Tom Cla…  2021 Febr… 7.24e4  -5335. 1.13e5 63.8645%      2021-02-01
9 Rocket …  2021 Febr… 5.37e4  -5726. 1.03e5 51.9419%      2021-02-01
10 Path of…  2021 Febr… 4.69e4   -766. 9.05e4 51.8229%      2021-02-01
# … with 83,621 more rows```

This plot uses the `year_month` variable on the x-axis and the `gain` variable on the y-axis to illustrate month-to-month gains and losses in average players. The three games used share the majority of the highest `avg` (average number of simultaenous players) values in the data set. This graph is faceted for each game. To put these gains and losses into perspective, dashed lines are added at plus and minus 100,000 players in each facet.

```games %>%
# Selecting the variables
select(gamename, year_month, gain) %>%
# Filtering the data set for three of the most popular games
filter(gamename == "Dota 2" |
gamename == "PLAYERUNKNOWN'S BATTLEGROUNDS" |
gamename == "Counter-Strike: Global Offensive") %>%
ggplot(aes(year_month, gain, fill = gamename)) +
geom_col() +
theme_classic() +
theme(legend.position = "none") +
# Adding dashed lines to put facets into perspective
geom_hline(yintercept = 100000, linetype = "dashed") +
geom_hline(yintercept = -100000, linetype = "dashed") +
# Faceting the plot for each game
facet_wrap(~gamename, scales = "free") +
labs(
title = "Gains/Losses in average number of players online for three games",
subtitle = "Dashed lines added at +/-100,000 players for each game",
x = "Time",
y = "Gains/Losses in average number of players"
)```

## Discussion

The two plots in this post illustrate peak number of players online, average number of players online, and changes in that average for three games. Of these games, PLAYERUNKNOWN’S BATTLEGROUNDS (PUBG) has the highest values for peak and average players by far. However, these high points were not sustained, with dramatic losses in average number of players per month in 2018. By contrast, Dota 2 has maintained a relatively steady player base, without dramatic gains or losses. Counter-Strike: Global Offensive’s spike in popularity around April 2020 coincides with the introduction of lockdown measures.