Volleyball Analytics
Volleyball Analytics with R: A Practical, End-to-End Playbook
Build a full volleyball analytics workflow in R: data collection, cleaning, scouting reports, skill KPIs, rotation/lineup analysis, sideout & transition, serve/receive, visualization, dashboards, and predictive modeling.
Table of Contents
- Why Volleyball Analytics (and Why R)
- Volleyball Data Model: Events, Rally, Set, Match
- Data Sources: Manual Logs, Video Tags, DataVolley-Style Exports
- R Project Setup & Reproducibility
- Import & Clean Volleyball Event Data
- Core Volleyball KPIs (Serve, Pass, Attack, Block, Dig)
- Sideout, Break Point, Transition & Rally Phase Analytics
- Rotation, Lineup, Setter Distribution & Matchups
- Serve & Serve-Receive Analytics (Zones, Heatmaps, Pressure)
- Attack Shot Charts, Zones, Tendencies & Scouting
- Modeling: Expected Sideout, Win Probability, Elo, Markov Chains
- Predictive Modeling with tidymodels
- Bayesian Volleyball Analytics in R
- Visualization: ggplot2 Templates for Volleyball
- Dashboards: Shiny Scouting Reports
- Automation: Reports to HTML/PDF + CI
- Best Practices + Common Pitfalls
- Recommended Book
- FAQ
Why Volleyball Analytics (and Why R)
Volleyball is a sequence of discrete events (serve, pass, set, attack, block, dig) organized into rallies and phases (sideout vs. transition). This structure makes it ideal for: event-based analytics, rotation analysis, scouting tendencies, expected efficiency modeling, and win probability.
R excels at this because of tidy data workflows (dplyr/tidyr), great visualization (ggplot2), modern modeling (tidymodels, brms), and easy reporting (Quarto/R Markdown). If you want a repeatable volleyball analytics pipeline for your club or team, R is a perfect fit.
Keywords you should care about
- Sideout % (SO%), Break Point % (BP%), Transition Efficiency
- Serve Pressure, Passing Rating, First Ball Sideout
- Attack Efficiency (kills – errors)/attempts, Kill Rate
- Rotation Efficiency, Lineup Net Rating, Setter Distribution
- Expected Sideout, Expected Point, Win Probability
- Scouting Tendencies, Shot Charts, Serve Target Heatmaps
Volleyball Data Model: Events, Rally, Set, Match
A practical volleyball dataset in R usually includes one row per contact or one row per event. The minimum columns for serious analytics:
- match_id, set_no, rally_id, point_won_by
- team, player, skill (serve, pass, set, attack, block, dig)
- evaluation (e.g., error, poor, ok, good, perfect, kill, continuation)
- start_zone, end_zone (serve zones, attack zones)
- rotation, server, receive_formation
- score_home, score_away, home_team, away_team
R code: create a minimal event schema
library(tidyverse)
library(lubridate)
event_schema <- tibble::tibble(
match_id = character(),
datetime = ymd_hms(character()),
set_no = integer(),
rally_id = integer(),
home_team = character(),
away_team = character(),
team = character(), # team performing the action
opponent = character(), # opponent of team
player = character(),
jersey = integer(),
skill = factor(levels = c("serve","pass","set","attack","block","dig","freeball")),
evaluation = character(), # e.g., "error","ace","perfect","positive","negative","kill","blocked","dig"
start_zone = integer(), # 1..6 (or 1..9, depending on the coding system)
end_zone = integer(),
rotation = integer(), # 1..6
phase = factor(levels = c("sideout","transition")), # derived later
score_team = integer(), # score for team at time of event
score_opp = integer(),
point_won_by = character() # which team won the rally point
# (tibble() has no stringsAsFactors argument; passing it would create an extra column)
)
glimpse(event_schema)
You can extend this schema with positional labels (OH, MB, OPP, S, L),
contact order (1st/2nd/3rd), attack tempo, block touches, etc.
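Before computing anything, it can help to check an incoming table against the schema. The following is a minimal sketch in base R; the helper name `validate_events()` and its rules (zones limited to 1..9, rotations to 1..6) are assumptions for illustration and should be adapted to your coding system.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
validate_events <- function(df,
                            required = c("match_id", "set_no", "rally_id",
                                         "team", "skill", "evaluation")) {
  problems <- character()

  # 1) Required columns must be present
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  # 2) Zones and rotations must fall in the assumed ranges (NA is tolerated)
  if ("start_zone" %in% names(df) &&
      any(!is.na(df$start_zone) & !df$start_zone %in% 1:9)) {
    problems <- c(problems, "start_zone outside 1..9")
  }
  if ("rotation" %in% names(df) &&
      any(!is.na(df$rotation) & !df$rotation %in% 1:6)) {
    problems <- c(problems, "rotation outside 1..6")
  }
  problems
}

# Toy data with one deliberately invalid zone (12)
toy <- data.frame(
  match_id = "m1", set_no = 1L, rally_id = 1:2,
  team = c("A", "B"), skill = c("serve", "pass"),
  evaluation = c("in_play", "perfect"),
  start_zone = c(1L, 12L), rotation = c(1L, 6L)
)
validate_events(toy)
```

Returning a vector of messages rather than stopping on the first failure makes it easier to log every issue in a file at once.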
Data Sources: Manual Logs, Video Tags, DataVolley-Style Exports
Volleyball data typically arrives as: (1) manual spreadsheets, (2) video tagging exports, or (3) scouting software exports. Regardless of source, your R pipeline should:
- Import raw data
- Normalize team/player names
- Create rally keys (match_id/set_no/rally_id)
- Derive phases (sideout vs. transition)
- Compute KPIs and reporting tables
R code: robust import helpers
library(readr)
library(janitor)
read_events_csv <- function(path) {
readr::read_csv(path, show_col_types = FALSE) %>%
janitor::clean_names() %>%
mutate(
set_no = as.integer(set_no),
rally_id = as.integer(rally_id),
start_zone = as.integer(start_zone),
end_zone = as.integer(end_zone),
rotation = as.integer(rotation)
)
}
normalize_names <- function(df) {
df %>%
mutate(
team = str_squish(str_to_title(team)),
opponent = str_squish(str_to_title(opponent)),
player = str_squish(str_to_title(player)),
evaluation = str_squish(str_to_lower(evaluation)),
skill = factor(str_to_lower(skill),
levels = c("serve","pass","set","attack","block","dig","freeball"))
)
}
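Case and whitespace cleanup will not merge genuinely different spellings of the same team. One possible approach is an explicit alias table; the sketch below assumes a hand-maintained lookup (`team_aliases`) and a hypothetical helper `apply_team_aliases()`, neither of which is part of the pipeline above.

```r
library(dplyr)

# Hypothetical alias table: every raw spelling maps to one canonical name.
# Each raw value must appear only once, otherwise the join duplicates rows.
team_aliases <- tibble::tribble(
  ~raw,     ~canonical,
  "Team A", "Team A",
  "Team-A", "Team A",
  "Tm A",   "Team A",
  "Team B", "Team B"
)

apply_team_aliases <- function(df, aliases) {
  df %>%
    left_join(aliases, by = c("team" = "raw")) %>%
    mutate(team = coalesce(canonical, team)) %>%  # unknown names stay unchanged
    select(-canonical)
}

raw_events <- tibble::tibble(team = c("Team-A", "Tm A", "Team B", "Team C"))
cleaned <- apply_team_aliases(raw_events, team_aliases)
cleaned
```

Names that are not in the table pass through untouched, so new opponents show up as unmapped values you can review rather than as silent NAs.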
Tip for SEO + practice: call your columns and metrics consistently across posts: SO%, BP%, ACE%, ERR%, Kill%, Eff%, Pos%.
R Project Setup & Reproducibility
Serious volleyball analytics needs reproducibility: same input data, same R version, same packages, same outputs. Use an R project + renv + Quarto.
R code: create a project scaffold
# Run once inside your project
install.packages(c("renv","quarto","tidyverse","lubridate","janitor","gt","patchwork","tidymodels"))
renv::init()
# Recommended folder structure
dir.create("data/raw", recursive = TRUE, showWarnings = FALSE)
dir.create("data/processed", recursive = TRUE, showWarnings = FALSE)
dir.create("R", showWarnings = FALSE)
dir.create("reports", showWarnings = FALSE)
dir.create("figures", showWarnings = FALSE)
R code: create a simple metric dictionary
metric_dictionary <- tribble(
  ~metric, ~definition,
  "SO%",   "Sideout percentage: points won when receiving serve / total receive opportunities",
  "BP%",   "Break point percentage: points won when serving / total serving opportunities",
  "Kill%", "Kills / attack attempts",
  "Eff%",  "(Kills - Errors) / attempts",
  "Ace%",  "Aces / total serves",
  "Err%",  "Serve errors / total serves"
)
metric_dictionary
Import & Clean Volleyball Event Data
Most problems in volleyball analytics are data quality problems: inconsistent team names, missing rally keys, duplicated rows, weird evaluation labels, or mixed zone definitions.
R code: import + normalize + validate
events_raw <- read_events_csv("data/raw/events.csv")
events <- events_raw %>% normalize_names()
# Basic validation
stopifnot(all(c("match_id","set_no","rally_id","team","skill","evaluation") %in% names(events)))
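# Remove obvious duplicates (same match/set/rally/team/player/skill/evaluation).
# Caution: this also drops legitimate repeats (e.g., two digs by the same player with the
# same grade in one rally); add a contact-order column to the key if you track one.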
events <- events %>%
distinct(match_id, set_no, rally_id, team, player, skill, evaluation, .keep_all = TRUE)
# Treat empty opponent strings as missing
events <- events %>%
mutate(opponent = if_else(is.na(opponent) | opponent == "",
NA_character_, opponent))
# Quick data quality report
quality_report <- list(
n_rows = nrow(events),
n_matches = n_distinct(events$match_id),
missing_player = mean(is.na(events$player) | events$player == ""),
missing_zone = mean(is.na(events$start_zone)),
skill_counts = events %>% count(skill, sort = TRUE)
)
quality_report
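Since the later rally-level code identifies the serving team from the first serve row, one more check worth considering is whether every rally contains exactly one serve. The sketch below uses made-up toy data; the object names are illustrative only.

```r
library(dplyr)

# Toy events: rally 2 has no serve row, rally 3 has a duplicated serve
toy_events <- tibble::tibble(
  match_id = "m1", set_no = 1L,
  rally_id = c(1L, 1L, 2L, 3L, 3L, 3L),
  team     = c("A", "B", "B", "A", "A", "B"),
  skill    = c("serve", "pass", "pass", "serve", "serve", "pass")
)

# Rallies whose serve count is not exactly 1 need manual review
serve_check <- toy_events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(n_serves = sum(skill == "serve"), .groups = "drop") %>%
  filter(n_serves != 1)

serve_check
```

Rallies flagged here would otherwise get a missing or ambiguous `serving_team` downstream.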
R code: derive rally winner and rally phase
A common approach: identify which team served in the rally. If a team receives serve, that is a sideout opportunity. If a team is serving, that is a break point opportunity. You can derive phase per team within each rally.
derive_rally_context <- function(df) {
df %>%
group_by(match_id, set_no, rally_id) %>%
mutate(
serving_team = team[which(skill == "serve")[1]],
receiving_team = setdiff(unique(team), serving_team)[1],
phase = case_when(
team == receiving_team ~ "sideout",
team == serving_team ~ "transition",
TRUE ~ NA_character_
) %>% factor(levels = c("sideout","transition"))
) %>%
ungroup()
}
events <- derive_rally_context(events)
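One caveat: the function above infers the receiving team from the teams that appear in the rally. On an ace or a service error, only the serving team has rows, so `receiving_team` comes out as NA and the rally drops out of sideout counts. A possible workaround, assuming `home_team` and `away_team` are filled on every row, is sketched here on toy data.

```r
library(dplyr)

# Toy events: rally 1 is an ace (only the serving team appears)
toy_events <- tibble::tibble(
  match_id = "m1", set_no = 1L,
  rally_id = c(1L, 2L, 2L),
  home_team = "A", away_team = "B",
  team  = c("A", "B", "A"),
  skill = c("serve", "serve", "pass"),
  evaluation = c("ace", "in_play", "perfect")
)

rally_teams <- toy_events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    home_team = first(home_team),
    away_team = first(away_team),
    .groups = "drop"
  ) %>%
  # The receiving team is simply "the other team" in the match
  mutate(receiving_team = if_else(serving_team == home_team, away_team, home_team))

rally_teams
```

This relies on team names in `team`, `home_team`, and `away_team` being normalized the same way.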
Core Volleyball KPIs (Serve, Pass, Attack, Block, Dig)
Volleyball KPIs are best computed from event tables with clear skill and evaluation codes. Below is a practical KPI set that works for scouting and performance analysis.
R code: define standard evaluation mappings
# Customize to your coding system.
eval_map <- list(
serve = list(
ace = c("ace"),
error = c("error","serve_error"),
in_play = c("in_play","good","ok","positive","negative")
),
pass = list(
perfect = c("perfect","3"),
positive = c("positive","2","good"),
negative = c("negative","1","poor"),
error = c("error","0")
),
attack = list(
kill = c("kill"),
error = c("error","attack_error"),
blocked = c("blocked"),
in_play = c("in_play","continuation","covered")
)
)
is_eval <- function(x, values) tolower(x) %in% tolower(values)
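If you would rather convert raw codes to a canonical label once instead of testing membership repeatedly, a lookup can be built from the same mapping. This is a sketch in base R; `standardise_eval()` is a hypothetical helper and `pass_map` simply repeats the pass mapping from above so the example runs on its own.

```r
# Copy of the pass mapping so the example is self-contained
pass_map <- list(
  perfect  = c("perfect", "3"),
  positive = c("positive", "2", "good"),
  negative = c("negative", "1", "poor"),
  error    = c("error", "0")
)

# Hypothetical helper: canonical label for each raw code, NA if unmapped
standardise_eval <- function(raw, skill_map) {
  # Named vector: names are raw codes, values are canonical labels
  lookup <- unlist(lapply(names(skill_map), function(label) {
    codes <- tolower(skill_map[[label]])
    setNames(rep(label, length(codes)), codes)
  }))
  unname(lookup[tolower(raw)])
}

standardise_eval(c("3", "Good", "poor", "0", "??"), pass_map)
```

Unmapped codes come back as NA, which makes gaps in the mapping visible in a simple `count()`.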
R code: serve metrics (Ace%, Error%, Pressure proxy)
serve_metrics <- events %>%
filter(skill == "serve") %>%
mutate(
is_ace = is_eval(evaluation, eval_map$serve$ace),
is_error = is_eval(evaluation, eval_map$serve$error)
) %>%
group_by(match_id, team) %>%
summarise(
serves = n(),
aces = sum(is_ace),
errors = sum(is_error),
ace_pct = aces / serves,
err_pct = errors / serves,
.groups = "drop"
)
serve_metrics
R code: passing metrics (Perfect%, Positive%, Passing Efficiency)
pass_metrics <- events %>%
filter(skill == "pass") %>%
mutate(
perfect = is_eval(evaluation, eval_map$pass$perfect),
positive = is_eval(evaluation, eval_map$pass$positive),
negative = is_eval(evaluation, eval_map$pass$negative),
error = is_eval(evaluation, eval_map$pass$error),
# A common numeric scale (0..3)
pass_score = case_when(
perfect ~ 3,
positive ~ 2,
negative ~ 1,
error ~ 0,
TRUE ~ NA_real_
)
) %>%
group_by(match_id, team, player) %>%
summarise(
passes = n(),
perfect_pct = mean(perfect, na.rm = TRUE),
positive_pct = mean(positive, na.rm = TRUE),
error_pct = mean(error, na.rm = TRUE),
avg_pass = mean(pass_score, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(avg_pass), desc(passes))
pass_metrics %>% slice_head(n = 20)
R code: attack metrics (Kill%, Error%, Blocked%, Efficiency)
attack_metrics <- events %>%
filter(skill == "attack") %>%
mutate(
kill = is_eval(evaluation, eval_map$attack$kill),
error = is_eval(evaluation, eval_map$attack$error),
blocked = is_eval(evaluation, eval_map$attack$blocked)
) %>%
group_by(match_id, team, player) %>%
summarise(
attempts = n(),
kills = sum(kill),
errors = sum(error),
blocks = sum(blocked),
kill_pct = kills / attempts,
error_pct = errors / attempts,
blocked_pct = blocks / attempts,
eff = (kills - errors) / attempts,
.groups = "drop"
) %>%
arrange(desc(eff), desc(attempts))
attack_metrics %>% slice_head(n = 20)
R code: blocking & digging (simple event-based)
defense_metrics <- events %>%
filter(skill %in% c("block","dig")) %>%
mutate(
point = evaluation %in% c("stuff","kill_block","point"),
error = evaluation %in% c("error","net","out")
) %>%
group_by(match_id, team, player, skill) %>%
summarise(
actions = n(),
points = sum(point),
errors = sum(error),
point_rate = points / actions,
.groups = "drop"
)
defense_metrics
Sideout, Break Point, Transition & Rally Phase Analytics
If you only measure one thing in volleyball, measure sideout efficiency. Every rally is a sideout opportunity for the receiving team and a break-point opportunity for the serving team, so these two rates together describe how a team wins its points. In R, you can compute SO% and BP% directly from rally winners and serving team.
R code: compute SO% and BP% per team
# Build a rally-level table: one row per rally with the serving team, the receiving team
# (inferred from the teams that appear in the rally), and the rally winner.
rallies <- events %>%
group_by(match_id, set_no, rally_id) %>%
summarise(
teams_in_rally = list(unique(team)),
serving_team = team[which(skill == "serve")[1]],
point_won_by = first(na.omit(point_won_by)),
.groups = "drop"
) %>%
mutate(
receiving_team = map2_chr(teams_in_rally, serving_team, ~ setdiff(.x, .y)[1]),
sideout_success = point_won_by == receiving_team,
break_point_success = point_won_by == serving_team
)
so_bp <- rallies %>%
pivot_longer(cols = c(serving_team, receiving_team),
names_to = "role", values_to = "team") %>%
filter(!is.na(team)) %>% # rallies where no receiving team could be inferred
group_by(match_id, team, role) %>%
summarise(
opps = n(),
points = sum(if_else(role == "receiving_team", sideout_success, break_point_success), na.rm = TRUE),
pct = points / opps,
.groups = "drop"
) %>%
mutate(metric = if_else(role == "receiving_team", "SO%", "BP%")) %>%
select(match_id, team, metric, opps, points, pct)
so_bp
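To make the definitions concrete, here is a small worked example on six invented rallies between two teams. It is only a sketch for checking the arithmetic by hand; the data and object names are made up.

```r
library(dplyr)

# Six toy rallies: A serves rallies 1, 2, 5; B serves rallies 3, 4, 6
toy_rallies <- tibble::tibble(
  serving_team   = c("A", "A", "B", "B", "A", "B"),
  receiving_team = c("B", "B", "A", "A", "B", "A"),
  point_won_by   = c("A", "B", "A", "B", "B", "A")
)

# SO%: share of receive opportunities won by the receiving team
so <- toy_rallies %>%
  group_by(team = receiving_team) %>%
  summarise(so_opps = n(), so_pct = mean(point_won_by == team))

# BP%: share of serving opportunities won by the serving team
bp <- toy_rallies %>%
  group_by(team = serving_team) %>%
  summarise(bp_opps = n(), bp_pct = mean(point_won_by == team))

toy_so_bp <- left_join(so, bp, by = "team")
toy_so_bp
```

Team A wins 2 of its 3 receive rallies (SO% = 2/3) and 1 of its 3 serve rallies (BP% = 1/3). Within a single match, one team's SO% is always 1 minus the opponent's BP%, which is a handy sanity check on any rally table.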
R code: First-ball sideout (FBSO) using pass quality
A classic volleyball KPI: do we sideout on the first attack after serve receive? Add pass quality segmentation: perfect/positive/negative passes and their first-ball sideout probability.
first_ball_sideout <- function(df) {
# Identify: for each rally receiving team, find the first pass and first attack.
df %>%
group_by(match_id, set_no, rally_id) %>%
mutate(
serving_team = team[which(skill == "serve")[1]],
receiving_team = setdiff(unique(team), serving_team)[1]
) %>%
ungroup() %>%
group_by(match_id, set_no, rally_id, receiving_team) %>%
summarise(
pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
first_attack_eval = evaluation[which(skill == "attack" & team == receiving_team)[1]],
point_won_by = first(na.omit(point_won_by)),
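# first() keeps this a single value per rally; the bare grouping column has one value per row
fbso = point_won_by == first(receiving_team) & first_attack_eval %in% c("kill"),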
.groups = "drop"
)
}
fbso <- first_ball_sideout(events) %>%
mutate(
pass_bucket = case_when(
tolower(pass_eval) %in% eval_map$pass$perfect ~ "perfect",
tolower(pass_eval) %in% eval_map$pass$positive ~ "positive",
tolower(pass_eval) %in% eval_map$pass$negative ~ "negative",
tolower(pass_eval) %in% eval_map$pass$error ~ "error",
TRUE ~ "unknown"
)
) %>%
group_by(match_id, receiving_team, pass_bucket) %>%
summarise(
opps = n(),
fbso_points = sum(fbso, na.rm = TRUE),
fbso_pct = fbso_points / opps,
.groups = "drop"
) %>%
arrange(desc(fbso_pct))
fbso
Rotation, Lineup, Setter Distribution & Matchups
Rotation analysis is where volleyball analytics becomes coaching gold. Questions you can answer with R:
- Which rotations are most efficient in sideout and transition?
- Which lineups generate the best net rating (points won minus points lost)?
- Does the setter distribution change under pressure or after poor passes?
- Which matchup patterns appear vs. specific blockers or defenders?
R code: rotation efficiency
rotation_efficiency <- events %>%
group_by(match_id, set_no, rally_id) %>%
summarise(
serving_team = team[which(skill == "serve")[1]],
point_won_by = first(na.omit(point_won_by)),
# rotation of the receiving team at first pass (common reference)
receiving_team = setdiff(unique(team), serving_team)[1],
receive_rotation = rotation[which(skill == "pass" & team == receiving_team)[1]],
.groups = "drop"
) %>%
group_by(match_id, receiving_team, receive_rotation) %>%
summarise(
opps = n(),
so_points = sum(point_won_by == receiving_team, na.rm = TRUE),
so_pct = so_points / opps,
.groups = "drop"
) %>%
arrange(desc(so_pct))
rotation_efficiency
R code: setter distribution by pass quality and score pressure
# We assume "set" rows include target_zone or target_player info; if not, join from your tagging.
# This example uses end_zone as a proxy for set location (e.g., 4/2/3/back).
setter_distribution <- events %>%
group_by(match_id, set_no, rally_id) %>%
mutate(
serving_team = team[which(skill == "serve")[1]],
receiving_team = setdiff(unique(team), serving_team)[1],
receive_pass_score = case_when(
skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$perfect ~ 3,
skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$positive ~ 2,
skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$negative ~ 1,
skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$error ~ 0,
TRUE ~ NA_real_
)
) %>%
ungroup() %>%
group_by(match_id, set_no, rally_id) %>%
summarise(
pass_score = first(na.omit(receive_pass_score)),
# Compute set_zone before redefining `team`; afterwards `team` would refer to the new single value
set_zone = end_zone[which(skill == "set" & team == first(receiving_team))[1]],
team = first(receiving_team),
score_diff = (first(na.omit(score_team)) - first(na.omit(score_opp))),
pressure = abs(score_diff) <= 2, # "close score" proxy
.groups = "drop"
) %>%
filter(!is.na(set_zone), !is.na(pass_score)) %>%
mutate(pass_bucket = factor(pass_score, levels = c(0,1,2,3),
labels = c("error","negative","positive","perfect")))
setter_distribution_summary <- setter_distribution %>%
group_by(team, pass_bucket, pressure, set_zone) %>%
summarise(n = n(), .groups = "drop") %>%
group_by(team, pass_bucket, pressure) %>%
mutate(pct = n / sum(n)) %>%
arrange(team, pass_bucket, pressure, desc(pct))
setter_distribution_summary
This is the foundation for scouting reports: “On perfect passes in close score, they set Zone 4 ~52%.”
Serve & Serve-Receive Analytics (Zones, Heatmaps, Pressure)
Modern serve analytics combines zone targeting, pass degradation, and point outcomes. Even if you don’t track ball coordinates, zones 1–6 (or 1–9) are enough for powerful insights.
R code: serve target heatmap by end_zone
library(ggplot2)
serve_zones <- events %>%
filter(skill == "serve") %>%
count(team, end_zone, name = "serves") %>%
group_by(team) %>%
mutate(pct = serves / sum(serves)) %>%
ungroup()
ggplot(serve_zones, aes(x = factor(end_zone), y = pct)) +
geom_col() +
facet_wrap(~ team) +
labs(
title = "Serve Target Distribution by Zone",
x = "End Zone (Serve Target)",
y = "Share of Serves"
)
R code: serve pressure proxy via opponent pass score
serve_pressure <- events %>%
group_by(match_id, set_no, rally_id) %>%
summarise(
serving_team = team[which(skill == "serve")[1]],
receiving_team = setdiff(unique(team), serving_team)[1],
serve_end_zone = end_zone[which(skill == "serve")[1]],
pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
point_won_by = first(na.omit(point_won_by)),
.groups = "drop"
) %>%
mutate(
pass_score = case_when(
tolower(pass_eval) %in% eval_map$pass$perfect ~ 3,
tolower(pass_eval) %in% eval_map$pass$positive ~ 2,
tolower(pass_eval) %in% eval_map$pass$negative ~ 1,
tolower(pass_eval) %in% eval_map$pass$error ~ 0,
TRUE ~ NA_real_
),
pressure = pass_score <= 1,
ace = FALSE # if you track aces at serve level, set it here
)
serve_pressure_summary <- serve_pressure %>%
group_by(serving_team, serve_end_zone) %>%
summarise(
serves = n(),
avg_opp_pass = mean(pass_score, na.rm = TRUE),
pressure_rate = mean(pressure, na.rm = TRUE),
bp_rate = mean(point_won_by == serving_team, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(bp_rate))
serve_pressure_summary
With this table, you can say: “Serving zone 5 creates low passes 38% of the time and increases break-point rate.”
Attack Shot Charts, Zones, Tendencies & Scouting
Attack analytics becomes powerful when you connect attack zone, target area, block context, and outcome. Even simple zone models can guide scouting: “Their opposite hits sharp to zone 1 on bad passes.”
R code: attack tendency table by start_zone → end_zone
attack_tendencies <- events %>%
  filter(skill == "attack") %>%
  count(team, player, start_zone, end_zone, name = "attempts") %>%
  group_by(team, player) %>%
  mutate(pct = attempts / sum(attempts)) %>%
  ungroup() %>%
  arrange(team, player, desc(pct))
attack_tendencies %>% slice_head(n = 30)
R code: attack efficiency by zone and pass bucket
attack_with_pass <- events %>%
group_by(match_id, set_no, rally_id) %>%
mutate(
serving_team = team[which(skill == "serve")[1]],
receiving_team = setdiff(unique(team), serving_team)[1],
pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]]
) %>%
ungroup() %>%
filter(skill == "attack", team == receiving_team) %>%
mutate(
pass_bucket = case_when(
tolower(pass_eval) %in% eval_map$pass$perfect ~ "perfect",
tolower(pass_eval) %in% eval_map$pass$positive ~ "positive",
tolower(pass_eval) %in% eval_map$pass$negative ~ "negative",
tolower(pass_eval) %in% eval_map$pass$error ~ "error",
TRUE ~ "unknown"
),
kill = tolower(evaluation) %in% eval_map$attack$kill,
error = tolower(evaluation) %in% eval_map$attack$error
) %>%
group_by(team, player, start_zone, pass_bucket) %>%
summarise(
attempts = n(),
kill_pct = mean(kill, na.rm = TRUE),
eff = (sum(kill) - sum(error)) / attempts,
.groups = "drop"
) %>%
arrange(desc(eff))
attack_with_pass
R code: simple shot chart plot (end_zone)
shot_chart <- events %>%
filter(skill == "attack") %>%
mutate(
outcome = case_when(
tolower(evaluation) %in% eval_map$attack$kill ~ "kill",
tolower(evaluation) %in% eval_map$attack$error ~ "error",
tolower(evaluation) %in% eval_map$attack$blocked ~ "blocked",
TRUE ~ "in_play"
)
)
ggplot(shot_chart, aes(x = factor(end_zone), fill = outcome)) +
geom_bar(position = "fill") +
facet_wrap(~ player) +
labs(
title = "Attack Outcome Mix by Target Zone (End Zone)",
x = "Target Zone",
y = "Share"
)
Modeling: Expected Sideout, Win Probability, Elo, Markov Chains
Once your event model is clean, you can move beyond descriptive KPIs into modeling: expected sideout (xSO), expected point (xP), win probability, and strategy simulation.
R code: expected sideout (logistic regression baseline)
library(broom)
# Create a rally-level modeling table
rally_model_df <- events %>%
group_by(match_id, set_no, rally_id) %>%
summarise(
serving_team = team[which(skill == "serve")[1]],
receiving_team = setdiff(unique(team), serving_team)[1],
pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
pass_score = case_when(
tolower(pass_eval) %in% eval_map$pass$perfect ~ 3,
tolower(pass_eval) %in% eval_map$pass$positive ~ 2,
tolower(pass_eval) %in% eval_map$pass$negative ~ 1,
tolower(pass_eval) %in% eval_map$pass$error ~ 0,
TRUE ~ NA_real_
),
serve_zone = end_zone[which(skill == "serve")[1]],
point_won_by = first(na.omit(point_won_by)),
.groups = "drop"
) %>%
filter(!is.na(pass_score), !is.na(serve_zone)) %>%
mutate(
sideout_success = point_won_by == receiving_team
)
# Baseline xSO model
xso_fit <- glm(
sideout_success ~ pass_score + factor(serve_zone),
data = rally_model_df,
family = binomial()
)
tidy(xso_fit)
summary(xso_fit)
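# Pass newdata explicitly so predictions stay aligned even if glm() dropped rows with a missing outcome
rally_model_df <- rally_model_df %>%
mutate(xSO = predict(xso_fit, newdata = rally_model_df, type = "response"))
rally_model_df %>%
group_by(receiving_team) %>%
summarise(
actual_SO = mean(sideout_success, na.rm = TRUE),
expected_SO = mean(xSO, na.rm = TRUE),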
delta = actual_SO - expected_SO,
.groups = "drop"
) %>%
arrange(desc(delta))
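A fitted xSO model can also score hypothetical situations, for example every combination of pass grade and serve zone. The sketch below runs on simulated data (the "true" coefficients are invented purely to generate it) and uses only base R; `sim`, `sim_fit`, and `grid` are illustrative names.

```r
set.seed(1)
n <- 2000

# Simulated rallies: pass score 0..3 and three serve target zones
sim <- data.frame(
  pass_score = sample(0:3, n, replace = TRUE),
  serve_zone = factor(sample(c(1, 5, 6), n, replace = TRUE))
)
# Assumed relationship, used only to generate toy outcomes
sim$sideout_success <- rbinom(n, 1, plogis(-1 + 0.6 * sim$pass_score)) == 1

sim_fit <- glm(sideout_success ~ pass_score + serve_zone,
               data = sim, family = binomial())

# Scenario grid: every pass score crossed with every serve zone
grid <- expand.grid(
  pass_score = 0:3,
  serve_zone = factor(c(1, 5, 6), levels = levels(sim$serve_zone))
)
grid$xSO <- predict(sim_fit, newdata = grid, type = "response")
grid
```

A table like this is often easier for coaches to read than model coefficients, since each row is a concrete situation with an expected sideout rate.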
R code: simple set-level win probability from score differential
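# If you have event-level score columns, you can build a simple point-level model.
# Note: this estimates the probability that the team making the first recorded contact
# (normally the server) wins the rally, given score differential and set number.
# It is a starting point, not a full set or match win probability.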
wp_df <- events %>%
filter(!is.na(score_team), !is.na(score_opp)) %>%
mutate(score_diff = score_team - score_opp) %>%
group_by(match_id, set_no, rally_id) %>%
summarise(
team = first(team),
score_diff = first(score_diff),
point_won_by = first(na.omit(point_won_by)),
.groups = "drop"
) %>%
mutate(won_point = point_won_by == team)
wp_fit <- glm(won_point ~ score_diff + factor(set_no), data = wp_df, family = binomial())
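wp_df <- wp_df %>%
# newdata keeps predictions aligned even if glm() dropped rows with a missing outcome
mutate(win_prob_point = predict(wp_fit, newdata = wp_df, type = "response"))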
wp_fit %>% broom::tidy()
R code: Elo ratings for volleyball teams
# Minimal Elo example (team-level). You can replace with your season match table.
matches <- tibble(
match_id = c("m1","m2","m3"),
date = as.Date(c("2025-09-01","2025-09-05","2025-09-10")),
home = c("Team A","Team B","Team A"),
away = c("Team B","Team C","Team C"),
winner = c("Team A","Team C","Team A")
)
elo_update <- function(r_home, r_away, home_won, k = 20) {
p_home <- 1 / (1 + 10^((r_away - r_home)/400))
s_home <- ifelse(home_won, 1, 0)
r_home_new <- r_home + k * (s_home - p_home)
r_away_new <- r_away + k * ((1 - s_home) - (1 - p_home))
list(home = r_home_new, away = r_away_new, p_home = p_home)
}
teams <- sort(unique(c(matches$home, matches$away)))
ratings <- setNames(rep(1500, length(teams)), teams)
elo_log <- vector("list", nrow(matches))
for (i in seq_len(nrow(matches))) {
m <- matches[i,]
rH <- ratings[[m$home]]
rA <- ratings[[m$away]]
upd <- elo_update(rH, rA, home_won = (m$winner == m$home))
ratings[[m$home]] <- upd$home
ratings[[m$away]] <- upd$away
elo_log[[i]] <- tibble(match_id = m$match_id, p_home = upd$p_home,
home = m$home, away = m$away,
winner = m$winner,
r_home_pre = rH, r_away_pre = rA,
r_home_post = upd$home, r_away_post = upd$away)
}
bind_rows(elo_log) %>% arrange(match_id)
tibble(team = names(ratings), elo = as.numeric(ratings)) %>% arrange(desc(elo))
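To see what the update rule does numerically, here is a short worked example using the same expected-score formula and k = 20. The `home_adv` value at the end is a hypothetical extension, not something the code above implements.

```r
# Expected score of team A against team B (same formula as p_home above)
elo_expected <- function(r_a, r_b) 1 / (1 + 10^((r_b - r_a) / 400))

k <- 20

# Equal ratings: each side is expected to win half the time
p_equal <- elo_expected(1500, 1500)

# If the 1500-rated team wins, it gains k * (1 - 0.5) points
r_after_win <- 1500 + k * (1 - p_equal)

# A 200-point favourite is expected to win roughly three matches in four
p_fav <- elo_expected(1700, 1500)

# Hypothetical home-court bonus, added to the home rating before comparing
home_adv <- 50
p_home_with_bonus <- elo_expected(1500 + home_adv, 1500)

c(p_equal = p_equal, r_after_win = r_after_win,
  p_fav = p_fav, p_home_with_bonus = p_home_with_bonus)
```

Because the winner's gain equals the loser's loss, the sum of ratings stays constant across the league.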
R code: Markov chain model for rally outcomes (conceptual starter)
A Markov model represents rally states like: Serve → Pass → Set → Attack → (Point/Continuation). Below is a lightweight starting template to estimate transition probabilities from event sequences.
library(stringr)
# Build simple sequences per rally: skill chain for receiving team until point ends
rally_sequences <- events %>%
arrange(match_id, set_no, rally_id) %>%
group_by(match_id, set_no, rally_id) %>%
summarise(
serving_team = team[which(skill == "serve")[1]],
receiving_team = setdiff(unique(team), serving_team)[1],
seq = paste(skill, collapse = "-"),
point_won_by = first(na.omit(point_won_by)),
.groups = "drop"
)
# Count bigrams (transitions) from sequences
extract_bigrams <- function(seq_str) {
tokens <- str_split(seq_str, "-", simplify = TRUE)
tokens <- tokens[tokens != ""]
if (length(tokens) < 2) return(tibble(from = character(), to = character()))
tibble(from = tokens[-length(tokens)], to = tokens[-1])
}
transitions <- rally_sequences %>%
mutate(bigrams = map(seq, extract_bigrams)) %>%
select(match_id, bigrams) %>%
unnest(bigrams) %>%
count(from, to, name = "n") %>%
group_by(from) %>%
mutate(p = n / sum(n)) %>%
ungroup() %>%
arrange(from, desc(p))
transitions
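A long table of transition counts can be reshaped into a row-stochastic transition matrix, which is the usual starting point for Markov chain calculations. The counts below are invented for illustration. Note that this skill-only chain has no explicit "point won" or "point lost" state, so it is a sketch of the idea rather than a full absorbing chain.

```r
# Toy transition counts (from -> to)
toy_transitions <- data.frame(
  from = c("serve", "pass", "pass",   "set",    "attack", "attack", "dig"),
  to   = c("pass",  "set",  "attack", "attack", "dig",    "block",  "set"),
  n    = c(10,      8,      2,        8,        4,        2,        4)
)

# Cross-tabulate counts, then normalise each row to sum to 1
counts <- xtabs(n ~ from + to, data = toy_transitions)
P <- prop.table(counts, margin = 1)

round(P, 2)
```

Each row of `P` gives the estimated probability of the next skill given the current one; here a pass is followed by a set in 8 of 10 cases.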
Predictive Modeling with tidymodels
If you want production-grade modeling in R, use tidymodels: pipelines, cross-validation, recipes, metrics, and model tuning. Here is an end-to-end example predicting sideout success using pass score + serve zone.
R code: tidymodels xSO pipeline
library(tidymodels)
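df <- rally_model_df %>%
filter(!is.na(sideout_success)) %>%
mutate(
# Classification models in tidymodels need a factor outcome; "TRUE" first makes it the event level
sideout_success = factor(sideout_success, levels = c(TRUE, FALSE)),
serve_zone = factor(serve_zone),
receiving_team = factor(receiving_team)
)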
set.seed(2026)
split <- initial_split(df, prop = 0.8, strata = sideout_success)
train <- training(split)
test <- testing(split)
rec <- recipe(sideout_success ~ pass_score + serve_zone, data = train) %>%
step_impute_median(all_numeric_predictors()) %>%
step_dummy(all_nominal_predictors())
model <- logistic_reg() %>%
set_engine("glm")
wf <- workflow() %>%
add_recipe(rec) %>%
add_model(model)
fit <- wf %>% fit(data = train)
pred <- predict(fit, test, type = "prob") %>%
bind_cols(test %>% select(sideout_success))
roc_auc(pred, truth = sideout_success, .pred_TRUE)
accuracy(predict(fit, test) %>% bind_cols(test), truth = sideout_success, estimate = .pred_class)
R code: add player random effects with mixed models (glmm)
# For player/team variation, you can use lme4 (not tidymodels-native).
install.packages("lme4")
library(lme4)
# Example: include receiving_team as a random intercept
xso_glmm <- glmer(
sideout_success ~ pass_score + factor(serve_zone) + (1 | receiving_team),
data = rally_model_df,
family = binomial()
)
summary(xso_glmm)
Bayesian Volleyball Analytics in R
Bayesian models are ideal when you want uncertainty, shrinkage, and better inference with small samples. In volleyball scouting, sample sizes can be tiny (a few matches), so Bayesian partial pooling is often a win.
R code: Bayesian xSO with brms
# Bayesian logistic regression with partial pooling by receiving team
install.packages("brms")
library(brms)
bayes_fit <- brm(
sideout_success ~ pass_score + factor(serve_zone) + (1 | receiving_team),
data = rally_model_df,
family = bernoulli(),
chains = 2, cores = 2, iter = 1500,
seed = 2026
)
summary(bayes_fit)
posterior_summary(bayes_fit)
With brms, you can compute posterior distributions of SO% by team, compare strategies, and avoid overreacting to noise.
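The shrinkage idea can be illustrated without fitting a full model. The sketch below uses a conjugate Beta-Binomial update in base R, which is a simplification and not what brms does internally; the prior (a league-average SO% of 0.60 worth 40 pseudo-rallies) and the team counts are assumptions chosen for illustration.

```r
# Toy sideout counts: team A has very few opportunities
team_so <- data.frame(
  team      = c("A", "B", "C"),
  so_points = c(9, 30, 130),
  opps      = c(10, 50, 200)
)

# Assumed prior: mean 0.60, strength equivalent to 40 rallies
prior_mean     <- 0.60
prior_strength <- 40
a0 <- prior_mean * prior_strength
b0 <- (1 - prior_mean) * prior_strength

team_so$raw    <- team_so$so_points / team_so$opps
# Posterior mean of Beta(a0 + successes, b0 + failures)
team_so$shrunk <- (a0 + team_so$so_points) / (prior_strength + team_so$opps)

# 95% credible interval from the posterior Beta distribution
team_so$ci_low  <- qbeta(0.025, a0 + team_so$so_points,
                         b0 + team_so$opps - team_so$so_points)
team_so$ci_high <- qbeta(0.975, a0 + team_so$so_points,
                         b0 + team_so$opps - team_so$so_points)
team_so
```

Team A's raw 90% on 10 rallies is pulled to 66%, while team C's estimate on 200 rallies barely moves. That is the behaviour partial pooling is meant to deliver.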
Visualization: ggplot2 Templates for Volleyball
Volleyball visualizations should be coach-friendly, quick to read, and tied to decisions: serve target, pass quality, rotation weaknesses, attack tendencies, and pressure points.
R code: SO% and BP% report chart
# Wide table for reporting; match_id keeps one row per team and match
so_bp_wide <- so_bp %>%
  select(match_id, team, metric, pct) %>%
  pivot_wider(names_from = metric, values_from = pct)

so_bp_long <- so_bp %>%
  ggplot(aes(x = team, y = pct, fill = metric)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Sideout % and Break Point % by Team", x = NULL, y = "Rate")
so_bp_long
R code: rotation heatmap (SO% by rotation)
rot_plot_df <- rotation_efficiency %>%
  mutate(receive_rotation = factor(receive_rotation, levels = 1:6))

ggplot(rot_plot_df, aes(x = receive_rotation, y = receiving_team, fill = so_pct)) +
  geom_tile() +
  labs(title = "Rotation Sideout Heatmap", x = "Rotation (Receiving)", y = "Team")
R code: fast HTML tables with gt
library(gt)
attack_metrics %>%
  filter(attempts >= 10) %>%
  arrange(desc(eff)) %>%
  gt() %>%
  fmt_percent(columns = c(kill_pct, error_pct, blocked_pct), decimals = 1) %>%
  fmt_number(columns = eff, decimals = 3) %>%
  tab_header(title = "Attack Leaderboard (Min 10 Attempts)")
Dashboards: Shiny Scouting Reports
A Shiny scouting app can deliver instant insights for coaches: opponent serve targets, rotation weaknesses, attacker tendencies, and key matchups. Below is a compact Shiny template you can expand.
R code: minimal Shiny dashboard for team scouting
install.packages(c("shiny","bslib"))
library(shiny)
library(bslib)
library(tidyverse)
# Assume you already computed:
# - serve_pressure_summary
# - rotation_efficiency
# - attack_tendencies
ui <- page_sidebar(
title = "Volleyball Analytics Dashboard (R + Shiny)",
sidebar = sidebar(
selectInput("team", "Select Team", choices = sort(unique(serve_pressure_summary$serving_team))),
hr(),
helpText("Key views: serve targets, rotation sideout, attack tendencies.")
),
layout_columns(
card(
card_header("Serve Targets by Zone"),
plotOutput("servePlot", height = 260)
),
card(
card_header("Rotation Sideout %"),
plotOutput("rotPlot", height = 260)
),
card(
card_header("Top Attack Tendencies"),
tableOutput("attackTable")
)
)
)
server <- function(input, output, session) {
output$servePlot <- renderPlot({
df <- serve_pressure_summary %>% filter(serving_team == input$team)
ggplot(df, aes(x = factor(serve_end_zone), y = bp_rate)) +
geom_col() +
labs(x = "Serve End Zone", y = "Break Point Rate", title = paste("Serve Effectiveness -", input$team))
})
output$rotPlot <- renderPlot({
df <- rotation_efficiency %>% filter(receiving_team == input$team) %>%
mutate(receive_rotation = factor(receive_rotation, levels = 1:6))
ggplot(df, aes(x = receive_rotation, y = so_pct)) +
geom_col() +
labs(x = "Rotation", y = "Sideout %", title = paste("Rotation Sideout -", input$team))
})
output$attackTable <- renderTable({
attack_tendencies %>%
filter(team == input$team) %>%
group_by(player) %>%
slice_max(order_by = pct, n = 5) %>%
ungroup() %>%
arrange(desc(pct)) %>%
mutate(pct = round(pct * 100, 1))
})
}
shinyApp(ui, server)
Automation: Reports to HTML/PDF + CI
One of the best uses of R in volleyball: automated weekly scouting reports. Generate: HTML match report, PDF coaching packet, and tables/figures for staff.
R code: Quarto report skeleton
# Create a Quarto (.qmd) file like reports/match_report.qmd
# Then render in R:
# quarto::quarto_render("reports/match_report.qmd")
# Example render call:
quarto::quarto_render(
input = "reports/match_report.qmd",
execute_params = list(match_id = "match_001")
)
Example Quarto front matter (paste into .qmd)
---
title: "Match Report"
format:
  html:
    toc: true
    code-fold: show
execute:
  echo: true
  warning: false
  message: false
params:
  match_id: "match_001"
---
Best Practices + Common Pitfalls
- Define evaluation codes once and reuse them everywhere (serve/pass/attack mappings).
- Keep raw data immutable in data/raw; write cleaned data to data/processed.
- Separate scouting vs. performance analysis: scouting focuses on tendencies; performance focuses on efficiency.
- Beware small samples (one match). Use Bayesian shrinkage or confidence intervals.
- Rotation context matters: opponent rotations, server strength, and pass quality heavily confound results.
- Don’t overfit: models should generalize across matches and opponents.
- Make outputs coach-readable: simple tables, clear charts, and “so what?” conclusions.
R code: quick bootstrap CI for SO%
set.seed(2026)
bootstrap_ci <- function(x, B = 2000, conf = 0.95) {
x <- x[!is.na(x)] # drop rallies with unknown outcome before resampling
n <- length(x)
boots <- replicate(B, mean(sample(x, n, replace = TRUE)))
alpha <- (1 - conf) / 2
quantile(boots, probs = c(alpha, 1 - alpha), names = FALSE)
}
so_ci <- rallies %>%
filter(!is.na(receiving_team)) %>%
mutate(sideout_success = point_won_by == receiving_team) %>%
group_by(receiving_team) %>%
summarise(
so = mean(sideout_success, na.rm = TRUE),
ci = list(bootstrap_ci(sideout_success)), # resample once, reuse both bounds
n = n(),
.groups = "drop"
) %>%
mutate(ci_low = map_dbl(ci, 1), ci_high = map_dbl(ci, 2)) %>%
select(-ci)
so_ci
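With very small samples the percentile bootstrap can degenerate: if a team won every observed receive rally, every resample has the same mean and the interval has zero width. An analytic interval is one possible complement. The sketch below uses base R's `prop.test()` without continuity correction (the Wilson score interval) and `binom.test()` (the exact Clopper-Pearson interval) on invented counts.

```r
# Toy counts: 13 sideouts in 20 receive opportunities
so_points <- 13
opps      <- 20

# Wilson score interval
wilson <- prop.test(so_points, opps, conf.level = 0.95, correct = FALSE)$conf.int

# Exact (Clopper-Pearson) interval, typically a little wider
exact <- binom.test(so_points, opps, conf.level = 0.95)$conf.int

round(rbind(wilson = as.numeric(wilson), exact = as.numeric(exact)), 3)
```

Both intervals stay inside [0, 1] and keep a sensible width even when the observed rate is 0% or 100%.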
Recommended Book
If you want a structured, practical resource that goes deeper into volleyball analytics workflows, R code patterns, scouting/reporting, and modeling concepts, check out this book:
Volleyball Analytics with R (Recommended Book)
It’s a great companion if you’re building a complete R-based analytics stack for a club, federation, or collegiate program.
FAQ
What’s the best single metric in volleyball?
If you only track one KPI: Sideout %. It correlates strongly with winning because it reflects serve-receive stability and first-ball offense conversion.
How do I handle different coding systems?
Create a mapping layer (like eval_map) and convert raw labels into a standardized internal vocabulary.
The rest of your pipeline should never depend on raw coding strings.
Can I do volleyball analytics without coordinates?
Yes. Zone-based analytics (1–6 or 1–9) plus pass quality and outcome are enough for rotation analysis, serve targeting, and basic predictive modeling.
What should I build first?
Start with: import + clean → SO% / BP% → pass + serve dashboards → rotation sideout → attack efficiency by pass quality. Once those are stable, add modeling.
Tags: volleyball analytics with R, R volleyball stats, sideout percentage, rotation analysis, serve receive, scouting report, tidymodels, ggplot2, Shiny dashboard
The post Volleyball Analytics with R: The Complete Guide to Match Data, Sideout Efficiency, Serve Pressure, Heatmaps, and Predictive Models appeared first on R Programming Books.
