Volleyball Analytics with R: The Complete Guide to Match Data, Sideout Efficiency, Serve Pressure, Heatmaps, and Predictive Models

rprogrammingbooks

5 hours ago

[This article was first published on Blog - R Programming Books, and kindly contributed to R-bloggers].

< article class="post volleyball-analytics-with-r"> < header class="post-header">

Volleyball Analytics

Volleyball Analytics with R: A Practical, End-to-End Playbook

Build a full volleyball analytics workflow in R: data collection, cleaning, scouting reports, skill KPIs, rotation/lineup analysis, sideout & transition, serve/receive, visualization, dashboards, and predictive modeling.

< nav class="toc" aria-label="Table of contents">

< section id="why">

Why Volleyball Analytics (and Why R)

Volleyball is a sequence of discrete events (serve, pass, set, attack, block, dig) organized into rallies and phases (sideout vs. transition). This structure makes it ideal for: event-based analytics, rotation analysis, scouting tendencies, expected efficiency modeling, and win probability.

R excels at this because of tidy data workflows (dplyr/tidyr), great visualization (ggplot2), modern modeling (tidymodels, brms), and easy reporting (Quarto/R Markdown). If you want a repeatable volleyball analytics pipeline for your club or team, R is a perfect fit.

Keywords you should care about

Sideout % (SO%), Break Point % (BP%), Transition Efficiency
Serve Pressure, Passing Rating, First Ball Sideout
Attack Efficiency (kills – errors)/attempts, Kill Rate
Rotation Efficiency, Lineup Net Rating, Setter Distribution
Expected Sideout, Expected Point, Win Probability
Scouting Tendencies, Shot Charts, Serve Target Heatmaps
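
As a quick arithmetic check of two of these definitions, the sketch below computes kill rate and attack efficiency for a made-up stat line. The numbers are illustrative only and do not come from any match.

```r
# Hypothetical stat line for one attacker (illustrative numbers only)
attempts <- 40
kills    <- 18
errors   <- 6

kill_rate  <- kills / attempts             # 18 / 40
attack_eff <- (kills - errors) / attempts  # (18 - 6) / 40

kill_rate   # 0.45
attack_eff  # 0.30
```

Note that efficiency penalizes errors, so two attackers with the same kill rate can have very different efficiency.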

< section id="data-model">

Volleyball Data Model: Events, Rally, Set, Match

A practical volleyball dataset in R usually includes one row per contact or one row per event. The minimum columns for serious analytics:

match_id, set_no, rally_id, point_won_by
team, player, skill (serve, pass, set, attack, block, dig)
evaluation (e.g., error, poor, ok, good, perfect, kill, continuation)
start_zone, end_zone (serve zones, attack zones)
rotation, server, receive_formation
score_home, score_away, home_team, away_team

R code: create a minimal event schema

library(tidyverse)
library(lubridate)

event_schema <- tibble::tibble(
  match_id = character(),
  datetime = ymd_hms(character()),
  set_no = integer(),
  rally_id = integer(),
  home_team = character(),
  away_team = character(),
  team = character(),        # team performing the action
  opponent = character(),    # opponent of team
  player = character(),
  jersey = integer(),
  skill = factor(levels = c("serve","pass","set","attack","block","dig","freeball")),
  evaluation = character(),  # e.g., "error","ace","perfect","positive","negative","kill","blocked","dig"
  start_zone = integer(),    # 1..6 (or 1..9 depending system)
  end_zone = integer(),
  rotation = integer(),      # 1..6
  phase = factor(levels = c("sideout","transition")),  # derived later
  score_team = integer(),    # score for team at time of event
  score_opp  = integer(),
  point_won_by = character()  # which team won rally point
)

glimpse(event_schema)

You can extend this schema with positional labels (OH, MB, OPP, S, L), contact order (1st/2nd/3rd), attack tempo, block touches, etc.

< section id="data-sources">

Data Sources: Manual Logs, Video Tags, DataVolley-Style Exports

Volleyball data typically arrives as: (1) manual spreadsheets, (2) video tagging exports, or (3) scouting software exports. Regardless of source, your R pipeline should:

Import raw data
Normalize team/player names
Create rally keys (match_id/set_no/rally_id)
Derive phases (sideout vs. transition)
Compute KPIs and reporting tables
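
The "create rally keys" step can be sketched as follows. This is a minimal illustration on toy data; `rally_key` is a hypothetical column name, and it assumes match_id, set_no, and rally_id together identify a rally uniquely.

```r
library(dplyr)
library(tibble)

# Toy events: two rallies in set 1 of a hypothetical match "m1"
toy_events <- tibble(
  match_id = c("m1", "m1", "m1"),
  set_no   = c(1L, 1L, 1L),
  rally_id = c(1L, 1L, 2L),
  skill    = c("serve", "pass", "serve")
)

# rally_key is a hypothetical helper column; any separator that cannot
# occur inside match_id works
toy_events <- toy_events %>%
  mutate(rally_key = paste(match_id, set_no, rally_id, sep = "_"))

# Number of distinct rallies in the toy data
n_distinct(toy_events$rally_key)
```

A single key column is convenient for joins and for counting rallies, while the three original columns remain available for grouping.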

R code: robust import helpers

library(readr)
library(janitor)

read_events_csv <- function(path) {
  readr::read_csv(path, show_col_types = FALSE) %>%
    janitor::clean_names() %>%
    mutate(
      set_no = as.integer(set_no),
      rally_id = as.integer(rally_id),
      start_zone = as.integer(start_zone),
      end_zone = as.integer(end_zone),
      rotation = as.integer(rotation)
    )
}

normalize_names <- function(df) {
  df %>%
    mutate(
      team = str_squish(str_to_title(team)),
      opponent = str_squish(str_to_title(opponent)),
      player = str_squish(str_to_title(player)),
      evaluation = str_squish(str_to_lower(evaluation)),
      skill = factor(str_to_lower(skill),
                    levels = c("serve","pass","set","attack","block","dig","freeball"))
    )
}

Tip for SEO and practice: name your columns and metrics consistently across posts: SO%, BP%, Ace%, Err%, Kill%, Eff%, Pos%.

< section id="project-setup">

R Project Setup & Reproducibility

Serious volleyball analytics needs reproducibility: same input data, same R version, same packages, same outputs. Use an R project + renv + Quarto.

R code: create a project scaffold

# Run once inside your project
install.packages(c("renv","quarto","tidyverse","lubridate","janitor","gt","patchwork","tidymodels"))

renv::init()

# Recommended folder structure
dir.create("data/raw", recursive = TRUE, showWarnings = FALSE)
dir.create("data/processed", recursive = TRUE, showWarnings = FALSE)
dir.create("R", showWarnings = FALSE)
dir.create("reports", showWarnings = FALSE)
dir.create("figures", showWarnings = FALSE)

R code: create a simple metric dictionary

metric_dictionary <- tribble(
  ~metric, ~definition,
  "SO%", "Sideout percentage: points won when receiving serve / total receive opportunities",
  "BP%", "Break point percentage: points won when serving / total serving opportunities",
  "Kill%", "Kills / attack attempts",
  "Eff%", "(Kills - Errors) / attempts",
  "Ace%", "Aces / total serves",
  "Err%", "Serve errors / total serves"
)

metric_dictionary

< section id="import-clean">

Import & Clean Volleyball Event Data

Most problems in volleyball analytics are data quality problems: inconsistent team names, missing rally keys, duplicated rows, weird evaluation labels, or mixed zone definitions.
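
Two of these problems, unexpected evaluation labels and out-of-range zones, can be flagged with a small check before any KPI is computed. The sketch below runs on toy data; the `known_evals` vocabulary and the 1..9 zone range are assumptions to adapt to your own coding system.

```r
library(dplyr)
library(tibble)

# Toy events with two deliberate problems: an unknown label and an impossible zone
toy_events <- tibble(
  skill      = c("serve", "pass", "attack", "pass"),
  evaluation = c("ace", "perfect", "kil", "positive"),  # "kil" is a typo
  start_zone = c(1L, 5L, 4L, 12L)                       # 12 is out of range
)

# Assumed vocabulary and zone range; adapt both to your coding system
known_evals <- c("ace", "error", "perfect", "positive", "negative", "kill", "blocked")
valid_zones <- 1:9

# Keep only the rows that fail at least one check
problems <- toy_events %>%
  mutate(
    bad_eval = !evaluation %in% known_evals,
    bad_zone = !is.na(start_zone) & !start_zone %in% valid_zones
  ) %>%
  filter(bad_eval | bad_zone)

problems
```

Reviewing the flagged rows by hand is usually safer than silently dropping them, since a typo such as "kil" still represents a real contact.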

R code: import + normalize + validate

events_raw <- read_events_csv("data/raw/events.csv")
events <- events_raw %>% normalize_names()

# Basic validation
stopifnot(all(c("match_id","set_no","rally_id","team","skill","evaluation") %in% names(events)))

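# Remove obvious duplicates (same match/set/rally/team/player/skill/evaluation).
# Caution: in a long rally a player can legitimately repeat a skill with the same
# evaluation; if your data has a contact-order or timestamp column, add it to the key.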
events <- events %>%
  distinct(match_id, set_no, rally_id, team, player, skill, evaluation, .keep_all = TRUE)

# Treat empty opponent strings as missing (the column itself must already exist)
events <- events %>%
  mutate(opponent = if_else(is.na(opponent) | opponent == "",
                            NA_character_, opponent))

# Quick data quality report
quality_report <- list(
  n_rows = nrow(events),
  n_matches = n_distinct(events$match_id),
  missing_player = mean(is.na(events$player) | events$player == ""),
  missing_zone = mean(is.na(events$start_zone)),
  skill_counts = events %>% count(skill, sort = TRUE)
)

quality_report

R code: derive rally winner and rally phase

A common approach: identify which team served in the rally. If a team receives serve, that is a sideout opportunity. If a team is serving, that is a break point opportunity. You can derive phase per team within each rally.

derive_rally_context <- function(df) {
  df %>%
    group_by(match_id, set_no, rally_id) %>%
    mutate(
      serving_team = team[which(skill == "serve")[1]],
      receiving_team = setdiff(unique(team), serving_team)[1],
      phase = case_when(
        team == receiving_team ~ "sideout",
        team == serving_team   ~ "transition",
        TRUE ~ NA_character_
      ) %>% factor(levels = c("sideout","transition"))
    ) %>%
    ungroup()
}

events <- derive_rally_context(events)
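
One caveat: the derivation above finds the receiving team among the teams that have events in the rally, so a rally with only a serve row (an ace or serve error with no reception logged) gets a missing receiving team. A possible alternative, sketched here on toy data, uses the home_team and away_team columns instead. It assumes the values in team match those two columns exactly.

```r
library(dplyr)
library(tibble)

# Toy data: rally 1 is an ace (only a serve row), rally 2 has a pass
toy_events <- tibble(
  match_id  = "m1",
  set_no    = 1L,
  rally_id  = c(1L, 2L, 2L),
  home_team = "Team A",
  away_team = "Team B",
  team      = c("Team A", "Team A", "Team B"),
  skill     = c("serve", "serve", "pass")
)

toy_context <- toy_events %>%
  group_by(match_id, set_no, rally_id) %>%
  mutate(
    serving_team = team[which(skill == "serve")[1]],
    # The receiving team is whichever of home/away is not serving
    receiving_team = if_else(serving_team == home_team, away_team, home_team)
  ) %>%
  ungroup()

distinct(toy_context, rally_id, serving_team, receiving_team)
```

With this variant the ace in rally 1 still counts as a receive opportunity for Team B.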

< section id="core-kpis">

Core Volleyball KPIs (Serve, Pass, Attack, Block, Dig)

Volleyball KPIs are best computed from event tables with clear skill and evaluation codes. Below is a practical KPI set that works for scouting and performance analysis.

R code: define standard evaluation mappings

# Customize to your coding system.
eval_map <- list(
  serve = list(
    ace = c("ace"),
    error = c("error","serve_error"),
    in_play = c("in_play","good","ok","positive","negative")
  ),
  pass = list(
    perfect = c("perfect","3"),
    positive = c("positive","2","good"),
    negative = c("negative","1","poor"),
    error = c("error","0")
  ),
  attack = list(
    kill = c("kill"),
    error = c("error","attack_error"),
    blocked = c("blocked"),
    in_play = c("in_play","continuation","covered")
  )
)

is_eval <- function(x, values) tolower(x) %in% tolower(values)
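
Because the same label-to-score mapping is needed in several places, it can help to wrap it in one function. The sketch below is a hypothetical helper (`pass_score_from_eval()` is not part of any package) that uses the same pass labels as the mapping above and returns a 0..3 score.

```r
# Assumed pass labels on a 0..3 scale; adapt to your coding system
pass_levels <- list(
  perfect  = c("perfect", "3"),
  positive = c("positive", "2", "good"),
  negative = c("negative", "1", "poor"),
  error    = c("error", "0")
)

# pass_score_from_eval() is a hypothetical helper, not part of any package
pass_score_from_eval <- function(evaluation, levels = pass_levels) {
  x <- tolower(evaluation)
  score <- rep(NA_real_, length(x))
  score[x %in% levels$error]    <- 0
  score[x %in% levels$negative] <- 1
  score[x %in% levels$positive] <- 2
  score[x %in% levels$perfect]  <- 3
  score
}

# Unmatched labels such as "overpass" come back as NA
pass_score_from_eval(c("Perfect", "good", "poor", "error", "overpass"))
```

Keeping the mapping in one place means a change to the coding system only has to be made once.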

R code: serve metrics (Ace%, Error%)

serve_metrics <- events %>%
  filter(skill == "serve") %>%
  mutate(
    is_ace = is_eval(evaluation, eval_map$serve$ace),
    is_error = is_eval(evaluation, eval_map$serve$error)
  ) %>%
  group_by(match_id, team) %>%
  summarise(
    serves = n(),
    aces = sum(is_ace),
    errors = sum(is_error),
    ace_pct = aces / serves,
    err_pct = errors / serves,
    .groups = "drop"
  )

serve_metrics

R code: passing metrics (Perfect%, Positive%, Passing Efficiency)

pass_metrics <- events %>%
  filter(skill == "pass") %>%
  mutate(
    perfect = is_eval(evaluation, eval_map$pass$perfect),
    positive = is_eval(evaluation, eval_map$pass$positive),
    negative = is_eval(evaluation, eval_map$pass$negative),
    error = is_eval(evaluation, eval_map$pass$error),
    # A common numeric scale (0..3)
    pass_score = case_when(
      perfect ~ 3,
      positive ~ 2,
      negative ~ 1,
      error ~ 0,
      TRUE ~ NA_real_
    )
  ) %>%
  group_by(match_id, team, player) %>%
  summarise(
    passes = n(),
    perfect_pct = mean(perfect, na.rm = TRUE),
    positive_pct = mean(positive, na.rm = TRUE),
    error_pct = mean(error, na.rm = TRUE),
    avg_pass = mean(pass_score, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_pass), desc(passes))

pass_metrics %>% slice_head(n = 20)

R code: attack metrics (Kill%, Error%, Blocked%, Efficiency)

attack_metrics <- events %>%
  filter(skill == "attack") %>%
  mutate(
    kill = is_eval(evaluation, eval_map$attack$kill),
    error = is_eval(evaluation, eval_map$attack$error),
    blocked = is_eval(evaluation, eval_map$attack$blocked)
  ) %>%
  group_by(match_id, team, player) %>%
  summarise(
    attempts = n(),
    kills = sum(kill),
    errors = sum(error),
    blocks = sum(blocked),
    kill_pct = kills / attempts,
    error_pct = errors / attempts,
    blocked_pct = blocks / attempts,
    eff = (kills - errors) / attempts,
    .groups = "drop"
  ) %>%
  arrange(desc(eff), desc(attempts))

attack_metrics %>% slice_head(n = 20)

R code: blocking & digging (simple event-based)

defense_metrics <- events %>%
  filter(skill %in% c("block","dig")) %>%
  mutate(
    point = evaluation %in% c("stuff","kill_block","point"),
    error = evaluation %in% c("error","net","out")
  ) %>%
  group_by(match_id, team, player, skill) %>%
  summarise(
    actions = n(),
    points = sum(point),
    errors = sum(error),
    point_rate = points / actions,
    .groups = "drop"
  )

defense_metrics

< section id="sideout">

Sideout, Break Point, Transition & Rally Phase Analytics

If you only measure one thing in volleyball, measure sideout efficiency. Most matches are decided by who wins more sideout points and who generates more break points. In R, you can compute SO% and BP% directly from rally winners and serving team.

R code: compute SO% and BP% per team

# Build one row per rally and derive the receiving team from the teams that appear in it
rallies <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    teams_in_rally = list(unique(team)),
    serving_team = team[which(skill == "serve")[1]],
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  ) %>%
  mutate(
    receiving_team = map2_chr(teams_in_rally, serving_team, ~ setdiff(.x, .y)[1]),
    sideout_success = point_won_by == receiving_team,
    break_point_success = point_won_by == serving_team
  )

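so_bp <- rallies %>%
  pivot_longer(cols = c(serving_team, receiving_team),
               names_to = "role", values_to = "team") %>%
  # Rallies where only the serving team has events (e.g., aces, serve errors) have an
  # NA receiving team; they are dropped from SO% here, so log a reception row or
  # derive the receiver from home/away columns if you need them counted.
  filter(!is.na(team)) %>%
  group_by(match_id, team, role) %>%
  summarise(
    opps = n(),
    points = sum(if_else(role == "receiving_team", sideout_success, break_point_success),
                 na.rm = TRUE),
    pct = points / opps,
    .groups = "drop"
  ) %>%
  mutate(metric = if_else(role == "receiving_team", "SO%", "BP%")) %>%
  select(match_id, team, metric, opps, points, pct)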

so_bp
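
To make the two definitions concrete, here is a small worked example on five toy rallies between two hypothetical teams, "A" and "B". It is a sketch of the arithmetic only, not a replacement for the pipeline above.

```r
library(dplyr)
library(tibble)

# Five toy rallies between two hypothetical teams
toy_rallies <- tibble(
  serving_team   = c("A", "A", "B", "B", "B"),
  receiving_team = c("B", "B", "A", "A", "A"),
  point_won_by   = c("A", "B", "A", "A", "B")
)

# SO%: share of receive opportunities won by the receiving team
toy_so <- toy_rallies %>%
  group_by(team = receiving_team) %>%
  summarise(so_opps = n(), so_pct = mean(point_won_by == team), .groups = "drop")

# BP%: share of serving opportunities won by the serving team
toy_bp <- toy_rallies %>%
  group_by(team = serving_team) %>%
  summarise(bp_opps = n(), bp_pct = mean(point_won_by == team), .groups = "drop")

left_join(toy_so, toy_bp, by = "team")
```

Team A sides out on 2 of 3 receptions, so Team B's break point rate on those same rallies is 1 of 3: within one match, a team's SO% and its opponent's BP% always sum to 1.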

R code: First-ball sideout (FBSO) using pass quality

A classic volleyball KPI: do we sideout on the first attack after serve receive? Add pass quality segmentation: perfect/positive/negative passes and their first-ball sideout probability.

first_ball_sideout <- function(df) {
  # Identify: for each rally receiving team, find the first pass and first attack.
  df %>%
    group_by(match_id, set_no, rally_id) %>%
    mutate(
      serving_team = team[which(skill == "serve")[1]],
      receiving_team = setdiff(unique(team), serving_team)[1]
    ) %>%
    ungroup() %>%
    group_by(match_id, set_no, rally_id, receiving_team) %>%
    summarise(
      pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
      first_attack_eval = evaluation[which(skill == "attack" & team == receiving_team)[1]],
      point_won_by = first(na.omit(point_won_by)),
      # first() keeps fbso to one value per rally: receiving_team is a group-length vector here
      fbso = point_won_by == first(receiving_team) & first_attack_eval %in% c("kill"),
      .groups = "drop"
    )
}

fbso <- first_ball_sideout(events) %>%
  mutate(
    pass_bucket = case_when(
      tolower(pass_eval) %in% eval_map$pass$perfect ~ "perfect",
      tolower(pass_eval) %in% eval_map$pass$positive ~ "positive",
      tolower(pass_eval) %in% eval_map$pass$negative ~ "negative",
      tolower(pass_eval) %in% eval_map$pass$error ~ "error",
      TRUE ~ "unknown"
    )
  ) %>%
  group_by(match_id, receiving_team, pass_bucket) %>%
  summarise(
    opps = n(),
    fbso_points = sum(fbso, na.rm = TRUE),
    fbso_pct = fbso_points / opps,
    .groups = "drop"
  ) %>%
  arrange(desc(fbso_pct))

fbso

< section id="rotation">

Rotation, Lineup, Setter Distribution & Matchups

Rotation analysis is where volleyball analytics becomes coaching gold. Questions you can answer with R:

Which rotations are most efficient in sideout and transition?
Which lineups generate the best net rating (points won minus points lost)?
Does the setter distribution change under pressure or after poor passes?
Which matchup patterns appear vs. specific blockers or defenders?
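
The net rating idea (points won minus points lost) can be sketched on a toy rally table. The example below groups by rotation from the perspective of a hypothetical "Team A"; grouping by a lineup identifier instead would work the same way, assuming your data carries one.

```r
library(dplyr)
library(tibble)

# Toy rally table from Team A's perspective, with its rotation in each rally
toy_rallies <- tibble(
  rotation     = c(1L, 1L, 1L, 2L, 2L, 3L),
  point_won_by = c("A", "A", "B", "B", "B", "A")
)

net_rating <- toy_rallies %>%
  group_by(rotation) %>%
  summarise(
    rallies     = n(),
    points_won  = sum(point_won_by == "A"),
    points_lost = sum(point_won_by != "A"),
    net         = points_won - points_lost,
    .groups = "drop"
  ) %>%
  arrange(desc(net))

net_rating
```

Raw net rating depends on how many rallies each rotation played, so dividing net by rallies is a reasonable next step when comparing rotations with different exposure.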

R code: rotation efficiency

rotation_efficiency <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    point_won_by = first(na.omit(point_won_by)),
    # rotation of the receiving team at first pass (common reference)
    receiving_team = setdiff(unique(team), serving_team)[1],
    receive_rotation = rotation[which(skill == "pass" & team == receiving_team)[1]],
    .groups = "drop"
  ) %>%
  group_by(match_id, receiving_team, receive_rotation) %>%
  summarise(
    opps = n(),
    so_points = sum(point_won_by == receiving_team, na.rm = TRUE),
    so_pct = so_points / opps,
    .groups = "drop"
  ) %>%
  arrange(desc(so_pct))

rotation_efficiency

R code: setter distribution by pass quality and score pressure

# We assume "set" rows include target_zone or target_player info; if not, join from your tagging.
# This example uses end_zone as a proxy for set location (e.g., 4/2/3/back).
setter_distribution <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  mutate(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    receive_pass_score = case_when(
      skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$perfect ~ 3,
      skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$positive ~ 2,
      skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$negative ~ 1,
      skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$error ~ 0,
      TRUE ~ NA_real_
    )
  ) %>%
  ungroup() %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    # Compute set_zone before `team` is overwritten below, so `team` still refers to the event rows
    set_zone = end_zone[which(skill == "set" & team == first(receiving_team))[1]],
    team = first(receiving_team),
    pass_score = first(na.omit(receive_pass_score)),
    score_diff = (first(na.omit(score_team)) - first(na.omit(score_opp))),
    pressure = abs(score_diff) <= 2,  # "close score" proxy
    .groups = "drop"
  ) %>%
  filter(!is.na(set_zone), !is.na(pass_score)) %>%
  mutate(pass_bucket = factor(pass_score, levels = c(0,1,2,3),
                              labels = c("error","negative","positive","perfect")))

setter_distribution_summary <- setter_distribution %>%
  group_by(team, pass_bucket, pressure, set_zone) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(team, pass_bucket, pressure) %>%
  mutate(pct = n / sum(n)) %>%
  arrange(team, pass_bucket, pressure, desc(pct))

setter_distribution_summary

This is the foundation for scouting reports: “On perfect passes in close score, they set Zone 4 ~52%.”

< section id="serve-receive">

Serve & Serve-Receive Analytics (Zones, Heatmaps, Pressure)

Modern serve analytics combines zone targeting, pass degradation, and point outcomes. Even if you don’t track ball coordinates, zones 1–6 (or 1–9) are enough for powerful insights.

R code: serve target distribution by end_zone (bar chart)

library(ggplot2)

serve_zones <- events %>%
  filter(skill == "serve") %>%
  count(team, end_zone, name = "serves") %>%
  group_by(team) %>%
  mutate(pct = serves / sum(serves)) %>%
  ungroup()

ggplot(serve_zones, aes(x = factor(end_zone), y = pct)) +
  geom_col() +
  facet_wrap(~ team) +
  labs(
    title = "Serve Target Distribution by Zone",
    x = "End Zone (Serve Target)",
    y = "Share of Serves"
  )
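
If you prefer a court-shaped view, zone shares can be drawn as tiles. The sketch below uses toy counts and assumes the conventional six-zone layout (front row 4-3-2 from left to right at the net, back row 5-6-1) seen from behind the receiving team's end line; `zone_layout` is a hypothetical lookup table.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Assumed zone layout: x runs left (1) to right (3), y = 2 is the front row at the net
zone_layout <- tibble(
  end_zone = 1:6,
  x = c(3, 3, 2, 1, 1, 2),
  y = c(1, 2, 2, 2, 1, 1)
)

# Toy serve targets (illustrative only)
toy_serves <- tibble(end_zone = c(1L, 5L, 6L, 5L, 1L, 5L, 6L, 5L)) %>%
  count(end_zone, name = "serves") %>%
  mutate(pct = serves / sum(serves))

# Zones that were never targeted get a share of 0
heat_df <- zone_layout %>%
  left_join(toy_serves, by = "end_zone") %>%
  mutate(pct = coalesce(pct, 0))

p <- ggplot(heat_df, aes(x = x, y = y, fill = pct)) +
  geom_tile(color = "white") +
  geom_text(aes(label = end_zone)) +
  coord_fixed() +
  labs(title = "Serve Target Heatmap (toy data)", fill = "Share of serves") +
  theme_void()

p
```

For a nine-zone system the same approach applies with a larger lookup table.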

R code: serve pressure proxy via opponent pass score

serve_pressure <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    serve_end_zone = end_zone[which(skill == "serve")[1]],
    pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  ) %>%
  mutate(
    pass_score = case_when(
      tolower(pass_eval) %in% eval_map$pass$perfect ~ 3,
      tolower(pass_eval) %in% eval_map$pass$positive ~ 2,
      tolower(pass_eval) %in% eval_map$pass$negative ~ 1,
      tolower(pass_eval) %in% eval_map$pass$error ~ 0,
      TRUE ~ NA_real_
    ),
    pressure = pass_score <= 1,
    ace = FALSE # if you track aces at serve level, set it here
  )

serve_pressure_summary <- serve_pressure %>%
  group_by(serving_team, serve_end_zone) %>%
  summarise(
    serves = n(),
    avg_opp_pass = mean(pass_score, na.rm = TRUE),
    pressure_rate = mean(pressure, na.rm = TRUE),
    bp_rate = mean(point_won_by == serving_team, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(bp_rate))

serve_pressure_summary

With this table, you can say: “Serving zone 5 creates low passes 38% of the time and increases break-point rate.”
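
Zone-level samples are often small, so it can help to attach an interval to a rate before quoting it. The sketch below uses made-up counts (19 low passes on 50 serves) and the exact binomial interval from base R's `binom.test()`; it treats serves as independent trials, which is a simplifying assumption.

```r
# Illustrative counts: 19 low passes forced on 50 serves to one zone
low_passes <- 19
serves     <- 50

pressure_rate <- low_passes / serves

# Exact (Clopper-Pearson) 95% confidence interval for the underlying rate
ci <- binom.test(low_passes, serves)$conf.int

round(pressure_rate, 2)
round(as.numeric(ci), 2)
```

A wide interval is a signal to collect more serves before changing a game plan around that zone.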

< section id="shot-charts">

Attack Shot Charts, Zones, Tendencies & Scouting

Attack analytics becomes powerful when you connect attack zone, target area, block context, and outcome. Even simple zone models can guide scouting: “Their opposite hits sharp to zone 1 on bad passes.”

R code: attack tendency table by start_zone → end_zone

attack_tendencies <- events %>%
  filter(skill == "attack") %>%
  count(team, player, start_zone, end_zone, name = "attempts") %>%
  group_by(team, player) %>%
  mutate(pct = attempts / sum(attempts)) %>%
  ungroup() %>%
  arrange(team, player, desc(pct))

attack_tendencies %>% slice_head(n = 30)

R code: attack efficiency by zone and pass bucket

attack_with_pass <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  mutate(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]]
  ) %>%
  ungroup() %>%
  filter(skill == "attack", team == receiving_team) %>%
  mutate(
    pass_bucket = case_when(
      tolower(pass_eval) %in% eval_map$pass$perfect ~ "perfect",
      tolower(pass_eval) %in% eval_map$pass$positive ~ "positive",
      tolower(pass_eval) %in% eval_map$pass$negative ~ "negative",
      tolower(pass_eval) %in% eval_map$pass$error ~ "error",
      TRUE ~ "unknown"
    ),
    kill = tolower(evaluation) %in% eval_map$attack$kill,
    error = tolower(evaluation) %in% eval_map$attack$error
  ) %>%
  group_by(team, player, start_zone, pass_bucket) %>%
  summarise(
    attempts = n(),
    kill_pct = mean(kill, na.rm = TRUE),
    eff = (sum(kill) - sum(error)) / attempts,
    .groups = "drop"
  ) %>%
  arrange(desc(eff))

attack_with_pass

R code: simple shot chart plot (end_zone)

shot_chart <- events %>%
  filter(skill == "attack") %>%
  mutate(
    outcome = case_when(
      tolower(evaluation) %in% eval_map$attack$kill ~ "kill",
      tolower(evaluation) %in% eval_map$attack$error ~ "error",
      tolower(evaluation) %in% eval_map$attack$blocked ~ "blocked",
      TRUE ~ "in_play"
    )
  )

ggplot(shot_chart, aes(x = factor(end_zone), fill = outcome)) +
  geom_bar(position = "fill") +
  facet_wrap(~ player) +
  labs(
    title = "Attack Outcome Mix by Target Zone (End Zone)",
    x = "Target Zone",
    y = "Share"
  )

< section id="models">

Modeling: Expected Sideout, Win Probability, Elo, Markov Chains

Once your event model is clean, you can move beyond descriptive KPIs into modeling: expected sideout (xSO), expected point (xP), win probability, and strategy simulation.

R code: expected sideout (logistic regression baseline)

library(broom)

# Create a rally-level modeling table
rally_model_df <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
    pass_score = case_when(
      tolower(pass_eval) %in% eval_map$pass$perfect ~ 3,
      tolower(pass_eval) %in% eval_map$pass$positive ~ 2,
      tolower(pass_eval) %in% eval_map$pass$negative ~ 1,
      tolower(pass_eval) %in% eval_map$pass$error ~ 0,
      TRUE ~ NA_real_
    ),
    serve_zone = end_zone[which(skill == "serve")[1]],
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  ) %>%
  filter(!is.na(pass_score), !is.na(serve_zone)) %>%
  mutate(
    sideout_success = point_won_by == receiving_team
  )

# Baseline xSO model
xso_fit <- glm(
  sideout_success ~ pass_score + factor(serve_zone),
  data = rally_model_df,
  family = binomial()
)

tidy(xso_fit)
summary(xso_fit)

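# Pass newdata explicitly: glm() drops rows with a missing outcome, so the default
# fitted values can be shorter than rally_model_df
rally_model_df <- rally_model_df %>%
  mutate(xSO = predict(xso_fit, newdata = rally_model_df, type = "response"))

rally_model_df %>%
  filter(!is.na(sideout_success)) %>%  # compare actual and expected on the same rallies
  group_by(receiving_team) %>%
  summarise(
    actual_SO = mean(sideout_success),
    expected_SO = mean(xSO),
    delta = actual_SO - expected_SO,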
    .groups = "drop"
  ) %>%
  arrange(desc(delta))

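R code: simple point-win probability from score differential

# If you have event-level score columns, you can build a win probability model.
# Here we illustrate a simple logistic model of the rally (point) outcome from score
# differential and set number. `team` is the first acting team in each rally (normally
# the server). A true set- or match-level model would need set outcomes as the response.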

wp_df <- events %>%
  filter(!is.na(score_team), !is.na(score_opp)) %>%
  mutate(score_diff = score_team - score_opp) %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    team = first(team),
    score_diff = first(score_diff),
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  ) %>%
  mutate(won_point = point_won_by == team)

wp_fit <- glm(won_point ~ score_diff + factor(set_no), data = wp_df, family = binomial())
wp_df <- wp_df %>%
  mutate(win_prob_point = predict(wp_fit, type = "response"))

wp_fit %>% broom::tidy()

R code: Elo ratings for volleyball teams

# Minimal Elo example (team-level). You can replace with your season match table.
matches <- tibble(
  match_id = c("m1","m2","m3"),
  date = as.Date(c("2025-09-01","2025-09-05","2025-09-10")),
  home = c("Team A","Team B","Team A"),
  away = c("Team B","Team C","Team C"),
  winner = c("Team A","Team C","Team A")
)

elo_update <- function(r_home, r_away, home_won, k = 20) {
  p_home <- 1 / (1 + 10^((r_away - r_home)/400))
  s_home <- ifelse(home_won, 1, 0)
  r_home_new <- r_home + k * (s_home - p_home)
  r_away_new <- r_away + k * ((1 - s_home) - (1 - p_home))
  list(home = r_home_new, away = r_away_new, p_home = p_home)
}

teams <- sort(unique(c(matches$home, matches$away)))
ratings <- setNames(rep(1500, length(teams)), teams)

elo_log <- vector("list", nrow(matches))

for (i in seq_len(nrow(matches))) {
  m <- matches[i,]
  rH <- ratings[[m$home]]
  rA <- ratings[[m$away]]
  upd <- elo_update(rH, rA, home_won = (m$winner == m$home))
  ratings[[m$home]] <- upd$home
  ratings[[m$away]] <- upd$away
  elo_log[[i]] <- tibble(match_id = m$match_id, p_home = upd$p_home,
                         home = m$home, away = m$away,
                         winner = m$winner,
                         r_home_pre = rH, r_away_pre = rA,
                         r_home_post = upd$home, r_away_post = upd$away)
}

bind_rows(elo_log) %>% arrange(match_id)
tibble(team = names(ratings), elo = as.numeric(ratings)) %>% arrange(desc(elo))
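
To see what the update does numerically, here is a short worked example of the same logistic expected-score formula with k = 20. The helper name `elo_expected()` is introduced here only for illustration.

```r
# Expected score for the home team under the Elo logistic curve
elo_expected <- function(r_home, r_away) {
  1 / (1 + 10^((r_away - r_home) / 400))
}

k <- 20

# Case 1: equal ratings, home team wins
p_equal <- elo_expected(1500, 1500)             # 0.5
home_new <- 1500 + k * (1 - p_equal)            # 1510
away_new <- 1500 + k * (0 - (1 - p_equal))      # 1490

# Case 2: a 100-point favourite at home
p_fav <- elo_expected(1600, 1500)
round(p_fav, 2)                                 # about 0.64
```

The two updates are equal and opposite, so the total rating in the pool stays constant.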

R code: Markov chain model for rally outcomes (conceptual starter)

A Markov model represents rally states like: Serve → Pass → Set → Attack → (Point/Continuation). Below is a lightweight starting template to estimate transition probabilities from event sequences.

library(stringr)

# Build simple sequences per rally: the skill chain of both teams in event order
# (rows are assumed to be stored in contact order within each rally)
rally_sequences <- events %>%
  arrange(match_id, set_no, rally_id) %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    seq = paste(skill, collapse = "-"),
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  )

# Count bigrams (transitions) from sequences
extract_bigrams <- function(seq_str) {
  tokens <- str_split(seq_str, "-", simplify = TRUE)
  tokens <- tokens[tokens != ""]
  if (length(tokens) < 2) return(tibble(from = character(), to = character()))
  tibble(from = tokens[-length(tokens)], to = tokens[-1])
}

transitions <- rally_sequences %>%
  mutate(bigrams = map(seq, extract_bigrams)) %>%
  select(match_id, bigrams) %>%
  unnest(bigrams) %>%
  count(from, to, name = "n") %>%
  group_by(from) %>%
  mutate(p = n / sum(n)) %>%
  ungroup() %>%
  arrange(from, desc(p))

transitions
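
A from/to table like this can be reshaped into a transition matrix whose rows sum to 1. The sketch below uses toy counts in base R; the terminal "point" state is an assumption added for illustration and does not appear in the skill-only sequences above.

```r
# Toy transition counts (illustrative only); "point" is an assumed terminal state
toy_transitions <- data.frame(
  from = c("serve", "serve", "pass", "set", "attack", "attack"),
  to   = c("pass",  "point", "set",  "attack", "dig", "point"),
  n    = c(8, 2, 8, 8, 3, 5)
)

# Count matrix: rows = current state, columns = next state
count_mat <- xtabs(n ~ from + to, data = toy_transitions)

# Row-normalise so each row is a probability distribution over next states
P <- prop.table(count_mat, margin = 1)

round(P, 2)
rowSums(P)
```

In this toy matrix a serve leads to a pass 80% of the time and ends the rally directly 20% of the time. To simulate rallies, every state that appears as a destination also needs its own outgoing row or must be treated as terminal.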

< section id="tidymodels">

Predictive Modeling with tidymodels

If you want production-grade modeling in R, use tidymodels: pipelines, cross-validation, recipes, metrics, and model tuning. Here is an end-to-end example predicting sideout success using pass score + serve zone.

R code: tidymodels xSO pipeline

library(tidymodels)

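df <- rally_model_df %>%
  filter(!is.na(sideout_success)) %>%
  mutate(
    # Classification models in tidymodels need a factor outcome; putting TRUE first
    # makes it the event level used by yardstick metrics such as roc_auc()
    sideout_success = factor(sideout_success, levels = c(TRUE, FALSE)),
    serve_zone = factor(serve_zone),
    receiving_team = factor(receiving_team)
  )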

set.seed(2026)
split <- initial_split(df, prop = 0.8, strata = sideout_success)
train <- training(split)
test  <- testing(split)

rec <- recipe(sideout_success ~ pass_score + serve_zone, data = train) %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors())

model <- logistic_reg() %>%
  set_engine("glm")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(model)

fit <- wf %>% fit(data = train)

pred <- predict(fit, test, type = "prob") %>%
  bind_cols(test %>% select(sideout_success))

roc_auc(pred, truth = sideout_success, .pred_TRUE)
accuracy(predict(fit, test) %>% bind_cols(test), truth = sideout_success, estimate = .pred_class)
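
A single train/test split can be noisy with small samples, so cross-validation is a natural next step. The sketch below runs 5-fold cross-validation on simulated rallies (not real data); the simulated effect of pass score and all object names (`sim`, `folds`, `cv_wf`, `cv_res`) are assumptions for illustration.

```r
library(tidymodels)

set.seed(2026)

# Simulated rally table (not real data): better passes sideout more often
n <- 400
sim <- tibble(
  pass_score = sample(0:3, n, replace = TRUE),
  serve_zone = factor(sample(c(1, 5, 6), n, replace = TRUE))
) %>%
  mutate(
    p = plogis(-1 + 0.6 * pass_score),
    # Factor outcome with TRUE as the first (event) level
    sideout_success = factor(rbinom(n, 1, p) == 1, levels = c(TRUE, FALSE))
  ) %>%
  select(-p)

# Stratified 5-fold cross-validation
folds <- vfold_cv(sim, v = 5, strata = sideout_success)

cv_wf <- workflow() %>%
  add_recipe(
    recipe(sideout_success ~ pass_score + serve_zone, data = sim) %>%
      step_dummy(all_nominal_predictors())
  ) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

cv_res <- fit_resamples(cv_wf, resamples = folds,
                        metrics = metric_set(roc_auc, accuracy))

collect_metrics(cv_res)
```

`collect_metrics()` returns the mean and standard error of each metric across folds, which gives a rough sense of how stable the model is.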

R code: add team random effects with mixed models (GLMM)

# For player/team variation, you can use lme4 (not tidymodels-native).
install.packages("lme4")
library(lme4)

# Example: include receiving_team as a random intercept
xso_glmm <- glmer(
  sideout_success ~ pass_score + factor(serve_zone) + (1 | receiving_team),
  data = rally_model_df,
  family = binomial()
)

summary(xso_glmm)

< section id="bayes">

Bayesian Volleyball Analytics in R

Bayesian models are ideal when you want uncertainty, shrinkage, and better inference with small samples. In volleyball scouting, sample sizes can be tiny (a few matches), so Bayesian partial pooling is often a win.

R code: Bayesian xSO with brms

# Bayesian logistic regression with partial pooling by receiving team
install.packages("brms")
library(brms)

bayes_fit <- brm(
  sideout_success ~ pass_score + factor(serve_zone) + (1 | receiving_team),
  data = rally_model_df,
  family = bernoulli(),
  chains = 2, cores = 2, iter = 1500,
  seed = 2026
)

summary(bayes_fit)
posterior_summary(bayes_fit)

With brms, you can compute posterior distributions of SO% by team, compare strategies, and avoid overreacting to noise.
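
The shrinkage idea can be seen without fitting a full model. The sketch below is a simplified stand-in, not the brms model: it uses a conjugate Beta-Binomial update with a fixed, assumed Beta(12, 8) prior (prior mean 0.60, worth 20 pseudo-rallies) on toy sideout records. A hierarchical model would estimate the amount of pooling from the data instead of fixing it.

```r
# Toy sideout records: points won / receive opportunities (illustrative only)
teams <- c("Team A", "Team B", "Team C")
won   <- c(9, 55, 140)
opps  <- c(12, 100, 220)

raw_so <- won / opps

# Assumed Beta(12, 8) prior: prior mean 0.60, worth 20 pseudo-rallies
prior_a <- 12
prior_b <- 8

# Posterior mean under a Beta-Binomial model: (a + won) / (a + b + opps)
shrunk_so <- (prior_a + won) / (prior_a + prior_b + opps)

data.frame(team = teams, opps,
           raw_so = round(raw_so, 3),
           shrunk_so = round(shrunk_so, 3))
```

Team A, with only 12 opportunities, is pulled strongly toward the prior mean, while Team C, with 220 opportunities, barely moves. That is the behaviour that protects against overreacting to small samples.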

< section id="viz">

Visualization: ggplot2 Templates for Volleyball

Volleyball visualizations should be coach-friendly, quick to read, and tied to decisions: serve target, pass quality, rotation weaknesses, attack tendencies, and pressure points.

R code: SO% and BP% report chart

# Keep match_id so each team/metric pair is unique when several matches are loaded
so_bp_wide <- so_bp %>%
  select(match_id, team, metric, pct) %>%
  pivot_wider(names_from = metric, values_from = pct)

so_bp_long <- so_bp %>%
  ggplot(aes(x = team, y = pct, fill = metric)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Sideout % and Break Point % by Team", x = NULL, y = "Rate")

so_bp_long

R code: rotation heatmap (SO% by rotation)

rot_plot_df <- rotation_efficiency %>%
  mutate(receive_rotation = factor(receive_rotation, levels = 1:6))

ggplot(rot_plot_df, aes(x = receive_rotation, y = receiving_team, fill = so_pct)) +
  geom_tile() +
  labs(title = "Rotation Sideout Heatmap", x = "Rotation (Receiving)", y = "Team")

R code: fast HTML tables with gt

library(gt)

attack_metrics %>%
  filter(attempts >= 10) %>%
  arrange(desc(eff)) %>%
  gt() %>%
  fmt_percent(columns = c(kill_pct, error_pct, blocked_pct), decimals = 1) %>%
  fmt_number(columns = eff, decimals = 3) %>%
  tab_header(title = "Attack Leaderboard (Min 10 Attempts)")

< section id="shiny">

Dashboards: Shiny Scouting Reports

A Shiny scouting app can deliver instant insights for coaches: opponent serve targets, rotation weaknesses, attacker tendencies, and key matchups. Below is a compact Shiny template you can expand.

R code: minimal Shiny dashboard for team scouting

# Install once from the console, not inside the app script:
# install.packages(c("shiny", "bslib"))
library(shiny)
library(bslib)
library(tidyverse)

# Assume you already computed:
# - serve_pressure_summary
# - rotation_efficiency
# - attack_tendencies

ui <- page_sidebar(
  title = "Volleyball Analytics Dashboard (R + Shiny)",
  sidebar = sidebar(
    selectInput("team", "Select Team", choices = sort(unique(serve_pressure_summary$serving_team))),
    hr(),
    helpText("Key views: serve targets, rotation sideout, attack tendencies.")
  ),
  layout_columns(
    card(
      card_header("Serve Targets by Zone"),
      plotOutput("servePlot", height = 260)
    ),
    card(
      card_header("Rotation Sideout %"),
      plotOutput("rotPlot", height = 260)
    ),
    card(
      card_header("Top Attack Tendencies"),
      tableOutput("attackTable")
    )
  )
)

server <- function(input, output, session) {

  output$servePlot <- renderPlot({
    df <- serve_pressure_summary %>% filter(serving_team == input$team)
    ggplot(df, aes(x = factor(serve_end_zone), y = bp_rate)) +
      geom_col() +
      labs(x = "Serve End Zone", y = "Break Point Rate", title = paste("Serve Effectiveness -", input$team))
  })

  output$rotPlot <- renderPlot({
    df <- rotation_efficiency %>% filter(receiving_team == input$team) %>%
      mutate(receive_rotation = factor(receive_rotation, levels = 1:6))
    ggplot(df, aes(x = receive_rotation, y = so_pct)) +
      geom_col() +
      labs(x = "Rotation", y = "Sideout %", title = paste("Rotation Sideout -", input$team))
  })

  output$attackTable <- renderTable({
    attack_tendencies %>%
      filter(team == input$team) %>%
      group_by(player) %>%
      slice_max(order_by = pct, n = 5) %>%
      ungroup() %>%
      arrange(desc(pct)) %>%
      mutate(pct = round(pct * 100, 1))
  })
}

shinyApp(ui, server)

< section id="automation">

Automation: Reports to HTML/PDF + CI

Automated weekly scouting reports are one of the best uses of R in volleyball. From a single parameterized template you can generate an HTML match report, a PDF coaching packet, and the tables and figures your staff needs.

R code: Quarto report skeleton

# Create a Quarto (.qmd) file such as reports/match_report.qmd,
# then render it from R, passing the match to report on as a parameter:
quarto::quarto_render(
  input = "reports/match_report.qmd",
  execute_params = list(match_id = "match_001")
)

Example Quarto front matter (paste into .qmd)

---
title: "Match Report"
format:
  html:
    toc: true
    code-fold: show
execute:
  echo: true
  warning: false
  message: false
params:
  match_id: "match_001"
---
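The match_id parameter makes it straightforward to render one report per match in a loop. The sketch below is one possible way to do it: the helper names (build_report_jobs, render_match_reports), the output file naming, and the match IDs are assumptions for illustration. It defaults to a dry run so it can be tried without Quarto installed.

```r
# Build one render job per match; the output naming scheme is an assumption
build_report_jobs <- function(match_ids,
                              input = "reports/match_report.qmd") {
  data.frame(
    match_id    = match_ids,
    input       = input,
    output_file = paste0("match_report_", match_ids, ".html"),
    stringsAsFactors = FALSE
  )
}

# Render each job; with dry_run = TRUE nothing is rendered, only listed
render_match_reports <- function(jobs, dry_run = TRUE) {
  for (i in seq_len(nrow(jobs))) {
    if (dry_run) {
      message("Would render ", jobs$output_file[i])
    } else {
      quarto::quarto_render(
        input          = jobs$input[i],
        output_file    = jobs$output_file[i],
        execute_params = list(match_id = jobs$match_id[i])
      )
    }
  }
  invisible(jobs)
}

jobs <- build_report_jobs(c("match_001", "match_002", "match_003"))
render_match_reports(jobs, dry_run = TRUE)
```

Inside the .qmd, R chunks can read the value as params$match_id and filter the match data accordingly. Set dry_run = FALSE once the template renders correctly for a single match.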

< section id="best-practices">

Best Practices + Common Pitfalls

Define evaluation codes once and reuse them everywhere (serve/pass/attack mappings).
Keep raw data immutable in data/raw; write cleaned data to data/processed.
Separate scouting vs. performance analysis: scouting focuses on tendencies; performance focuses on efficiency.
Beware small samples (one match). Use Bayesian shrinkage or confidence intervals.
Rotation context matters: opponent rotations, server strength, and pass quality heavily confound results.
Don’t overfit: models should generalize across matches and opponents.
Make outputs coach-readable: simple tables, clear charts, and “so what?” conclusions.

R code: quick bootstrap CI for SO%

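set.seed(2026)

# Percentile bootstrap CI for the mean of a logical (or 0/1) vector
bootstrap_ci <- function(x, B = 2000, conf = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  boots <- replicate(B, mean(sample(x, n, replace = TRUE)))
  alpha <- (1 - conf) / 2
  unname(quantile(boots, probs = c(alpha, 1 - alpha)))
}

so_ci <- rallies %>%
  mutate(sideout_success = point_won_by == receiving_team) %>%
  group_by(receiving_team) %>%
  summarise(
    so = mean(sideout_success, na.rm = TRUE),
    # Resample once per team so both bounds come from the same bootstrap draws
    ci = list(bootstrap_ci(sideout_success)),
    n = n(),
    .groups = "drop"
  ) %>%
  mutate(
    ci_low  = vapply(ci, function(q) q[1], numeric(1)),
    ci_high = vapply(ci, function(q) q[2], numeric(1))
  ) %>%
  select(-ci)

so_ci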
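As a quick cross-check on the bootstrap, an analytic interval for a proportion needs only the counts. The sketch below uses base R's prop.test() without continuity correction, which returns the Wilson score interval. The wilson_ci helper name and the counts are hypothetical.

```r
# Wilson score interval for a sideout rate, using base R's prop.test()
wilson_ci <- function(successes, n, conf = 0.95) {
  ci <- prop.test(successes, n, conf.level = conf, correct = FALSE)$conf.int
  c(ci_low = ci[1], ci_high = ci[2])
}

# Hypothetical counts: the same 60% sideout rate over one match vs. a season
one_match <- wilson_ci(30, 50)
season    <- wilson_ci(600, 1000)

round(rbind(one_match, season), 3)
```

The one-match interval is much wider than the season interval around the same 60% rate, which is the small-sample warning in numeric form.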

< section id="recommended">

Recommended Book

If you want a structured, practical resource that goes deeper into volleyball analytics workflows, R code patterns, scouting/reporting, and modeling concepts, check out this book:

Volleyball Analytics with R (Recommended Book)

It’s a great companion if you’re building a complete R-based analytics stack for a club, federation, or collegiate program.

< section id="faq">

FAQ

What’s the best single metric in volleyball?

If you only track one KPI: Sideout %. It correlates strongly with winning because it reflects serve-receive stability and first-ball offense conversion.

How do I handle different coding systems?

Create a mapping layer (like eval_map) and convert raw labels into a standardized internal vocabulary. The rest of your pipeline should never depend on raw coding strings.
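A minimal sketch of such a mapping layer, in base R so it runs without extra packages. The lookup table, the two source systems ("symbols" and "numeric"), and the standardized labels are hypothetical; substitute the codes your scouting software actually exports.

```r
# Hypothetical lookup: (source system, raw code) -> standardized pass label
code_lookup <- data.frame(
  source   = c(rep("symbols", 4), rep("numeric", 4)),
  raw_code = c("#", "+", "-", "=", "3", "2", "1", "0"),
  pass_std = rep(c("perfect", "good", "poor", "error"), 2),
  stringsAsFactors = FALSE
)

# Translate raw codes into the internal vocabulary; unmapped codes become NA
standardize_pass <- function(source, raw_code, lookup = code_lookup) {
  key     <- paste(source, raw_code, sep = "|")
  lkp_key <- paste(lookup$source, lookup$raw_code, sep = "|")
  out     <- lookup$pass_std[match(key, lkp_key)]
  if (anyNA(out)) {
    warning("Unmapped codes: ", paste(unique(key[is.na(out)]), collapse = ", "))
  }
  out
}

# Example touches recorded in two different coding systems
touches <- data.frame(
  source   = c("symbols", "symbols", "numeric", "numeric"),
  raw_code = c("#", "=", "2", "0"),
  stringsAsFactors = FALSE
)
touches$pass_std <- standardize_pass(touches$source, touches$raw_code)
touches
```

Warning on unmapped codes, rather than silently dropping them, surfaces new or mistyped codes as soon as a file is imported.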

Can I do volleyball analytics without coordinates?

Yes. Zone-based analytics (1–6 or 1–9) plus pass quality and outcome are enough for rotation analysis, serve targeting, and basic predictive modeling.
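As a rough illustration, the sketch below works from zones and pass quality alone. The data are simulated, and the column names (serve_end_zone, pass_score, sideout_success) and the assumed link between pass quality and sideout rate are placeholders, not real match results.

```r
# Simulated rally data with zones only (no x/y coordinates); all values are made up
set.seed(2026)
n <- 600
zone_rallies <- data.frame(
  serve_end_zone = sample(c(1, 5, 6), n, replace = TRUE),
  pass_score     = sample(0:3, n, replace = TRUE)
)
# Assumed relationship for the simulation: better passes sideout more often
zone_rallies$sideout_success <- rbinom(n, 1, 0.35 + 0.1 * zone_rallies$pass_score)

# SO% allowed and sample size per serve end zone
zone_summary <- data.frame(
  serve_end_zone = sort(unique(zone_rallies$serve_end_zone)),
  so_pct = as.numeric(tapply(zone_rallies$sideout_success,
                             zone_rallies$serve_end_zone, mean)),
  n      = as.numeric(table(zone_rallies$serve_end_zone))
)

# Zone-only predictive model: logistic regression on zone and pass quality
zone_fit <- glm(
  sideout_success ~ factor(serve_end_zone) + pass_score,
  data = zone_rallies, family = binomial()
)

zone_summary
round(coef(zone_fit), 2)
```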

What should I build first?

Start with: import + clean → SO% / BP% → pass + serve dashboards → rotation sideout → attack efficiency by pass quality. Once those are stable, add modeling.

< footer class="post-footer">

Tags: volleyball analytics with R, R volleyball stats, sideout percentage, rotation analysis, serve receive, scouting report, tidymodels, ggplot2, Shiny dashboard

The post Volleyball Analytics with R: The Complete Guide to Match Data, Sideout Efficiency, Serve Pressure, Heatmaps, and Predictive Models appeared first on R Programming Books.
