Football Betting Model in R (Step-by-Step Guide 2026)

rprogrammingbooks

[This article was first published on Blog - R Programming Books, and kindly contributed to R-bloggers].

<!-- Macro Post (HTML) -->
<div>
<!-- Quick internal links -->
<div> <strong>Related (on this site):</strong> <ul style="margin:10px 0 0 18px;"> <li><a href="https://rprogrammingbooks.com/install-use-worldfootballr/" rel="nofollow" target="_blank">Install & Use worldfootballR</a></li> <li><a href="https://rprogrammingbooks.com/worldfootballr-guide/" rel="nofollow" target="_blank">worldfootballR Guide</a></li> <li><a href="https://rprogrammingbooks.com/sports-analytics-with-r/" rel="nofollow" target="_blank">Sports Analytics with R</a></li> <li><a href="https://rprogrammingbooks.com/nfl-analytics-with-r-nflfastr-nflverse/" rel="nofollow" target="_blank">NFL Analytics with R</a></li> <li><a href="https://rprogrammingbooks.com/tennis-analytics-with-r/" rel="nofollow" target="_blank">Tennis Analytics with R</a></li> <li><a href="https://rprogrammingbooks.com/boxing-analytics-with-r/" rel="nofollow" target="_blank">Boxing Analytics with R</a></li> <li><a href="https://rprogrammingbooks.com/product/bayesian-sports-analytics-r-predictive-modeling-betting-performance/" rel="nofollow" target="_blank">Bayesian Sports Analytics (Book/Product)</a></li> </ul> </div>
<!-- Table of contents -->
<nav style="padding:14px 16px;border:1px solid #e6e6e6;border-radius:12px;margin:0 0 22px 0;"> <strong>Contents</strong> <ol style="margin:10px 0 0 18px;"> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#setup" rel="nofollow" target="_blank">Setup</a></li> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#data" rel="nofollow" target="_blank">Get match data</a></li> <li><a
href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#features" rel="nofollow" target="_blank">Feature engineering</a></li> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#model1" rel="nofollow" target="_blank">Model 1: Poisson goals (baseline)</a></li> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#model2" rel="nofollow" target="_blank">Model 2: Dixon–Coles adjustment (improves low scores)</a></li> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#probs" rel="nofollow" target="_blank">From scorelines to 1X2 probabilities</a></li> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#odds" rel="nofollow" target="_blank">Odds, implied probabilities & value</a></li> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#backtest" rel="nofollow" target="_blank">Backtest: flat stake vs Kelly</a></li> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#calibration" rel="nofollow" target="_blank">Calibration diagnostics</a></li> <li><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#production" rel="nofollow" target="_blank">Production: weekly pipeline</a></li> <li><a 
href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026#faq" rel="nofollow" target="_blank">FAQ</a></li> </ol> </nav>
<!-- SECTION: Setup -->
<section id="setup"> <h2>Setup</h2>
<p> This is a code-heavy, end-to-end walkthrough for building a football betting model in R: data → features → model → probabilities → value bets → bankroll rules → backtest. If you’re new to <code>worldfootballR</code>, start here: <a href="https://rprogrammingbooks.com/install-use-worldfootballr/" rel="nofollow" target="_blank">Install & Use worldfootballR</a> and keep the <a href="https://rprogrammingbooks.com/worldfootballr-guide/" rel="nofollow" target="_blank">worldfootballR Guide</a> open as reference. </p>
<pre># Core packages
install.packages(c(
  "dplyr", "tidyr", "purrr", "stringr", "lubridate",
  "readr", "ggplot2", "tibble", "janitor", "glue"
))

# Modeling + evaluation
install.packages(c("broom", "rsample", "yardstick", "scoringRules", "pROC"))

# Optional (for speed / nicer tables)
install.packages(c("data.table", "DT"))

# Football data
# worldfootballR is usually installed from GitHub
# See: https://rprogrammingbooks.com/install-use-worldfootballr/
# If needed:
# install.packages("remotes")
# remotes::install_github("JaseZiv/worldfootballR")

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(lubridate)
library(readr)
library(ggplot2)
library(tibble)   # tibble() is used later when building 1X2 probabilities
library(janitor)
library(glue)

# worldfootballR (uncomment after install)
# library(worldfootballR)

set.seed(2026)</pre>
<div> <strong>Modeling note:</strong> The baseline approach below is the classic Poisson goals model (team attack/defense + home advantage). It’s simple, explainable, and a great foundation. You can extend it later to xG models, Bayesian hierarchical models, or time-varying strength.
If you like Bayesian approaches, see: <a href="https://rprogrammingbooks.com/product/bayesian-sports-analytics-r-predictive-modeling-betting-performance/" rel="nofollow" target="_blank"> Bayesian Sports Analytics (Book/Product) </a>. </div> </section>
<!-- SECTION: Data -->
<section id="data"> <h2>Get match data</h2>
<p> The easiest path is to pull historical match results from public sources. With <code>worldfootballR</code>, you can often scrape league seasons from sources like FBref. Function names and arguments can vary by package version and data source, so two options follow: </p>
<ul> <li><strong>Option A:</strong> Use <code>worldfootballR</code> directly (recommended).</li> <li><strong>Option B:</strong> Use your own CSV export if you already have data.</li> </ul>
<h3>Option A: Pull league season data with worldfootballR</h3>
<p> If you need help with installation or scraping quirks, see: <a href="https://rprogrammingbooks.com/install-use-worldfootballr/" rel="nofollow" target="_blank">Install & Use worldfootballR</a>. </p>
<pre># --- Option A (worldfootballR) ---
# Sketch only: function names, arguments, and returned column names depend on
# your worldfootballR version and the data source (FBref / other).
# Check the documentation of your installed version before running.
#
# library(worldfootballR)
#
# matches <- fb_match_results(
#   country = "ENG", gender = "M",
#   season_end_year = 2026, tier = "1st"
# ) %>%
#   janitor::clean_names() %>%
#   # assumed column names after clean_names(); rename to match the rest of this guide
#   rename(home_team = home, away_team = away)
#
# head(matches)</pre>
<h3>Option B: Use a CSV export (works everywhere)</h3>
<p> Your data needs (minimum): <code>date</code>, <code>home_team</code>, <code>away_team</code>, <code>home_goals</code>, <code>away_goals</code>. Save it as <code>matches.csv</code>.
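</p>
<p> Before loading a real export, it can help to check that the minimum columns are present. The sketch below is an illustration, not part of the original workflow: <code>check_match_columns()</code> is a hypothetical helper, and the toy rows are made-up placeholders rather than real results. </p>
<pre># Hypothetical helper: stop early if a required column is missing
required_cols <- c("date", "home_team", "away_team", "home_goals", "away_goals")

check_match_columns <- function(df, required = required_cols) {
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("match data is missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data frame standing in for a real export (made-up fixtures and scores)
toy <- data.frame(
  date       = as.Date(c("2025-08-16", "2025-08-17")),
  home_team  = c("Team A", "Team C"),
  away_team  = c("Team B", "Team D"),
  home_goals = c(2L, 0L),
  away_goals = c(1L, 0L)
)

check_match_columns(toy)   # passes silently

# Dropping away_goals triggers the error, caught here for demonstration
ok <- tryCatch(check_match_columns(toy[, -5]), error = function(e) FALSE)
ok</pre>
<p> Once the columns check out, the file can be loaded as follows.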
</p>
<pre># --- Option B (CSV) ---
matches <- readr::read_csv("matches.csv") %>%
  janitor::clean_names() %>%
  mutate(date = as.Date(date)) %>%
  filter(!is.na(home_goals), !is.na(away_goals)) %>%
  arrange(date)

dplyr::glimpse(matches)</pre>
<h3>Standardize columns</h3>
<pre># Make sure we have standardized column names
matches <- matches %>%
  transmute(
    date   = as.Date(date),
    season = if_else(month(date) >= 7, year(date) + 1L, year(date)),  # football season heuristic
    home   = as.character(home_team),
    away   = as.character(away_team),
    hg     = as.integer(home_goals),
    ag     = as.integer(away_goals)
  ) %>%
  filter(!is.na(date), !is.na(home), !is.na(away), !is.na(hg), !is.na(ag)) %>%
  arrange(date)

# Basic sanity checks
stopifnot(all(matches$hg >= 0), all(matches$ag >= 0))
matches %>% count(season) %>% arrange(desc(season)) %>% print(n = 50)</pre>
</section>
<!-- SECTION: Features -->
<section id="features"> <h2>Feature engineering</h2>
<p> For a baseline Poisson model, we’ll estimate team strength through parameters: <em>attack</em> and <em>defense</em>, plus a <em>home advantage</em>. We’ll also build rolling form features as optional enhancements. </p>
<h3>Long format for modeling</h3>
<pre>long <- matches %>%
  mutate(
    match_id  = row_number(),
    # keep copies: pivot_longer() consumes the home/away columns
    home_team = home,
    away_team = away
  ) %>%
  tidyr::pivot_longer(
    cols = c(home, away),
    names_to = "side",
    values_to = "team"
  ) %>%
  mutate(
    opp      = if_else(side == "home", away_team, home_team),
    goals    = if_else(side == "home", hg, ag),
    conceded = if_else(side == "home", ag, hg),
    is_home  = as.integer(side == "home")
  ) %>%
  select(match_id, date, season, team, opp, is_home, goals, conceded)

head(long)</pre>
<h3>Optional: rolling “form” features</h3>
<p> These can help a little, but they are easy to overfit. Keep them simple and always validate out-of-sample.
</p>
<pre># Rolling averages for goals scored/conceded over the previous N matches (per team)
add_form_features <- function(df, n = 5) {
  df %>%
    arrange(team, date, match_id) %>%
    group_by(team) %>%
    mutate(
      # dplyr::lag() shifts the window back one match, so the current match's
      # own result is excluded from its form feature (avoids leakage)
      gf_roll = dplyr::lag(zoo::rollapplyr(goals, width = n, FUN = mean, partial = TRUE)),
      ga_roll = dplyr::lag(zoo::rollapplyr(conceded, width = n, FUN = mean, partial = TRUE))
    ) %>%
    ungroup()
}

# install.packages("zoo") if needed
# library(zoo)
# long <- add_form_features(long, n = 5)</pre>
<div> <strong>Link building tip:</strong> If you cover multiple sports, create a “methods hub” page and link out to each sport’s analytics guide: <a href="https://rprogrammingbooks.com/sports-analytics-with-r/" rel="nofollow" target="_blank">Sports Analytics with R</a>, <a href="https://rprogrammingbooks.com/nfl-analytics-with-r-nflfastr-nflverse/" rel="nofollow" target="_blank">NFL</a>, <a href="https://rprogrammingbooks.com/tennis-analytics-with-r/" rel="nofollow" target="_blank">Tennis</a>, <a href="https://rprogrammingbooks.com/boxing-analytics-with-r/" rel="nofollow" target="_blank">Boxing</a>. This strengthens topical authority and internal PageRank flow. </div> </section>
<!-- SECTION: Model 1 -->
<section id="model1"> <h2>Model 1: Poisson goals (baseline)</h2>
<p> We’ll fit two Poisson regressions: one for home goals and one for away goals, with team attack/defense effects and a home advantage term. A standard approach is: </p>
<ul> <li><code>home_goals ~ home_adv + attack(home) + defense(away)</code></li> <li><code>away_goals ~ attack(away) + defense(home)</code></li> </ul>
<p> To avoid identifiability issues, one team acts as the reference (baseline) level, which R takes from the order of the factor levels.
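</p>
<p> A minimal sketch of what “reference via factor levels” means in practice, using made-up team labels: with R’s default treatment contrasts, the first factor level is absorbed into the intercept and gets no column of its own, and <code>relevel()</code> changes which team plays that role. </p>
<pre># Toy factor with hypothetical team names
toy <- data.frame(
  team = factor(c("Arsenal", "Chelsea", "Leeds", "Arsenal"),
                levels = c("Arsenal", "Chelsea", "Leeds"))
)

# Default treatment contrasts: the first level ("Arsenal") is the baseline
X <- model.matrix(~ team, data = toy)
colnames(X)
# "(Intercept)" "teamChelsea" "teamLeeds"

# Pick a different baseline team
toy$team <- relevel(toy$team, ref = "Leeds")
X2 <- model.matrix(~ team, data = toy)
colnames(X2)
# "(Intercept)" "teamArsenal" "teamChelsea"</pre>
<p> Every other team’s coefficient is then read as a difference from the baseline team on the log scale.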
</p>
<h3>Train/test split by time (realistic for betting)</h3>
<pre># Time-based split (e.g., last 20% of matches as test)
n_total <- nrow(matches)
cut_idx <- floor(n_total * 0.80)
train <- matches %>% slice(1:cut_idx)
test  <- matches %>% slice((cut_idx + 1):n_total)

# Keep only test matches between teams seen in training: predict() errors on
# factor levels the model never saw (e.g., newly promoted teams)
train_teams <- union(train$home, train$away)
test <- test %>% filter(home %in% train_teams, away %in% train_teams)

# Ensure consistent factor levels
teams <- sort(unique(c(matches$home, matches$away)))
train <- train %>% mutate(home = factor(home, levels = teams), away = factor(away, levels = teams))
test  <- test  %>% mutate(home = factor(home, levels = teams), away = factor(away, levels = teams))

# Fit models
home_mod <- glm(hg ~ 1 + home + away, data = train, family = poisson())
away_mod <- glm(ag ~ 1 + away + home, data = train, family = poisson())
summary(home_mod)
summary(away_mod)</pre>
<div> <strong>Interpretation:</strong> The simplest version above uses team fixed effects as factors. It works, but it mixes attack/defense. Next, we’ll fit the more interpretable attack/defense parameterization. </div>
<h3>Attack/Defense parameterization (more interpretable)</h3>
<pre># Build a modeling frame in the classic attack/defense form
# We model home goals:
#   log(lambda_home) = home_adv + attack_home - defense_away
# And away goals:
#   log(lambda_away) = attack_away - defense_home

# Create team factors
train2 <- train %>%
  mutate(
    home = factor(home, levels = teams),
    away = factor(away, levels = teams)
  )

# We'll encode attack and defense as separate factors by prefixing labels
mk_attack <- function(team) factor(paste0("att_", team), levels = paste0("att_", teams))
mk_def    <- function(team) factor(paste0("def_", team), levels = paste0("def_", teams))

train_home <- train2 %>%
  transmute(goals = hg, is_home = 1L, att = mk_attack(home), def = mk_def(away))
train_away <- train2 %>%
  transmute(goals = ag, is_home = 0L, att = mk_attack(away), def = mk_def(home))
train_long <- bind_rows(train_home, train_away)

# Fit a single Poisson model with:
#   goals ~ is_home + att + def
# Note: the def coefficients enter with a "+" sign here, so a larger def_*
# coefficient means a leakier defense (more goals conceded). This is the same
# model as the "- defense" form above with the sign of the def terms flipped.
ad_mod <- glm(goals ~ is_home + att + def, data = train_long, family = poisson())
summary(ad_mod)</pre>
<h3>Predict expected goals (lambda) for each match</h3>
<pre>predict_lambdas <- function(df, model, teams) {
  df2 <- df %>%
    mutate(
      home = factor(home, levels = teams),
      away = factor(away, levels = teams),
      att_home = factor(paste0("att_", home), levels = paste0("att_", teams)),
      def_away = factor(paste0("def_", away), levels = paste0("def_", teams)),
      att_away = factor(paste0("att_", away), levels = paste0("att_", teams)),
      def_home = factor(paste0("def_", home), levels = paste0("def_", teams))
    )

  # home lambda
  new_home <- df2 %>% transmute(is_home = 1L, att = att_home, def = def_away)
  # away lambda
  new_away <- df2 %>% transmute(is_home = 0L, att = att_away, def = def_home)

  lam_home <- predict(model, newdata = new_home, type = "response")
  lam_away <- predict(model, newdata = new_away, type = "response")

  df2 %>% mutate(lambda_home = lam_home, lambda_away = lam_away)
}

test_pred <- predict_lambdas(test, ad_mod, teams)
head(test_pred)</pre>
</section>
<!-- SECTION: Model 2 -->
<section id="model2"> <h2>Model 2: Dixon–Coles adjustment (improves low scores)</h2>
<p> The independent Poisson assumption can under- or overestimate the probabilities of low scores (0–0, 1–0, 0–1, 1–1). Dixon–Coles multiplies those four scorelines by a small correction factor controlled by a single dependence parameter <code>rho</code> and leaves all other scorelines unchanged. Below is a compact implementation.
</p>
<pre># Dixon-Coles tau adjustment for low-score dependence
tau_dc <- function(x, y, lam_x, lam_y, rho) {
  # x = home goals, y = away goals
  # rho is the dependence parameter
  if (x == 0 && y == 0) return(1 - (lam_x * lam_y * rho))
  if (x == 0 && y == 1) return(1 + (lam_x * rho))
  if (x == 1 && y == 0) return(1 + (lam_y * rho))
  if (x == 1 && y == 1) return(1 - rho)
  return(1)
}

# Scoreline probability matrix up to max_goals
# Rows index home goals, columns index away goals
score_matrix <- function(lam_h, lam_a, rho = 0, max_goals = 10) {
  xs <- 0:max_goals
  ys <- 0:max_goals
  ph <- dpois(xs, lam_h)
  pa <- dpois(ys, lam_a)

  # outer product for independent probabilities
  P <- outer(ph, pa)

  # apply DC tau correction
  for (i in seq_along(xs)) {
    for (j in seq_along(ys)) {
      P[i, j] <- P[i, j] * tau_dc(xs[i], ys[j], lam_h, lam_a, rho)
    }
  }

  # renormalize
  P / sum(P)
}

# Example
P_ex <- score_matrix(lam_h = 1.4, lam_a = 1.1, rho = 0.05, max_goals = 8)
round(P_ex[1:5, 1:5], 4)</pre>
<p> How do we choose <code>rho</code>? You can estimate it by maximizing the likelihood on the training data, holding the fitted lambdas fixed.
Here’s a lightweight optimizer: </p>
<pre># Estimate rho by maximizing log-likelihood on train set given lambdas
train_pred <- predict_lambdas(train, ad_mod, teams)

dc_loglik <- function(rho, df, max_goals = 10) {
  # clamp rho to a reasonable range to avoid numerical issues
  rho <- max(min(rho, 0.3), -0.3)
  ll <- 0
  for (k in seq_len(nrow(df))) {
    lam_h <- df$lambda_home[k]
    lam_a <- df$lambda_away[k]
    hg <- df$hg[k]
    ag <- df$ag[k]
    P <- score_matrix(lam_h, lam_a, rho = rho, max_goals = max_goals)
    # if score exceeds max_goals, treat as tiny prob (or increase max_goals)
    if (hg > max_goals || ag > max_goals) {
      ll <- ll + log(1e-12)
    } else {
      ll <- ll + log(P[hg + 1, ag + 1] + 1e-15)
    }
  }
  ll
}

# optimize() minimizes, so pass the negative log-likelihood
opt <- optimize(
  f = function(r) -dc_loglik(r, train_pred, max_goals = 10),
  interval = c(-0.2, 0.2)
)
rho_hat <- opt$minimum
rho_hat</pre>
</section>
<!-- SECTION: probs -->
<section id="probs"> <h2>From scorelines to 1X2 probabilities</h2>
<p> Once you have a scoreline probability matrix <code>P</code>, compute: </p>
<ul> <li><strong>Home win:</strong> sum of probabilities where home goals > away goals</li> <li><strong>Draw:</strong> sum of diagonal</li> <li><strong>Away win:</strong> sum where home goals < away goals</li> </ul>
<pre>p1x2_from_matrix <- function(P) {
  # Rows are home goals and columns are away goals, so:
  #   below the diagonal = home win, diagonal = draw, above = away win
  tibble(
    p_home = sum(P[lower.tri(P)]),
    p_draw = sum(diag(P)),
    p_away = sum(P[upper.tri(P)])
  )
}

predict_1x2 <- function(df, rho = 0, max_goals = 10) {
  out <- vector("list", nrow(df))
  for (k in seq_len(nrow(df))) {
    P <- score_matrix(df$lambda_home[k], df$lambda_away[k], rho = rho, max_goals = max_goals)
    out[[k]] <- p1x2_from_matrix(P)
  }
  bind_rows(out)
}

test_1x2 <- bind_cols(
  test_pred,
  predict_1x2(test_pred, rho = rho_hat, max_goals = 10)
)

test_1x2 %>%
  select(date, home, away, hg,
         ag, lambda_home, lambda_away, p_home, p_draw, p_away) %>%
  head(10)</pre>
</section>
<!-- SECTION: odds -->
<section id="odds"> <h2>Odds, implied probabilities & value</h2>
<p> Betting decisions should be driven by <strong>expected value (EV)</strong>. If the market offers decimal odds <code>O</code> and your model probability is <code>p</code>, then: </p>
<ul> <li><strong>Expected value per unit stake:</strong> <code>EV = p*(O-1) - (1-p)</code>, which simplifies to <code>p*O - 1</code></li> <li><strong>Value condition:</strong> <code>p > 1/O</code> (equivalently, <code>EV > 0</code>)</li> </ul>
<p> You’ll typically have an odds feed. Below we assume you have a file <code>odds.csv</code> with columns <code>date, home, away, odds_home, odds_draw, odds_away</code>. </p>
<pre>odds <- readr::read_csv("odds.csv") %>%
  janitor::clean_names() %>%
  mutate(date = as.Date(date)) %>%
  transmute(
    date,
    home   = as.character(home),
    away   = as.character(away),
    o_home = as.numeric(odds_home),
    o_draw = as.numeric(odds_draw),
    o_away = as.numeric(odds_away)
  )

# Team names and dates must match exactly between the two sources,
# otherwise the join leaves the odds columns as NA
df <- test_1x2 %>%
  mutate(home = as.character(home), away = as.character(away)) %>%
  left_join(odds, by = c("date", "home", "away"))

# Implied probs (no vig removal yet)
df <- df %>%
  mutate(
    imp_home  = 1 / o_home,
    imp_draw  = 1 / o_draw,
    imp_away  = 1 / o_away,
    overround = imp_home + imp_draw + imp_away
  )

# Simple vig removal by normalization
df <- df %>%
  mutate(
    mkt_home = imp_home / overround,
    mkt_draw = imp_draw / overround,
    mkt_away = imp_away / overround
  )

# EV per 1 unit stake
ev <- function(p, o) p * (o - 1) - (1 - p)

df <- df %>%
  mutate(
    ev_home = ev(p_home, o_home),
    ev_draw = ev(p_draw, o_draw),
    ev_away = ev(p_away, o_away)
  )

df %>%
  select(date, home, away, p_home, p_draw, p_away,
         o_home, o_draw, o_away, ev_home, ev_draw, ev_away) %>%
  head(10)</pre>
<h3>Pick bets with thresholds (avoid noise)</h3>
<pre># Practical filters: require at least some edge and avoid tiny probabilities
EDGE_MIN <- 0.02  # 2% EV edge
P_MIN    <- 0.05  # avoid extreme longshots unless you model them well

# For each match, pick the outcome with the highest EV
df_bets <- df %>%
  mutate(
    pick = case_when(
      ev_home == pmax(ev_home, ev_draw, ev_away, na.rm = TRUE) ~ "H",
      ev_draw == pmax(ev_home, ev_draw, ev_away, na.rm = TRUE) ~ "D",
      TRUE ~ "A"
    ),
    p_pick  = case_when(pick == "H" ~ p_home,  pick == "D" ~ p_draw,  TRUE ~ p_away),
    o_pick  = case_when(pick == "H" ~ o_home,  pick == "D" ~ o_draw,  TRUE ~ o_away),
    ev_pick = case_when(pick == "H" ~ ev_home, pick == "D" ~ ev_draw, TRUE ~ ev_away)
  ) %>%
  filter(!is.na(o_pick)) %>%   # drops matches with no odds
  filter(p_pick >= P_MIN, ev_pick >= EDGE_MIN)

df_bets %>% count(pick)</pre>
</section>
<!-- SECTION: Backtest -->
<section id="backtest"> <h2>Backtest: flat stake vs Kelly</h2>
<p> Backtesting is where most people fool themselves. Use a time-based split, realistic bet selection rules, and conservative bankroll sizing. </p>
<h3>Compute bet results</h3>
<pre># Outcome label
df_bets <- df_bets %>%
  mutate(
    result = case_when(
      hg > ag  ~ "H",
      hg == ag ~ "D",
      TRUE     ~ "A"
    ),
    win = as.integer(pick == result)
  )

# Profit per 1 unit stake
df_bets <- df_bets %>%
  mutate(profit_flat = if_else(win == 1, o_pick - 1, -1))

df_bets %>%
  summarise(
    n_bets   = n(),
    hit_rate = mean(win),
    avg_odds = mean(o_pick),
    roi_flat = mean(profit_flat)
  )</pre>
<h3>Kelly staking (fractional Kelly recommended)</h3>
<p> Full Kelly is often too aggressive. Use fractional Kelly (e.g., 0.25 Kelly). If you want a deeper treatment of Kelly and Bayesian uncertainty, see: <a href="https://rprogrammingbooks.com/product/bayesian-sports-analytics-r-predictive-modeling-betting-performance/" rel="nofollow" target="_blank"> Bayesian Sports Analytics (Book/Product) </a>. </p>
<pre>kelly_fraction <- function(p, o) {
  # Decimal odds o. Net odds b = o - 1
  b <- o - 1
  q <- 1 - p
  f <- (b * p - q) / b
  pmax(0, f)   # never stake on a negative edge
}

KELLY_MULT <- 0.25  # fractional Kelly

df_bets <- df_bets %>%
  mutate(
    f_kelly     = kelly_fraction(p_pick, o_pick),
    stake_kelly = KELLY_MULT * f_kelly
  )

# Simulate bankroll; stakes are fractions of the current bankroll
simulate_bankroll <- function(df, start_bankroll = 100, stake_col = "stake_kelly") {
  br <- start_bankroll
  path <- numeric(nrow(df))
  for (i in seq_len(nrow(df))) {
    stake <- df[[stake_col]][i]
    stake_amt <- br * stake
    # Profit = stake_amt*(o-1) if win else -stake_amt
    prof <- if (df$win[i] == 1) stake_amt * (df$o_pick[i] - 1) else -stake_amt
    br <- br + prof
    path[i] <- br
  }
  path
}

df_bets <- df_bets %>% arrange(date)

# Flat staking: a fixed 1% of the current bankroll per bet
df_bets <- df_bets %>% mutate(stake_flat = 0.01)

df_bets$br_flat  <- simulate_bankroll(df_bets, start_bankroll = 100, stake_col = "stake_flat")
df_bets$br_kelly <- simulate_bankroll(df_bets, start_bankroll = 100, stake_col = "stake_kelly")

tail(df_bets %>% select(date, home, away, pick, o_pick, win, br_flat, br_kelly), 10)</pre>
<h3>Plot bankroll curves</h3>
<pre>plot_df <- df_bets %>%
  select(date, br_flat, br_kelly) %>%
  pivot_longer(cols = c(br_flat, br_kelly), names_to = "strategy", values_to = "bankroll")

# colour by strategy so the two curves can be told apart
ggplot(plot_df, aes(x = date, y = bankroll, colour = strategy, group = strategy)) +
  geom_line() +
  labs(x = NULL, y = "Bankroll", colour = "Strategy",
       title = "Backtest Bankroll: Flat vs Fractional Kelly")</pre>
</section>
<!-- SECTION: Calibration -->
<section id="calibration"> <h2>Calibration diagnostics</h2>
<p> Profit is noisy. Calibration tells you whether the probabilities are sensible, for example whether outcomes you price at 40% happen about 40% of the time. A standard proper scoring rule for 1X2 is the multi-class log loss (lower is better). We’ll compute log loss for your 1X2 probabilities on the test set.
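</p>
<p> As a reference point for reading the number, a forecaster that always assigns 1/3 to each of home, draw, and away scores a log loss of <code>log(3)</code>, about 1.0986. The short sketch below computes that baseline and compares it with a toy set of made-up probabilities (not model output); a useful model should come in below the baseline. </p>
<pre># Baseline: always predict 1/3 for H, D, A
uniform_ll <- -log(1 / 3)   # equals log(3), about 1.0986

# Hypothetical probabilities a model assigned to the outcomes that actually
# happened in four matches (illustrative numbers only)
p_realized <- c(0.50, 0.30, 0.45, 0.20)
model_ll <- -mean(log(p_realized))

c(uniform = uniform_ll, model = model_ll)</pre>
<p> The next block applies the same calculation to the model’s test-set probabilities.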
</p>
<pre># Keep rows with complete probabilities and add truth labels
df_eval <- df %>%
  filter(!is.na(p_home), !is.na(p_draw), !is.na(p_away)) %>%
  mutate(
    truth = case_when(hg > ag ~ "H", hg == ag ~ "D", TRUE ~ "A"),
    truth = factor(truth, levels = c("H", "D", "A"))
  )

# Log loss (manual)
log_loss_1x2 <- function(pH, pD, pA, y) {
  eps <- 1e-15
  # probability assigned to the outcome that actually happened
  p <- ifelse(y == "H", pH, ifelse(y == "D", pD, pA))
  -mean(log(pmax(p, eps)))
}

ll <- log_loss_1x2(df_eval$p_home, df_eval$p_draw, df_eval$p_away, df_eval$truth)
ll</pre>
<h3>Reliability plot (binning)</h3>
<pre># Example: calibration for HOME-win probability
calib_home <- df_eval %>%
  mutate(bin = ntile(p_home, 10)) %>%
  group_by(bin) %>%
  summarise(
    p_mean = mean(p_home),
    freq   = mean(truth == "H"),
    n      = n(),
    .groups = "drop"
  )

ggplot(calib_home, aes(x = p_mean, y = freq)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +
  labs(x = "Predicted P(Home win)", y = "Observed frequency",
       title = "Calibration (Home win)")</pre>
<div> <strong>Common upgrade path:</strong> Add xG features (if you have them) and/or time decay (recent matches matter more). Then validate calibration again. Don’t chase ROI without checking probability quality. </div> </section>
<!-- SECTION: Production -->
<section id="production"> <h2>Production: weekly pipeline</h2>
<p> Here’s a practical skeleton you can run weekly: fetch latest matches → refit/refresh → generate next fixtures probabilities → compare with odds.
</p>
<pre># Assumes predict_lambdas(), score_matrix(), dc_loglik() and predict_1x2()
# from the earlier sections are already defined in the session

# 1) Load historical matches (from worldfootballR or your CSV)
matches <- readr::read_csv("matches.csv") %>% janitor::clean_names()

# 2) Train up to cutoff date (e.g., yesterday)
cutoff_date <- Sys.Date() - 1
hist <- matches %>%
  mutate(date = as.Date(date)) %>%
  filter(date <= cutoff_date) %>%
  transmute(date, home = home_team, away = away_team, hg = home_goals, ag = away_goals) %>%
  filter(!is.na(hg), !is.na(ag)) %>%   # drop unplayed or incomplete rows
  arrange(date)

teams <- sort(unique(c(hist$home, hist$away)))

# 3) Fit attack/defense model
train_long <- bind_rows(
  hist %>% transmute(
    goals = hg, is_home = 1L,
    att = factor(paste0("att_", home), levels = paste0("att_", teams)),
    def = factor(paste0("def_", away), levels = paste0("def_", teams))
  ),
  hist %>% transmute(
    goals = ag, is_home = 0L,
    att = factor(paste0("att_", away), levels = paste0("att_", teams)),
    def = factor(paste0("def_", home), levels = paste0("def_", teams))
  )
)
ad_mod <- glm(goals ~ is_home + att + def, data = train_long, family = poisson())

# 4) Predict for upcoming fixtures (you need a fixtures table)
fixtures <- readr::read_csv("fixtures.csv") %>%
  janitor::clean_names() %>%
  mutate(date = as.Date(date)) %>%
  transmute(date, home = home_team, away = away_team)

# Compute lambdas
fixtures2 <- fixtures %>%
  mutate(
    home = factor(home, levels = teams),
    away = factor(away, levels = teams)
  )

fixtures_pred <- predict_lambdas(
  df = fixtures2 %>% mutate(hg = 0L, ag = 0L),  # placeholders
  model = ad_mod,
  teams = teams
)

# 5) Estimate rho (optional) on recent history only (faster)
# The two-year window below is an illustrative choice; adjust it to your data
hist_pred <- hist %>%
  filter(date >= cutoff_date - 730) %>%
  mutate(home = factor(home, levels = teams), away = factor(away, levels = teams))
hist_pred <- predict_lambdas(hist_pred, ad_mod, teams)

opt <- optimize(
  f = function(r) -dc_loglik(r, hist_pred, max_goals = 10),
  interval = c(-0.2, 0.2)
)
rho_hat <- opt$minimum

# 6) Convert to 1X2
fixtures_1x2 <- bind_cols(
  fixtures_pred,
  predict_1x2(fixtures_pred, rho = rho_hat, max_goals = 10)
) %>%
  select(date, home, away, lambda_home,
         lambda_away, p_home, p_draw, p_away)

write_csv(fixtures_1x2, "model_probs.csv")</pre>
<p> At this point you have <code>model_probs.csv</code> ready to merge with bookmaker odds and produce a bet shortlist. For related methods in other sports, see <a href="https://rprogrammingbooks.com/sports-analytics-with-r/" rel="nofollow" target="_blank">Sports Analytics with R</a> and the sport-specific guides (NFL, tennis, boxing). </p> </section>
<!-- FAQ -->
<section id="faq"> <h2>FAQ</h2>
<h3>Is a Poisson model “good enough” for football betting?</h3>
<p> It’s a strong baseline. It captures team strength and home advantage with minimal complexity. Many upgrades (xG, time decay, Bayesian partial pooling) improve robustness, but the baseline can already be useful. </p>
<h3>How do I avoid overfitting?</h3>
<p> Use time-based validation, keep features simple, and prioritize calibration and log loss. Don’t tune thresholds using the same data you evaluate on. </p>
<h3>What’s the simplest value-bet rule?</h3>
<p> Bet only when <code>p_model > p_implied</code> and you have a buffer (e.g. EV > 2%), then stake conservatively (flat 0.5%–1% bankroll or fractional Kelly). </p>
<h3>Where do I learn more advanced Bayesian sports models in R?</h3>
<p> If you want Bayesian approaches, uncertainty-aware staking, and a deeper treatment of the Kelly criterion, see: <a href="https://rprogrammingbooks.com/product/bayesian-sports-analytics-r-predictive-modeling-betting-performance/" rel="nofollow" target="_blank"> Bayesian Sports Analytics (Book/Product) </a>.
</p> </section>
<!-- Optional: FAQ Schema (JSON-LD) -->
<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "FAQPage", "mainEntity": [ { "@type": "Question", "name": "Is a Poisson model good enough for football betting?", "acceptedAnswer": { "@type": "Answer", "text": "It’s a strong baseline that captures team strength and home advantage with minimal complexity. Many upgrades like xG, time decay, and Bayesian partial pooling can improve robustness, but the baseline can already be useful when validated properly." } }, { "@type": "Question", "name": "How do I avoid overfitting a football betting model in R?", "acceptedAnswer": { "@type": "Answer", "text": "Use time-based validation, keep features simple, evaluate with calibration and log loss, and avoid tuning thresholds on the same data used for evaluation." } }, { "@type": "Question", "name": "What is the simplest value-bet rule?", "acceptedAnswer": { "@type": "Answer", "text": "Bet only when model probability exceeds implied probability (with a safety margin, e.g. EV > 2%), and stake conservatively using flat staking or fractional Kelly."
} } ] } </script>
<hr></hr>
<!-- Closing internal links -->
<div> <strong>Next reads on rprogrammingbooks.com</strong> <ul style="margin:10px 0 0 18px;"> <li><a href="https://rprogrammingbooks.com/install-use-worldfootballr/" rel="nofollow" target="_blank">Install & Use worldfootballR</a> (setup + troubleshooting)</li> <li><a href="https://rprogrammingbooks.com/worldfootballr-guide/" rel="nofollow" target="_blank">worldfootballR Guide</a> (scraping + workflows)</li> <li><a href="https://rprogrammingbooks.com/sports-analytics-with-r/" rel="nofollow" target="_blank">Sports Analytics with R</a> (methods hub)</li> <li><a href="https://rprogrammingbooks.com/product/bayesian-sports-analytics-r-predictive-modeling-betting-performance/" rel="nofollow" target="_blank">Bayesian Sports Analytics</a> (advanced modeling + Kelly)</li> </ul> </div> </div>
<p>The post <a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/" rel="nofollow" target="_blank">Football Betting Model in R (Step-by-Step Guide 2026)</a> appeared first on <a href="https://rprogrammingbooks.com/" rel="nofollow" target="_blank">R Programming Books</a>.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;"> <div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rprogrammingbooks.com/football-betting-model-r-guide-2026/?utm_source=rss&utm_medium=rss&utm_campaign=football-betting-model-r-guide-2026"> Blog - R Programming Books</a></strong>.</div>
</div> </div> <footer class="amp-wp-article-footer"> <div class="amp-wp-meta amp-wp-tax-category"> Categories: <a href="https://www.r-bloggers.com/category/r-bloggers/" rel="category tag">R bloggers</a> </div> </footer> </article> <script type="application/ld+json" class="saswp-schema-markup-output"> [{"@context":"https://schema.org/","@graph":[{"@type":"Organization","@id":"https://www.r-bloggers.com#Organization","name":"R-bloggers","url":"https://www.r-bloggers.com"},{"@type":"Article","@id":"https://www.r-bloggers.com/2026/02/football-betting-model-in-r-step-by-step-guide-2026/#Article","url":"https://www.r-bloggers.com/2026/02/football-betting-model-in-r-step-by-step-guide-2026/","inLanguage":"en-US","headline":"Football Betting Model in R (Step-by-Step Guide 2026)","datePublished":"2026-02-22T16:06:52-06:00","dateModified":"2026-02-22T16:06:52-06:00","author":{"@type":"Person","name":"rprogrammingbooks","url":"https://www.r-bloggers.com/author/rprogrammingbooks/","sameAs":["https://rprogrammingbooks.com/blog/"]},"publisher":{"@id":"https://www.r-bloggers.com#Organization"}}]}] </script> <link rel='stylesheet' id='jetpack_css-css' 
href='https://www.r-bloggers.com/wp-content/plugins/jetpack/css/jetpack.css?ver=5.9.4' type='text/css' media='all'></link> </body> </html>