
Explained vs. Predictive Power: R², Adjusted R², and Beyond


< section id="introduction" class="level1" data-number="1">

1 Introduction

You trust R². Should you?
You proudly present a model with R² = 0.95. Everyone applauds.
But what if it fails miserably on new, unseen data?

When building a statistical model, one of the first numbers analysts and data scientists cite is R², the coefficient of determination. It’s widely reported in research, academic theses, and industry reports — and yet it is frequently misunderstood or misused.

Does a high R² mean your model is good? Is it enough to evaluate model performance? What about its adjusted or predictive counterparts?

This article will explore in depth:

- What R², Adjusted R², and Predicted R² actually mean
- Why relying solely on R² can mislead you
- How to evaluate models using both explanatory and predictive power
- Real-life implementation using the {tidymodels} framework in R

We’ll also discuss best practices and common pitfalls, and equip you with a mindset to look beyond surface-level model summaries.

< section id="theoretical-background" class="level1" data-number="2">

2 Theoretical Background

< section id="what-is-r²" class="level2" data-number="2.1">

2.1 What is R²?

The coefficient of determination, R², is defined as:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

Where:

- $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares
- $SS_{tot} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares

It tells us the proportion of variance explained by the model. An R² of 0.80 implies that 80% of the variability in the dependent variable is explained by the model.

But beware — it only measures fit to training data, not the model’s ability to generalize.
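To make the definition concrete, here is a minimal base-R sketch (on the built-in mtcars data, chosen purely for illustration) that computes R² by hand and checks it against lm():

# Compute R² manually and compare with lm()'s summary.
fit_toy <- lm(mpg ~ wt, data = mtcars)

ss_res <- sum(residuals(fit_toy)^2)               # residual sum of squares
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total sum of squares

r2_manual <- 1 - ss_res / ss_tot
all.equal(r2_manual, summary(fit_toy)$r.squared)  # TRUE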

< section id="adjusted-r²" class="level2" data-number="2.2">

2.2 Adjusted R²

When we add predictors to a regression model, R² will never decrease — even if the added variables are irrelevant.

Adjusted R² corrects this by penalizing the number of predictors:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$

Where:

- $n$ is the number of observations
- $p$ is the number of predictors

Thus, Adjusted R² will only increase if the new predictor improves the model more than expected by chance.
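Continuing the same toy example, a quick sketch of the adjustment applied by hand:

# Adjusted R² from the formula above, checked against lm().
fit_toy <- lm(mpg ~ wt + hp, data = mtcars)

n  <- nobs(fit_toy)               # number of observations
p  <- length(coef(fit_toy)) - 1   # number of predictors (excluding intercept)
r2 <- summary(fit_toy)$r.squared

adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
all.equal(adj_r2_manual, summary(fit_toy)$adj.r.squared)  # TRUE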

< section id="predicted-r²" class="level2" data-number="2.3">

2.3 Predicted R²

Predicted R² (or cross-validated R²) is the most honest estimate of model utility. It answers the question:

How well will this model predict new, unseen data?

This is typically calculated using cross-validation, and unlike regular R², it reflects out-of-sample performance.

You can also view it as:

$$R^2_{pred} = 1 - \frac{PRESS}{SS_{tot}}$$

Where PRESS is the Prediction Error Sum of Squares based on cross-validation.
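For ordinary linear models, PRESS has a convenient closed form: the leave-one-out residual is $e_i / (1 - h_i)$, where $h_i$ is the leverage, so no refitting loop is needed. A minimal sketch, again on the toy mtcars model:

# PRESS-based Predicted R² for an lm() fit, with no refitting loop.
fit_toy <- lm(mpg ~ wt + hp, data = mtcars)

press  <- sum((residuals(fit_toy) / (1 - hatvalues(fit_toy)))^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

pred_r2 <- 1 - press / ss_tot
pred_r2  # typically lower than the ordinary R²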

< section id="dataset-overview" class="level1" data-number="3">

3 Dataset Overview

We’ll use the classic Boston Housing Dataset (MASS::Boston) to demonstrate. It includes 506 census tracts described by 14 numeric variables.

Below are the key variables:

- medv: median value of owner-occupied homes (in $1000s), our target
- rm: average number of rooms per dwelling
- lstat: percentage of lower-status population
- nox: nitric oxides concentration (parts per 10 million)
- crim: per-capita crime rate by town
- ptratio: pupil–teacher ratio by town

This regression problem mimics common real estate or socio-economic modeling use cases. Let’s first examine the dataset’s summary statistics.

library(tidymodels)
library(MASS)
library(ggplot2)
library(corrr)
library(skimr)
library(patchwork)


boston <- MASS::Boston
skim(boston)
Data summary
Name boston
Number of rows 506
Number of columns 14
_______________________
Column type frequency:
numeric 14
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
crim 0 1 3.61 8.60 0.01 0.08 0.26 3.68 88.98 ▇▁▁▁▁
zn 0 1 11.36 23.32 0.00 0.00 0.00 12.50 100.00 ▇▁▁▁▁
indus 0 1 11.14 6.86 0.46 5.19 9.69 18.10 27.74 ▇▆▁▇▁
chas 0 1 0.07 0.25 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
nox 0 1 0.55 0.12 0.38 0.45 0.54 0.62 0.87 ▇▇▆▅▁
rm 0 1 6.28 0.70 3.56 5.89 6.21 6.62 8.78 ▁▂▇▂▁
age 0 1 68.57 28.15 2.90 45.02 77.50 94.07 100.00 ▂▂▂▃▇
dis 0 1 3.80 2.11 1.13 2.10 3.21 5.19 12.13 ▇▅▂▁▁
rad 0 1 9.55 8.71 1.00 4.00 5.00 24.00 24.00 ▇▂▁▁▃
tax 0 1 408.24 168.54 187.00 279.00 330.00 666.00 711.00 ▇▇▃▁▇
ptratio 0 1 18.46 2.16 12.60 17.40 19.05 20.20 22.00 ▁▃▅▅▇
black 0 1 356.67 91.29 0.32 375.38 391.44 396.22 396.90 ▁▁▁▁▇
lstat 0 1 12.65 7.14 1.73 6.95 11.36 16.96 37.97 ▇▇▅▂▁
medv 0 1 22.53 9.20 5.00 17.02 21.20 25.00 50.00 ▂▇▅▁▁

Commentary:

- There are no missing values in any of the 14 columns (complete_rate = 1 throughout).
- crim, zn, and black are strongly skewed, as the inline histograms show.
- chas is effectively a binary indicator (0/1).
- The target medv runs from 5 to 50 ($1000s), with a cluster at the top value of 50 that suggests censoring.

Next, we examine correlations with medv:

boston %>% correlate() %>% corrr::focus(medv) %>% arrange(desc(medv))
# A tibble: 13 × 2
   term      medv
   <chr>    <dbl>
 1 rm       0.695
 2 zn       0.360
 3 black    0.333
 4 dis      0.250
 5 chas     0.175
 6 age     -0.377
 7 rad     -0.382
 8 crim    -0.388
 9 nox     -0.427
10 tax     -0.469
11 indus   -0.484
12 ptratio -0.508
13 lstat   -0.738

Interpretation of Correlations:

- rm shows the strongest positive correlation with medv (0.695): more rooms generally mean higher home values.
- lstat shows the strongest negative correlation (-0.738): tracts with a larger share of lower-status residents have markedly lower values.
- ptratio, indus, tax, and nox are moderately negative, linking dense, industrial, high-tax areas to lower home values.

These insights will guide us in building and evaluating our model.
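As an optional visual companion to the table above, {corrr} can also draw the full correlation matrix; a quick sketch, where rearrange() orders variables so correlated clusters sit together:

boston %>%
  correlate() %>%
  rearrange() %>%              # group correlated variables together
  rplot(print_cor = FALSE)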

< section id="exploratory-data-analysis" class="level1" data-number="4">

4 Exploratory Data Analysis

Let’s visualize some of the most influential variables in relation to medv, our target variable. These exploratory graphs help reveal potential linear or nonlinear relationships, outliers, or the need for transformation.

# Define individual plots with improved formatting for Quarto rendering
p1 <- ggplot(boston, aes(rm, medv)) +
  geom_point(alpha = 0.5, color = "#2c7fb8") +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(
    title = "Rooms\nvs. Median Value",
    x = "Average Number of Rooms (rm)",
    y = "Median Value of Homes ($1000s)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 11, lineheight = 1.1))

p2 <- ggplot(boston, aes(lstat, medv)) +
  geom_point(alpha = 0.5, color = "#de2d26") +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Lower Status %\nvs. Median Value",
    x = "% Lower Status Population (lstat)",
    y = "Median Value of Homes ($1000s)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 11, lineheight = 1.1))

p3 <- ggplot(boston, aes(nox, medv)) +
  geom_point(alpha = 0.5, color = "#31a354") +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "NOx Concentration\nvs. Median Value",
    x = "NOx concentration (ppm)",
    y = "Median Value of Homes ($1000s)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 11, lineheight = 1.1))

p4 <- ggplot(boston, aes(age, medv)) +
  geom_point(alpha = 0.5, color = "#ff7f00") +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Old Homes %\nvs. Median Value",
    x = "% Homes Built Before 1940 (age)",
    y = "Median Value of Homes ($1000s)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 11, lineheight = 1.1))

p5 <- ggplot(boston, aes(tax, medv)) +
  geom_point(alpha = 0.5, color = "#6a3d9a") +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Tax Rate\nvs. Median Value",
    x = "Tax Rate (per $10,000)",
    y = "Median Value of Homes ($1000s)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 11, lineheight = 1.1))

p6 <- ggplot(boston, aes(dis, medv)) +
  geom_point(alpha = 0.5, color = "#1f78b4") +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Distance to Jobs\nvs. Median Value",
    x = "Weighted Distance to Employment Centers (dis)",
    y = "Median Value of Homes ($1000s)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 11, lineheight = 1.1))
(p1 | p2) + plot_layout(guides = 'collect')

(p3 | p4) + plot_layout(guides = 'collect')

(p5 | p6) + plot_layout(guides = 'collect')

These six plots combine both socioeconomic and environmental dimensions of housing value — providing both intuition and modeling direction.

< section id="modeling-with-tidymodels" class="level1" data-number="5">

5 Modeling with Tidymodels

Now that we’ve explored the data, it’s time to fit a model using the tidymodels framework. We’ll use a simple linear regression to predict medv, the median home value.

< section id="data-splitting-and-preprocessing" class="level2" data-number="5.1">

5.1 Data Splitting and Preprocessing

We begin by splitting the dataset into training and testing sets. The training set will be used to fit the model, and the test set will evaluate its generalization performance.

set.seed(42)
split <- initial_split(boston, prop = 0.8)
train <- training(split)
test <- testing(split)

rec <- recipe(medv ~ ., data = train)
model <- linear_reg() %>% set_engine("lm")
wf <- workflow() %>% add_recipe(rec) %>% add_model(model)  # named wf to avoid shadowing workflows::workflow()
< section id="model-fitting" class="level2" data-number="5.2">

5.2 Model Fitting

We now fit the model to the training data:

lm_fit <- fit(wf, data = train)  # named lm_fit to avoid shadowing the fit() generic
< section id="evaluating-the-model-on-the-training-set" class="level2" data-number="5.3">

5.3 Evaluating the Model on the Training Set

Let’s extract the R² and Adjusted R² values from the fitted model:

training_summary <- glance(extract_fit_parsnip(lm_fit))
training_summary %>% dplyr::select(r.squared, adj.r.squared)
# A tibble: 1 × 2
  r.squared adj.r.squared
      <dbl>         <dbl>
1     0.726         0.717

🔍 Interpretation:

Here R² (0.726) and Adjusted R² (0.717) sit close together, so the complexity penalty is small. If the two values differ substantially, it indicates that some predictors may not be contributing meaningfully to the model.

Example: A model with 12 predictors might show R² = 0.76, but Adjusted R² = 0.72 — suggesting some predictors are adding complexity without real explanatory power.
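To see the effect directly, here is a small illustrative sketch (using plain lm() on the full boston data for simplicity): a pure-noise column can never lower R², but it can lower Adjusted R².

set.seed(123)
boston_noise <- boston
boston_noise$noise <- rnorm(nrow(boston_noise))  # pure random noise

fit_base  <- lm(medv ~ ., data = boston)
fit_noise <- lm(medv ~ ., data = boston_noise)

summary(fit_base)$r.squared       # without the noise column
summary(fit_noise)$r.squared      # never lower, despite the junk predictor
summary(fit_base)$adj.r.squared   # Adjusted R² can move the other way
summary(fit_noise)$adj.r.squared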

< section id="test-set-performance" class="level2" data-number="5.4">

5.4 Test Set Performance

Now we assess the model on the unseen test data:

preds <- predict(lm_fit, test) %>% bind_cols(test)
metrics(preds, truth = medv, estimate = .pred)
# A tibble: 3 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard       4.79 
2 rsq     standard       0.784
3 mae     standard       3.32 

📉 Interpretation:

On the held-out test set the model reaches R² ≈ 0.784 with an RMSE of about 4.79; since medv is measured in $1000s, predictions miss by roughly $4,790 on average. The test R² landing slightly above the training R² here is an artifact of this particular 80/20 split, not evidence that the model improved.

< section id="cross-validation-for-predicted-r²" class="level2" data-number="5.5">

5.5 Cross-Validation for Predicted R²

To get a more robust performance estimate, we use 10-fold cross-validation:

set.seed(42)
cv <- vfold_cv(train, v = 10)
resample <- fit_resamples(
  wf,
  resamples = cv,
  metrics = metric_set(rsq, rmse),
  control = control_resamples(save_pred = TRUE)
)
collect_metrics(resample)
# A tibble: 2 × 6
  .metric .estimator  mean     n std_err .config             
  <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
1 rmse    standard   4.79     10  0.384  Preprocessor1_Model1
2 rsq     standard   0.712    10  0.0341 Preprocessor1_Model1

✅ Interpretation:

Across the 10 folds, the mean R² is 0.712 (std. error 0.034), a touch below the training R² of 0.726 and close to the Adjusted R² of 0.717. That consistency suggests the model is not badly overfit, and 0.712 is a more honest estimate of out-of-sample performance than any single training-set figure.

Tip

Use cross-validation as a standard evaluation tool, especially when data is limited.
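Because we saved the fold-level predictions (save_pred = TRUE), we can also pool the held-out predictions and compute a single cross-validated R². This is one simple way to obtain the Predicted R² described in Section 2.3:

# Pool held-out predictions from all 10 folds into one Predicted R².
cv_preds <- collect_predictions(resample)
rsq(cv_preds, truth = medv, estimate = .pred)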

💬 Summary of Findings:

- Training fit: R² = 0.726, Adjusted R² = 0.717, indicating a solid in-sample fit with little complexity penalty.
- Test set: R² = 0.784, RMSE ≈ 4.79, so the model generalizes reasonably well to unseen data.
- Cross-validation: mean R² = 0.712, the most conservative and most trustworthy of the three estimates.

In the next step, we can analyze residuals or explore model improvements such as polynomial terms or regularization.

< section id="residual-diagnostics" class="level2" data-number="5.6">

5.6 Residual Diagnostics

Let’s now check if our linear model satisfies basic regression assumptions. We’ll plot residuals and assess patterns, non-linearity, and potential heteroskedasticity.

library(broom)
library(ggthemes)

aug <- augment(extract_fit_engine(lm_fit))  # pull out the underlying lm object

ggplot(aug, aes(.fitted, .resid)) +
  geom_point(alpha = 0.5, color = "#2c7fb8") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted Values",
    y = "Residuals"
  ) +
  theme_minimal()

📌 Interpretation:

The residuals display visible curvature rather than a random scatter around zero, and their spread widens at higher fitted values, hinting at mild heteroskedasticity. This is consistent with the nonlinear lstat–medv pattern we saw in the EDA, and it motivates the transformation we try next.
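A complementary check worth running (a short sketch using the same augmented data) is a normal Q-Q plot of the residuals, to assess the normality assumption behind the usual inference:

ggplot(aug, aes(sample = .resid)) +
  stat_qq(alpha = 0.5, color = "#2c7fb8") +
  stat_qq_line(linetype = "dashed") +
  labs(
    title = "Normal Q-Q Plot of Residuals",
    x = "Theoretical Quantiles",
    y = "Sample Residuals"
  ) +
  theme_minimal()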

< section id="improving-the-model-transforming-lstat" class="level2" data-number="5.7">

5.7 Improving the Model: Transforming lstat

From our earlier EDA, we saw a strong nonlinear relationship between lstat (lower status %) and medv. Let’s try log-transforming lstat to capture that curvature.

< section id="updated-recipe-with-transformation" class="level3" data-number="5.7.1">

5.7.1 Updated Recipe with Transformation

rec_log <- recipe(medv ~ ., data = train) %>%
  step_log(lstat)

workflow_log <- workflow() %>%
  add_model(model) %>%
  add_recipe(rec_log)

fit_log <- fit(workflow_log, data = train)
< section id="evaluation-of-transformed-model" class="level3" data-number="5.7.2">

5.7.2 Evaluation of Transformed Model

preds_log <- predict(fit_log, test) %>% bind_cols(test)
metrics(preds_log, truth = medv, estimate = .pred)
# A tibble: 3 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard       4.43 
2 rsq     standard       0.815
3 mae     standard       3.16 
glance(extract_fit_parsnip(fit_log))
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic   p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>     <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.785         0.778  4.21      110. 2.64e-121    13 -1147. 2324. 2384.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

🧠 Interpretation:

Log-transforming lstat lifts the test-set R² from 0.784 to 0.815 and cuts RMSE from 4.79 to 4.43. The training-side fit improves as well (Adjusted R² rises from 0.717 to 0.778). A single well-chosen transformation captured curvature that the purely linear specification missed, at no cost to interpretability.

Tip

Transformations, polynomial terms, and splines are all valid strategies to improve linear models without abandoning interpretability.
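For instance, here is a sketch of the spline route using step_ns() from {recipes}; the deg_free = 4 setting is an arbitrary illustrative choice, not a tuned value:

rec_spline <- recipe(medv ~ ., data = train) %>%
  step_ns(lstat, deg_free = 4)   # natural spline instead of a log transform

workflow_spline <- workflow() %>%
  add_model(model) %>%
  add_recipe(rec_spline)

fit_spline <- fit(workflow_spline, data = train)
predict(fit_spline, test) %>%
  bind_cols(test) %>%
  metrics(truth = medv, estimate = .pred)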

With residuals checked and a transformation tested, our next step could be to explore regularized models like ridge or lasso regression, or even move beyond linearity with tree-based models.

< section id="common-pitfalls-and-misconceptions" class="level1" data-number="6">

6 Common Pitfalls and Misconceptions

Even though R² is widely reported and intuitively appealing, its interpretation is often flawed — even by experienced analysts. Here, we’ll go beyond textbook definitions and highlight real-world traps and misunderstandings related to R² and its variants.

🚫 Misconception 1: High R² means the model is good

A model can post a high R² by overfitting noise, exploiting leakage, or fitting a mis-specified functional form. Fit to the training data says nothing, by itself, about validity or usefulness.

⚠️ Misconception 2: Adding predictors always improves the model

Plain R² never decreases when you add a variable, even a column of random noise. Adjusted R² and Predicted R² exist precisely to expose this kind of empty improvement.

❌ Misconception 3: R² indicates causality

R² measures association between fitted values and the response. Two variables driven by a common cause can yield a high R² with no causal link between them at all.

📉 Misconception 4: R² is a universal performance metric

R² depends on the variance of the response in your particular sample and is not comparable across different outcome scales or transformations; it also says nothing about absolute error (use RMSE or MAE for that).

🔍 Misconception 5: Residual plots don’t matter if R² is high

A high R² can coexist with curved, funnel-shaped, or clustered residuals, all of which signal that the model’s assumptions, and therefore its inferences, are broken.

💡 Misconception 6: Predicted R² isn’t necessary

Skipping out-of-sample validation means you never learn whether the model generalizes, which is usually the entire point of building it.

🔬 Misconception 7: R² has a fixed interpretation

An R² of 0.3 can be excellent in the social sciences, while an R² of 0.9 can be disappointing in a controlled physics experiment. What counts as “good” is domain-dependent.


Insight: Always use R² in context — alongside other metrics, validation strategies, and graphical checks.

For a deeper dive into R² misconceptions and proper regression diagnostics, standard regression texts and the tidymodels documentation are good starting points; together with the checks above, they build the foundation for responsible model interpretation.

< section id="conclusion-recommendations" class="level1" data-number="7">

7 Conclusion & Recommendations

< section id="summary" class="level2" data-number="7.1">

7.1 📌 Summary

In this post, we explored R², Adjusted R², and Predicted R² in depth — not just as mathematical constructs, but as tools for critical thinking in modeling. We walked through theory, practical application in R with tidymodels, residual diagnostics, and even model improvement through transformation.

Let’s recap:

- R² tells us how well our model fits the training data, but can be misleading on its own.
- Adjusted R² improves upon R² by accounting for model complexity.
- Predicted R², evaluated via cross-validation, provides the most trustworthy estimate of real-world performance.

High R² values can be seductive. But as we saw, they don’t guarantee causality, generalizability, or correctness. Only by combining R² with residual diagnostics, domain knowledge, and out-of-sample validation can we judge a model responsibly.

< section id="recommendations-for-practitioners" class="level2" data-number="7.2">

7.2 💡 Recommendations for Practitioners

  1. Always accompany R² with Adjusted and Predicted R² — never rely on one metric alone.
  2. Perform residual diagnostics to check linearity, variance assumptions, and outlier influence.
  3. Use cross-validation (e.g., 10-fold) as a default evaluation strategy, especially when the dataset is not large.
  4. Transform nonlinear predictors (as we did with lstat) or use flexible models (e.g., splines, GAMs) when needed.
  5. Avoid including irrelevant predictors — they inflate R² without improving generalization.
  6. Contextualize your R² — in some fields, a lower R² is still useful; in others, it may signal inadequacy.
  7. Complement numerical metrics with visual tools — scatterplots, predicted vs. actual plots, and residuals reveal insights numbers alone may miss.
< section id="looking-ahead" class="level2" data-number="7.3">

7.3 🚀 Looking Ahead

If you want to take your modeling further:

- Try ridge or lasso regression to handle multicollinearity (a starter sketch follows below).
- Explore tree-based models (e.g., random forests) when relationships are complex and nonlinear.
- Use tools like yardstick and modeltime to automate robust validation and reporting.
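As a starting point for the first suggestion, here is a hedged sketch of a ridge fit, assuming the {glmnet} engine is installed; the penalty and mixture values are illustrative and should be tuned in practice (e.g., with tune_grid()):

# Ridge regression via parsnip's glmnet engine (mixture = 0 means ridge).
ridge_model <- linear_reg(penalty = 0.1, mixture = 0) %>%  # penalty is illustrative
  set_engine("glmnet")

workflow_ridge <- workflow() %>%
  add_recipe(rec) %>%
  add_model(ridge_model)

fit_ridge <- fit(workflow_ridge, data = train)
predict(fit_ridge, test) %>%
  bind_cols(test) %>%
  metrics(truth = medv, estimate = .pred)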

In the end, modeling isn’t just about maximizing R² — it’s about understanding your data, validating your decisions, and making informed predictions.

Thanks for reading!

Feel free to share, fork, or reuse this analysis. Questions and comments are welcome.
