GAMs for Customer Lifetime Value (CLV) prediction
I typically work in quantitative ecology and molecular epidemiology, where we use statistical models to predict species distributions or disease transmission patterns. Recently though, I had an interesting conversation with a data science PhD student who mentioned they were applying GAMs to predict Customer Lifetime Value at a SaaS startup. This caught my attention because CLV prediction, as it turns out, faces remarkably similar statistical challenges to ecological forecasting: nonlinear relationships that saturate at biological or business limits, hierarchical structures where groups behave differently, and the need to balance model flexibility with interpretability for stakeholders who need to understand why the model makes certain predictions.
After diving into the business analytics literature, I discovered that companies take wildly different approaches to this problem. Many rely on simple heuristics like average order value × frequency × lifespan that ignore individual customer trajectories and temporal dynamics. Others implement sophisticated probabilistic models (the Pareto/NBD and BG/NBD families are particularly elegant), though these often require specialized knowledge to tune and interpret. Black-box machine learning models can capture complex patterns but sacrifice the interpretability that executives need for strategic decisions. Meanwhile, standard regression approaches often predict impossible values like negative revenue or infinite growth.
Generalized Additive Models provide a compelling middle ground that I think deserves more attention in this space. GAMs capture complex nonlinear relationships while maintaining full interpretability through visualizable smooth functions. They handle the heteroscedastic variance inherent in revenue data, automatically adapt to saturation effects, and provide uncertainty quantification that’s crucial for risk assessment. This post demonstrates how to build CLV models that respect business constraints, capturing channel decay patterns, tier-specific behaviors, and feature adoption dynamics, while remaining straightforward enough to deploy without extensive infrastructure or specialized optimization expertise.
Understanding SaaS business dynamics
Before we dive into the modeling (I promise we’ll get to the GAMs soon), it’s worth understanding why Software as a Service (SaaS) businesses present such interesting statistical challenges. Unlike traditional software vendors who sell one-time licenses, or e-commerce platforms with discrete transactions, SaaS companies operate on subscription-based recurring revenue. Customers pay monthly or annually for continued access to cloud-hosted software, which fundamentally changes the economics and, more importantly for us, the statistical patterns we need to model.
Here’s the key challenge: customer acquisition costs are typically massive relative to monthly revenue. Think about it, a customer paying $99/month with a $500 acquisition cost needs at least six months just to break even. This is where accurate CLV prediction becomes critical. You need to know which customer segments will stick around long enough to justify that upfront investment, and traditional models often miss the mark completely.
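To make that concrete, the payback arithmetic fits in a couple of lines (a trivial sketch using the illustrative figures above):

# Months of subscription payments needed to recover acquisition cost
monthly_price <- 99
acquisition_cost <- 500
ceiling(acquisition_cost / monthly_price)  # 6 months to break even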
What makes this particularly interesting from a modeling perspective is how customer value evolves. Feature adoption doesn’t just correlate with retention, it actively drives expansion revenue through tier upgrades and add-ons. I’ve seen datasets where highly engaged Basic tier customers generate more lifetime value than barely-engaged Enterprise customers, despite the 6x price difference. Meanwhile, customer success teams intervene based on usage patterns, creating feedback loops where our predictions influence outcomes. Add in the fact that churned customers can be reactivated (unlike, say, a dead coral reef in my usual work), and you’ve got a complex dynamic system.
The statistical requirements here are non-trivial: we need models that handle nonlinear feature effects (adoption plateaus around 80-90% for most products), tier-specific behaviors (Enterprise customers behave fundamentally differently), temporal decay patterns (paid acquisition channels show diminishing returns), and heteroscedastic variance (high-value customers are inherently more variable). Linear models with their constant effects and variance assumptions? They’ll predict negative revenue or infinite growth. Simple multipliers? They miss all the interesting dynamics. This is exactly where GAMs shine.
In this post, I’ll walk through building a production-ready CLV prediction model using GAMs, starting from simulated data that captures realistic SaaS dynamics through to extracting specific business insights like upgrade thresholds and feature investment ROI. By the end, you’ll have a complete workflow that you can adapt to your own customer data, whether you’re in SaaS, e-commerce, or any business with recurring customer relationships.
Environment setup
Before we dive into the modeling, let’s load our packages. Nothing exotic here, just the usual suspects for GAM fitting and visualization:
# Load libraries
library(mgcv)             # Fit flexible GAMs with smooth terms
library(tidyverse)        # Data manipulation and visualization
library(marginaleffects)  # Extract interpretable predictions from GAMs
library(gratia)           # Beautiful plots and posterior sampling for GAMs
library(scales)           # Format axes for currency and percentages

# Set a clean plotting theme
theme_set(theme_bw(base_size = 15, base_family = 'serif') +
            theme(panel.grid = element_blank()))
Simulating realistic SaaS customer dynamics
Let’s build some synthetic data that captures real B2B SaaS patterns. I’m a big believer in simulation for understanding model behavior, it’s something I use constantly in my ecological work, and it translates perfectly here. By controlling the data generating process, we know exactly what patterns our GAM should recover.
I’ll simulate 100 customers tracked over 5 months, using realistic SaaS pricing tiers: Basic ($99/month), Pro ($299/month), and Enterprise ($599/month). The target variable is 6-month CLV, a common planning horizon in SaaS businesses. Now here’s where it gets interesting, and where we build in patterns that would break traditional models:
First, feature adoption naturally plateaus. Nobody uses 100% of features (trust me, I still discover new RStudio shortcuts after a decade). Second, paid acquisition channels show decay, customers attracted by promotions gradually revert to their natural engagement levels. Third, and this is crucial, Enterprise customers don’t just have higher baseline value, they exhibit completely different response curves to adoption. These nonlinearities are exactly what GAMs handle naturally, no feature engineering required.
# Set seed for reproducibility
set.seed(55)

# Generate customer-month observations
train_data <- expand.grid(
  customer_id = 1:100,
  month = 1:5
) %>%
  dplyr::mutate(
    # Assign customers to pricing tiers with typical SaaS distribution
    contract_tier = sample(c("Basic", "Pro", "Enterprise"), 100,
                           replace = TRUE,
                           prob = c(0.5, 0.3, 0.2))[customer_id],

    # Customer acquisition channels affect initial value and retention
    acquisition_channel = sample(c("Organic", "Paid", "Partner"), 100,
                                 replace = TRUE)[customer_id],

    # Feature adoption grows over time but naturally caps around 95%
    feature_adoption_pct = pmin(95, pmax(5, 20 + month * 12 +
                                           rnorm(dplyr::n(), 0, 10))),

    # Monthly recurring revenue depends on tier
    base_mrr = dplyr::case_when(
      contract_tier == "Basic" ~ 99,
      contract_tier == "Pro" ~ 299,
      contract_tier == "Enterprise" ~ 599,
      TRUE ~ 99
    ),

    # Paid acquisition provides early boost that decays over time
    marketing_boost = ifelse(acquisition_channel == "Paid",
                             0.3 * exp(-0.2 * month), 0),

    # CLV with stronger saturation effects for Enterprise
    adoption_multiplier = dplyr::case_when(
      contract_tier == "Enterprise" ~ 1 + 2 * (1 - exp(-0.05 * feature_adoption_pct)),
      contract_tier == "Pro" ~ 1 + 0.5 * (feature_adoption_pct / 100),
      TRUE ~ 1 + 0.2 * (feature_adoption_pct / 100)
    ),

    # Calculate CLV with clear asymptotic behavior
    clv_6m = base_mrr * 6 * adoption_multiplier *
      (1 + marketing_boost) * exp(rnorm(dplyr::n(), 0, 0.15))
  ) %>%
  # Convert to factors after numeric calculations
  dplyr::mutate(
    contract_tier = factor(contract_tier,
                           levels = c("Basic", "Pro", "Enterprise")),
    acquisition_channel = factor(acquisition_channel)
  )

# Let's visualize the data we just created to understand the patterns
# we're asking our GAM to capture
ggplot(train_data, aes(x = feature_adoption_pct, y = clv_6m,
                       color = contract_tier)) +
  # Add individual customer data points
  geom_point(alpha = 0.6, size = 1.5) +
  # Use a quadratic term to show nonlinear relationships
  geom_smooth(method = "lm", formula = y ~ poly(x, 2),
              se = TRUE, alpha = 0.2) +
  # Format y-axis as currency
  scale_y_continuous(labels = scales::dollar) +
  # Use consistent color scheme across tiers
  scale_color_manual(values = c("darkblue", "darkred", "black")) +
  # Separate panels by acquisition channel
  facet_wrap(~ acquisition_channel) +
  # Add informative axis labels
  labs(x = "Feature Adoption Rate (%)",
       y = "6-Month Customer Lifetime Value ($)",
       color = "Contract Tier")

Notice how the relationship between adoption and CLV isn’t just nonlinear, it’s differently nonlinear for each tier. Enterprise customers hit a ceiling hard (classic diminishing returns), while Basic tier customers show nearly linear growth. The Paid acquisition channel generates higher initial CLV, but look closely at high adoption rates, that premium fades. This suggests organic customers eventually match paid ones if they stick around and engage with the product. But we will see a clearer plot of these relationships when we estimate them using our GAM below.
# Now examine temporal patterns - how does CLV evolve over time?
train_data %>%
  ggplot(aes(x = month, y = clv_6m, color = acquisition_channel)) +
  # Add LOESS smoothing with confidence bands for each channel
  geom_smooth(aes(fill = acquisition_channel), method = "loess",
              se = TRUE, alpha = 0.3, linewidth = 1.2) +
  # Format y-axis as currency
  scale_y_continuous(labels = scales::dollar) +
  # Apply consistent color scheme for channels
  scale_color_manual(values = c("darkblue", "darkred", "black")) +
  scale_fill_manual(values = c("darkblue", "darkred", "black")) +
  # Create separate panels for each tier with independent y-scales
  facet_wrap(~ contract_tier, scales = "free_y", ncol = 2) +
  # Add informative axis labels
  labs(x = "Months Since Customer Acquisition",
       y = "6-Month Customer Lifetime Value ($)",
       color = "Channel", fill = "Channel")

The temporal patterns here are interesting. Paid customers maintain higher CLV consistently, but the effect varies by tier. In Enterprise, we’re talking nearly a $2,000 premium for Paid acquisition, while in Basic it’s much smaller. Partner channels are the wild card: they match Organic in most tiers but start to outperform in Pro. If I were building an acquisition strategy from these patterns, I’d focus paid spending on Enterprise and partner relationships on Pro customers.
Initial GAM: Gaussian with log link
But rather than trying to make interpretations directly from the observed data, we would do better by fitting some models. So let’s fit our first GAM. I’m starting with a Gaussian distribution and log link, which at least ensures positive predictions (unlike linear models that cheerfully predict negative revenue):
# Fit our first GAM with hierarchical smooths
clv_model <- gam(
  clv_6m ~ contract_tier +
    # Global smooth of feature adoption: captures the overall
    # nonlinear relationship between adoption and CLV
    s(feature_adoption_pct, k = 6) +
    # Tier-specific adoption curves: allows each tier to have
    # different saturation patterns (e.g., Enterprise plateaus earlier)
    s(feature_adoption_pct, by = contract_tier, k = 6) +
    # Channel-specific temporal effects: models how acquisition
    # channel impacts CLV evolution over customer lifetime
    s(month, by = acquisition_channel, k = 4) +
    # Overall time trend: captures market-wide changes or
    # seasonal patterns affecting all customers
    s(month, k = 4),
  data = train_data,
  method = "REML",
  family = gaussian(link = "log")
)
summary(clv_model)

## 
## Family: gaussian 
## Link function: log 
## 
## Formula:
## clv_6m ~ contract_tier + s(feature_adoption_pct, k = 6) + s(feature_adoption_pct, 
##     by = contract_tier, k = 6) + s(month, by = acquisition_channel, 
##     k = 4) + s(month, k = 4)
## 
## Parametric coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              6.55559    0.08326   78.74   <2e-16 ***
## contract_tierPro         1.25791    0.08945   14.06   <2e-16 ***
## contract_tierEnterprise  2.72850    0.08356   32.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                                      edf   Ref.df     F p-value
## s(feature_adoption_pct)                         2.530042 3.042080 1.424 0.22991
## s(feature_adoption_pct):contract_tierBasic      0.001621 0.003221 0.005 0.99691
## s(feature_adoption_pct):contract_tierPro        1.001362 1.002606 0.684 0.40869
## s(feature_adoption_pct):contract_tierEnterprise 1.681453 1.959838 0.725 0.52648
## s(month):acquisition_channelOrganic             1.000103 1.000205 4.438 0.03566
## s(month):acquisition_channelPaid                0.451201 0.749047 0.136 0.74953
## s(month):acquisition_channelPartner             1.000556 1.001099 4.129 0.04265
## s(month)                                        1.000119 1.000227 9.978 0.00168
##                                                    
## s(feature_adoption_pct)                            
## s(feature_adoption_pct):contract_tierBasic         
## s(feature_adoption_pct):contract_tierPro           
## s(feature_adoption_pct):contract_tierEnterprise    
## s(month):acquisition_channelOrganic             *  
## s(month):acquisition_channelPaid                   
## s(month):acquisition_channelPartner             *  
## s(month)                                        ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Rank: 33/35
## R-sq.(adj) =  0.959   Deviance explained = 96%
## -REML = 4118.5  Scale est. = 8.0808e+05  n = 500
Take a moment to look at that summary output. The model structure here deserves some attention because it showcases what makes GAMs so powerful. We’ve got a global smooth of feature_adoption_pct that captures the overall adoption-CLV relationship, then by = contract_tier terms that let each tier deviate from this baseline. Think of it as saying “there’s a general pattern, but Enterprise customers can do their own thing if the data supports it.”
This hierarchical setup is more elegant than fitting separate models per tier (which would ignore shared patterns) but more flexible than forcing all tiers into the same mold. The same logic applies to our temporal smooths, we model overall time trends while allowing channels to evolve differently. As I’ve discussed elsewhere, the penalized likelihood estimation automatically handles the bias-variance tradeoff. Weak tier-specific patterns get shrunk toward zero (defaulting to the global pattern), while strong signals are preserved.
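You can see this shrinkage for yourself by plotting the estimated partial effects, a one-liner with gratia:

# Plot all estimated smooths: tier-specific deviations that were
# penalized toward zero (e.g., the Basic deviation with edf near 0
# in the summary above) show up as essentially flat lines
gratia::draw(clv_model)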
But here’s the problem with this Gaussian model: it assumes constant variance. In revenue data? That’s wishful thinking. High-value Enterprise customers vary wildly in their outcomes, while Basic customers are more predictable. This heteroscedasticity isn’t just a statistical nuisance, it directly impacts business decisions about risk and investment.
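Don’t take my word for it; the residuals show this directly. A quick diagnostic sketch using the Gaussian fit:

# Deviance residuals vs fitted values: a widening fan means
# the constant-variance assumption is failing
plot(fitted(clv_model), residuals(clv_model, type = "deviance"),
     xlab = "Fitted CLV", ylab = "Deviance residuals")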
The Tweedie advantage: Matching the data’s reality
Here’s something that took me years to fully appreciate: choosing the right distribution family is often more important than adding model complexity. Business metrics, particularly revenue, violate pretty much every assumption of the Gaussian distribution. Values are strictly positive, the variance grows with the mean (bigger customers are less predictable), and you might have exact zeros when customers churn immediately. Sound familiar? In ecology, we deal with the same issues when modeling species abundance.
The Tweedie distribution handles all of this elegantly. Mathematically, it’s characterized by its variance function:
$$\text{Var}(Y) = \phi \cdot \mu^p$$
where \(\mu\) is the mean, \(\phi\) is the dispersion parameter, and \(p\) is the power parameter. Now here’s the clever bit: when \(1 < p < 2\), you get a compound Poisson-Gamma distribution that models continuous positive values with possible exact zeros. The parameter \(p\) tells you about your data’s nature. Values near 1 suggest count-like behavior, near 2 indicates pure continuous data, and CLV typically lands around 1.5 to 1.8 (mostly continuous revenue with occasional immediate churners).

What I find particularly satisfying is how this matches business reality. Enterprise customers with high mean CLV naturally show more variance than predictable Basic customers, and the Tweedie captures this automatically through that \(\mu^p\) relationship. No variance stabilizing transformations, no separate zero-inflation component, just pick the right tool for the job. As I’ve written about when modeling temporal patterns, respecting your data’s generative process isn’t just statistically proper, it leads to better predictions and honest uncertainty estimates.
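If you want to build intuition for what the power parameter does, you can simulate from the distribution directly. A small sketch, assuming the tweedie package is installed (we don’t use it elsewhere in this post; the mu and phi values are illustrative):

library(tweedie)
set.seed(1)

# Two customer segments with different mean CLV but the same phi and p;
# Var(Y) = phi * mu^p, so the higher-mean segment is more variable
basic_like      <- rtweedie(1000, mu = 600,  phi = 10, power = 1.6)
enterprise_like <- rtweedie(1000, mu = 3600, phi = 10, power = 1.6)

c(var(basic_like), var(enterprise_like))  # variance grows with the mean
mean(basic_like == 0)                     # a few exact zeros appear too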
# Now fit with Tweedie distribution - same model structure,
# just changing the family to handle heteroscedastic variance
clv_tweedie <- gam(
  clv_6m ~ contract_tier +
    # Global adoption smooth: overall adoption-CLV relationship
    s(feature_adoption_pct, k = 6) +
    # Tier-specific adoption responses: different value curves
    # per pricing tier (Enterprise vs Pro vs Basic)
    s(feature_adoption_pct, by = contract_tier, k = 6) +
    # Channel decay patterns: how acquisition source affects
    # CLV trajectory over time
    s(month, by = acquisition_channel, k = 4) +
    # Market-wide temporal effects: overall time trends
    s(month, k = 4),
  data = train_data,
  method = "REML",
  # Tweedie family: handles positive continuous data with
  # potential exact zeros and heteroscedastic variance
  family = tw(link = "log")
)

# Compare model fits - lower AIC is better
AIC(clv_model, clv_tweedie)

##                   df      AIC
## clv_model   13.75832 8235.860
## clv_tweedie 14.28479 7248.197

# The power parameter tells us about our data's nature
cat("Estimated Tweedie power parameter:",
    round(clv_tweedie$family$getTheta(TRUE), 3), "\n")

## Estimated Tweedie power parameter: 1.935
That estimated power parameter tells us something important about our CLV data. At 1.935 we’re not dealing with pure counts (p = 1), and we sit close to, but not at, the pure continuous case (p = 2): the revenue is almost entirely continuous, with only a small compound Poisson component left over for occasional exact zeros, exactly what you’d expect from subscription revenue.
Let’s visualize the fitted smooth functions to understand what the model learned about feature adoption and temporal patterns:
# Visualize the smooth functions for feature adoption across tiers
# This shows the predicted CLV response to adoption for each tier
marginaleffects::plot_predictions(
  clv_tweedie,
  condition = c('feature_adoption_pct', 'contract_tier', 'contract_tier')
) +
  # Format y-axis as currency
  scale_y_continuous(labels = scales::dollar) +
  # Use consistent color scheme
  scale_color_manual(values = c("darkblue", "darkred", "black")) +
  scale_fill_manual(values = c("darkblue", "darkred", "black")) +
  # Clean up labels
  labs(x = "Feature Adoption Rate (%)",
       y = "Predicted CLV") +
  # Remove redundant legend since we're faceting
  theme(legend.position = 'none')

The smooth functions reveal some interesting patterns: Enterprise customers show classic saturation behavior, plateauing around 75% adoption as expected. Pro customers demonstrate steady growth with mild saturation at high adoption levels. Most intriguingly, Basic customers show a peaked response, with CLV actually declining after around 50% adoption. This suggests Basic customers might hit complexity barriers where additional features create more confusion than value.
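To pin down where that Basic-tier peak sits, you can scan model predictions over a fine adoption grid (a quick sketch; month 3 and the Organic channel are arbitrary reference values):

# Find the adoption level where predicted Basic-tier CLV peaks
basic_grid <- data.frame(
  feature_adoption_pct = seq(5, 95, by = 1),
  contract_tier = factor("Basic", levels = c("Basic", "Pro", "Enterprise")),
  month = 3,
  acquisition_channel = factor("Organic",
                               levels = c("Organic", "Paid", "Partner"))
)
basic_grid$pred_clv <- predict(clv_tweedie, newdata = basic_grid,
                               type = "response")
basic_grid$feature_adoption_pct[which.max(basic_grid$pred_clv)]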
# Visualize temporal patterns by acquisition channel
# This reveals how CLV evolves differently for each channel over time
marginaleffects::plot_predictions(
  clv_tweedie,
  condition = c('month', 'acquisition_channel', 'acquisition_channel')
) +
  # Format y-axis as currency
  scale_y_continuous(labels = scales::dollar) +
  # Use consistent color scheme
  scale_color_manual(values = c("darkblue", "darkred", "black")) +
  scale_fill_manual(values = c("darkblue", "darkred", "black")) +
  # Clean up labels
  labs(x = "Months Since Acquisition",
       y = "Predicted CLV") +
  # Remove redundant legend since we're faceting
  theme(legend.position = 'none')

The temporal patterns tell a compelling story about channel effectiveness over time. Paid acquisition actually shows declining CLV, suggesting that promotional incentives attract customers who don’t stick around or engage deeply. Organic and Partner channels both show steady growth, with customers becoming more valuable as they mature and discover the product’s value naturally. This decay in Paid channel effectiveness over time is exactly the kind of pattern that traditional multiplier models would miss completely.
Now let me show you why using the Tweedie matters for actual business decisions. I’m going to use posterior simulation to generate prediction intervals, not just standard errors around the mean, but full uncertainty including observation-level variance. This is what executives actually need to know: “what’s the range of possible outcomes for this customer segment?”
# Draw posterior samples from both models (includes all sources of uncertainty)
posterior_gaussian <- posterior_samples(
  clv_model,
  n = 1000,
  data = train_data,
  seed = 42,
  unconditional = TRUE,
  mvn_method = "mgcv"
)
posterior_tweedie <- posterior_samples(
  clv_tweedie,
  n = 1000,
  data = train_data,
  seed = 42,
  unconditional = TRUE,
  mvn_method = "mgcv"
)

# Helper function to extract quantiles from posterior tibble
get_intervals <- function(posterior_tbl) {
  posterior_tbl %>%
    dplyr::group_by(.row) %>%
    dplyr::summarise(
      lower = quantile(.response, 0.025),
      upper = quantile(.response, 0.975),
      .groups = "drop"
    )
}

# Calculate prediction intervals from posterior samples
pred_data <- train_data %>%
  dplyr::mutate(
    # Get point predictions for plotting centers
    pred_gaussian = predict(clv_model, newdata = ., type = "response"),
    pred_tweedie = predict(clv_tweedie, newdata = ., type = "response")
  ) %>%
  # Calculate intervals for both models
  dplyr::bind_cols(
    get_intervals(posterior_gaussian) %>%
      dplyr::select(lower_gaussian = lower, upper_gaussian = upper),
    get_intervals(posterior_tweedie) %>%
      dplyr::select(lower_tweedie = lower, upper_tweedie = upper)
  ) %>%
  dplyr::mutate(
    # Calculate interval width to highlight heteroscedasticity
    interval_width_gaussian = upper_gaussian - lower_gaussian,
    interval_width_tweedie = upper_tweedie - lower_tweedie
  )

# Reshape data for faceting
pred_data_long <- pred_data %>%
  tidyr::pivot_longer(
    cols = c(pred_gaussian, pred_tweedie),
    names_to = "model",
    names_pattern = "pred_(.*)",
    values_to = "predicted"
  ) %>%
  dplyr::mutate(
    # Add corresponding intervals
    lower = ifelse(model == "gaussian", lower_gaussian, lower_tweedie),
    upper = ifelse(model == "gaussian", upper_gaussian, upper_tweedie),
    # Create model labels
    model = factor(model, levels = c("gaussian", "tweedie"),
                   labels = c("Gaussian", "Tweedie"))
  )

# Create comparison plot with facets
ggplot(pred_data_long, aes(x = clv_6m, y = predicted)) +
  # Add prediction interval as ribbon
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = model),
              alpha = 0.25) +
  # Add actual vs predicted points
  geom_point(aes(color = model), alpha = 0.4, size = 0.8) +
  # Add perfect prediction line (45-degree)
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", alpha = 0.5) +
  # Use consistent color scheme
  scale_color_manual(values = c("darkblue", "darkred")) +
  scale_fill_manual(values = c("darkblue", "darkred")) +
  # Format axes for currency
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::dollar) +
  # Create side-by-side panels
  facet_wrap(~ model, ncol = 2) +
  # Add informative labels
  labs(x = "Actual CLV", y = "Predicted CLV") +
  # Remove legend since facet labels show the model
  theme(legend.position = "none")

# Create interval width comparison plot to highlight heteroscedasticity
pred_data %>%
  tidyr::pivot_longer(
    cols = c(interval_width_gaussian, interval_width_tweedie),
    names_to = "model",
    names_pattern = "interval_width_(.*)",
    values_to = "interval_width"
  ) %>%
  dplyr::mutate(
    model = factor(model, levels = c("gaussian", "tweedie"),
                   labels = c("Gaussian", "Tweedie"))
  ) %>%
  ggplot(aes(x = clv_6m, y = interval_width)) +
  # Add scatter points colored by model
  geom_point(aes(color = model), alpha = 0.5, size = 1.5) +
  # Add smoothed trend lines to show patterns
  geom_smooth(aes(color = model, fill = model), method = "loess",
              se = TRUE, alpha = 0.2) +
  # Use consistent color scheme
  scale_color_manual(values = c("darkblue", "darkred")) +
  scale_fill_manual(values = c("darkblue", "darkred")) +
  # Format axes appropriately
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::dollar) +
  # Facet by model for clarity
  facet_wrap(~ model, scales = "free_y") +
  # Add informative labels
  labs(x = "Actual Customer Lifetime Value",
       y = "95% Prediction Interval Width") +
  theme(legend.position = "none")

Look at that difference! The Tweedie model’s prediction intervals expand for high-value customers, while the Gaussian stubbornly maintains constant width. This isn’t just statistical pedantry, it directly impacts business decisions. When you’re evaluating a potential Enterprise account worth $10,000+, you need to know that outcome could range from $7,000 to $15,000, not the false precision of ±$2,000 that the Gaussian suggests. Heteroscedasticity isn’t a nuisance here, it’s a feature of the business reality we’re modeling.
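To put numbers on what the plots show, we can compare average interval widths at the low and high ends of the CLV range, reusing the pred_data object built above:

# Mean 95% interval width in the bottom vs top CLV quartile:
# Gaussian widths stay roughly flat; Tweedie widths scale up
pred_data %>%
  dplyr::mutate(clv_quartile = dplyr::ntile(clv_6m, 4)) %>%
  dplyr::filter(clv_quartile %in% c(1, 4)) %>%
  dplyr::group_by(clv_quartile) %>%
  dplyr::summarise(
    mean_width_gaussian = mean(interval_width_gaussian),
    mean_width_tweedie = mean(interval_width_tweedie),
    .groups = "drop"
  )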
Extracting actionable insights
Alright, we’ve got a well-fitting model with appropriate uncertainty quantification. But so what? In my experience, the real value of GAMs comes from their ability to answer specific “what if” questions. Let me show you how to extract insights that actually drive decisions.
I’ll tackle two questions that came up in my conversation with that data science student: First, if we improve feature adoption by 10% across our customer base, what’s the expected CLV impact? Second, at what adoption level does it make financial sense to upgrade Basic customers to Pro? These aren’t hypothetical, companies make million-dollar decisions based on these calculations.
Quantifying feature adoption ROI
Let’s start with marginal effects. In ecology, we use these constantly to understand how species respond to environmental changes. Here, we’re asking: “what’s the derivative of CLV with respect to feature adoption?” The marginaleffects package makes this straightforward, and unlike taking derivatives by hand, it properly accounts for all the smooth functions and their interactions.
# Calculate marginal effects - essentially taking derivatives
# of our smooth functions while accounting for uncertainty
adoption_effects <- marginaleffects::avg_slopes(
  clv_tweedie,
  variables = "feature_adoption_pct",
  newdata = train_data
)

# Population-averaged marginal effect (across all customers)
adoption_effects

## 
##  Estimate Std. Error    z Pr(>|z|)    S 2.5 % 97.5 %
##      13.8       3.27 4.23   <0.001 15.4  7.42   20.2
## 
## Term: feature_adoption_pct
## Type: response
## Comparison: dY/dX

# Calculate marginal effects by tier for more granular insights
tier_effects <- marginaleffects::avg_slopes(
  clv_tweedie,
  variables = "feature_adoption_pct",
  by = "contract_tier",
  newdata = train_data
)

# Show tier-specific ROI differences
tier_effects %>%
  dplyr::mutate(
    # Calculate 10% improvement impact with confidence intervals
    impact_10pct = estimate * 10,
    lower_10pct = (estimate - 1.96 * std.error) * 10,
    upper_10pct = (estimate + 1.96 * std.error) * 10
  ) %>%
  ggplot(aes(x = contract_tier, y = impact_10pct, color = contract_tier)) +
  # Add confidence intervals as error bars
  geom_errorbar(aes(ymin = lower_10pct, ymax = upper_10pct),
                width = 0.2, linewidth = 1) +
  # Add point estimates
  geom_point(size = 4) +
  # Format y-axis as currency
  scale_y_continuous(labels = scales::dollar) +
  # Use consistent color scheme
  scale_color_manual(values = c("darkblue", "darkred", "black")) +
  # Add informative labels
  labs(x = "Contract Tier",
       y = "CLV Impact of 10% Adoption Improvement") +
  theme(legend.position = "none")

Identifying upgrade thresholds
Here’s a more complex question that showcases GAM flexibility: when should you upgrade Basic customers to Pro? This isn’t about forcing upgrades (that rarely works), but rather identifying customers who would benefit from Pro features and justify the investment.
The list prices are Basic ($99/month) and Pro ($299/month), a $200 difference. However, the real cost of upgrading a customer might be higher than this sticker price difference. You need to factor in customer success time for onboarding Pro features, potential discounts you’ll offer to ease the transition, and implementation support. Let’s suppose this upgrade pushes the real incremental cost to around $300/month. Over our 6-month CLV horizon, that’s an $1,800 investment. The question becomes: at what adoption level does the incremental CLV exceed this cost?
# Build a prediction grid to explore the full adoption range
adoption_range <- seq(15, 95, by = 5)

# Create prediction grid for both tiers
prediction_grid <- expand.grid(
  feature_adoption_pct = adoption_range,
  contract_tier = c("Basic", "Pro"),
  month = 3,
  acquisition_channel = "Organic"
) %>%
  dplyr::mutate(
    predicted_clv = predict(clv_tweedie, newdata = ., type = "response")
  )

# Reshape to have Basic and Pro CLV in separate columns
threshold_analysis <- prediction_grid %>%
  tidyr::pivot_wider(
    names_from = contract_tier,
    names_prefix = "clv_",
    values_from = predicted_clv
  ) %>%
  # Rename columns to lowercase for consistency
  dplyr::rename(clv_basic = clv_Basic, clv_pro = clv_Pro) %>%
  dplyr::mutate(
    marginal_benefit = clv_pro - clv_basic,
    # Upgrade cost: $300/month incremental cost * 6 months = $1,800 total
    upgrade_cost = 300 * 6,
    net_benefit = marginal_benefit - upgrade_cost,
    roi_ratio = marginal_benefit / upgrade_cost
  )

# Visualize the upgrade decision as cost-benefit analysis
ggplot(threshold_analysis, aes(x = feature_adoption_pct)) +
  # Show marginal benefit of upgrading
  geom_line(aes(y = marginal_benefit), color = "darkred", linewidth = 1.2) +
  # Show upgrade cost
  geom_hline(yintercept = 1800, color = "black",
             linewidth = 1.2, linetype = "dashed") +
  # Add shaded region where upgrades are profitable
  geom_ribbon(data = threshold_analysis %>%
                dplyr::filter(marginal_benefit > upgrade_cost),
              aes(ymin = upgrade_cost, ymax = marginal_benefit),
              fill = "darkred", alpha = 0.2) +
  # Add annotations using actual data values for positioning
  annotate("text", x = 45, y = 1830,
           label = "Upgrade Cost ($1,800)",
           color = "black", size = 5, hjust = 1) +
  annotate("text", x = 80, y = 1900,
           label = "Marginal Benefit\n(Pro CLV - Basic CLV)",
           color = "darkred", size = 5, hjust = 0.5) +
  # Format axes for currency and percentages
  scale_y_continuous(labels = scales::dollar) +
  scale_x_continuous(labels = scales::percent_format(scale = 1)) +
  # Add informative labels
  labs(x = "Feature Adoption Rate",
       y = "Incremental Value")

The model identifies a clear threshold at 55% adoption. Below that, you’re losing money on upgrades (spending $1,800 to gain $1,400-1,700 in CLV). Above 55%, the marginal benefits exceed the costs and keep growing. By 75% adoption, you’re looking at $2,000+ gains. This transforms a fuzzy “high engagement” heuristic into a concrete decision rule: monitor Basic customers, and when they cross 55% adoption, that’s your signal to discuss Pro features.
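Rather than reading the threshold off the plot, you can extract it from the threshold_analysis grid directly (a small sketch; the 5-point grid spacing limits the precision):

# First adoption level on the grid where the upgrade turns profitable
threshold_analysis %>%
  dplyr::filter(net_benefit > 0) %>%
  dplyr::slice_min(feature_adoption_pct, n = 1) %>%
  dplyr::select(feature_adoption_pct, marginal_benefit, net_benefit)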
Scenario planning: Where to invest development resources
We can also perform detailed scenario planning using our GAM. Our results suggest that Basic customers hit complexity barriers that prevent them from extracting value. So let’s model a scenario: what if we invested in simplification features that help Basic customers get more value from the product?
# Extract just Basic tier customers for our scenario
basic_customers <- train_data %>%
  dplyr::filter(contract_tier == "Basic")

# Scenario: Basic customers extract Pro-like value from their adoption
# This represents UI simplification, better onboarding, or automation
scenario_data <- basic_customers %>%
  dplyr::mutate(
    # Create a synthetic "Pro-like" response to adoption for Basic tier
    # This models features that help Basic users extract more value
    original_tier = contract_tier,
    contract_tier = factor("Pro", levels = c("Basic", "Pro", "Enterprise"))
  )

# Generate predictions with uncertainty
baseline_posterior <- posterior_samples(
  clv_tweedie,
  n = 500,
  data = basic_customers,
  seed = 42,
  unconditional = TRUE,
  mvn_method = "mgcv"
)

# Predict with improved value extraction
scenario_posterior <- posterior_samples(
  clv_tweedie,
  n = 500,
  data = scenario_data,
  seed = 42,
  unconditional = TRUE,
  mvn_method = "mgcv"
)

# Calculate per-customer impact to inform investment decisions
impact_per_customer <- dplyr::tibble(
  baseline = baseline_posterior %>%
    dplyr::group_by(.row) %>%
    dplyr::summarise(base_clv = mean(.response), .groups = "drop") %>%
    dplyr::pull(base_clv),
  scenario = scenario_posterior %>%
    dplyr::group_by(.row) %>%
    dplyr::summarise(scen_clv = mean(.response), .groups = "drop") %>%
    dplyr::pull(scen_clv)
) %>%
  dplyr::mutate(
    uplift = scenario - baseline,
    roi_multiplier = scenario / baseline
  )

# Aggregate results
investment_impact <- dplyr::tibble(
  metric = c("Mean CLV Uplift per Customer",
             "95% CI Lower",
             "95% CI Upper",
             "Mean ROI Multiple"),
  value = c(mean(impact_per_customer$uplift),
            quantile(impact_per_customer$uplift, 0.025),
            quantile(impact_per_customer$uplift, 0.975),
            mean(impact_per_customer$roi_multiplier))
)
investment_impact

## # A tibble: 4 × 2
##   metric                        value
##   <chr>                         <dbl>
## 1 Mean CLV Uplift per Customer  1781.
## 2 95% CI Lower                  1329.
## 3 95% CI Upper                  2127.
## 4 Mean ROI Multiple                3.52

# Visualize investment impact distribution
ggplot(impact_per_customer, aes(x = uplift)) +
  geom_histogram(bins = 25, fill = "darkred",
                 alpha = 0.5, color = "white") +
  # Format x-axis as currency
  scale_x_continuous(labels = scales::dollar) +
  # Add informative labels
  labs(x = "CLV Uplift per Basic Customer",
       y = "Frequency")

The results could make a case for simplification. If we can help Basic customers extract Pro-like value from their current adoption levels, we’re looking at roughly $1,781 of uplift per customer. With typical feature development costs around $500 per customer for this scale of work, that’s a 3.6x return. Strategies to achieve this would depend on the context, but helping a Basic customer currently stuck at 40% adoption reach 70% (maybe with better UX design) would be one way of improving CLV while building customer loyalty.
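Since impact_per_customer holds a full distribution of expected uplifts, we can also ask what fraction of Basic customers clear the assumed $500 cost:

# Share of Basic customers whose expected uplift exceeds
# the assumed $500-per-customer development cost
mean(impact_per_customer$uplift > 500)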
Taking it further
If you want to push these models even further, here are some directions to explore:
- mvgam: My R package for Bayesian Dynamic GAMs handles temporal dependencies and multivariate outcomes. Perfect if you’re modeling customer cohorts with correlated behaviors or want proper time series features
- Hierarchical structures: The gamm4 and brms packages let you build models with customer-level random effects, useful when you have repeat observations
- Zero-inflation: If you have many customers who never generate value, check out the ziP() or ziplss() families in mgcv
- Time-varying effects: Use tensor products te() to let feature importance change over the customer lifecycle (see the sketch after this list)
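To give a flavor of that last idea, here’s a minimal sketch of a tensor-product version of our model (the k values are illustrative, not tuned):

# A 2D tensor-product smooth lets the adoption effect
# vary smoothly across the customer lifecycle
clv_te <- gam(
  clv_6m ~ contract_tier +
    te(feature_adoption_pct, month, k = c(6, 4)),
  data = train_data,
  method = "REML",
  family = tw(link = "log")
)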
Wrapping up
We’ve covered a lot of ground here, from understanding why SaaS CLV presents unique modeling challenges to extracting specific business insights from smooth functions. The key advantages of using GAMs for CLV prediction:
- Nonlinear relationships emerge naturally - We discovered Enterprise customers plateau hard while Basic customers peak and then decline at high adoption, without any manual feature engineering
- Appropriate uncertainty quantification - The Tweedie distribution gives us honest prediction intervals that grow with customer value, not the false precision of constant variance
- Actionable thresholds - We identified the exact 55% adoption level where Basic-to-Pro upgrades become profitable, transforming vague heuristics into decision rules
- Scenario modeling - We can simulate interventions and get uncertainty bounds on their impact, critical for investment decisions
What strikes me most is how similar these business problems are to ecological modeling challenges. In both domains, we need flexible models that respect natural constraints, provide interpretable outputs, and quantify uncertainty honestly. GAMs deliver on all fronts.
Try this approach on your own customer data. I’d be curious to hear what nonlinear patterns and thresholds you discover. The code above should translate directly, though you’ll need to adjust the variable names and tier structures to match your business model.
Further reading
The following papers and resources offer useful material about GAMs for business applications and customer analytics:
Basu, A. (2023). Marketing analytics: the bridge between customer psychology and marketing decision‐making. Psychology & Marketing, 40(12), 2509-2528.
Glady, N., Baesens, B., & Croux, C. (2009). Modeling churn using customer lifetime value. European Journal of Operational Research, 197(1), 402-411.
Pedersen, E. J., Miller, D. L., Simpson, G. L., & Ross, N. (2019). Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ, 7, e6876.
Theodorakopoulos, L., Kotsiantis, S., & Koumanakos, E. (2024). Leveraging big data analytics for understanding consumer behavior in digital marketing: a systematic review. Human Behavior and Emerging Technologies, 2024, 3641502.