
A Multi-Agent DDQN Strategic Audit Engine for Silver Markets using Keras/Tensorflow

[This article was first published on DataGeeek, and kindly contributed to R-bloggers].

1. Introduction & Theoretical Framework

In modern electronic trading markets, algorithmic execution engines drive the vast majority of institutional order flows. Evaluating whether these independent, learning-driven trading algorithms behave competitively or tacitly coordinate has become a critical challenge for quantitative compliance, market microstructure design, and risk management.

This technical article implements an automated Strategic Audit Engine designed to evaluate algorithmic execution regimes in the Silver futures market (SI=F). Our framework is explicitly built upon the empirical and theoretical foundations laid out by Koulouris & Campajola (2026) in their groundbreaking paper, “Memory-Induced Supra-Competitive Outcomes Between Deep Reinforcement Learning Agents in Optimal Trade Execution” (arXiv:2605.20348v1, May 2026).

The Core Thesis: Supra-Competitive Outcomes via Memory Paths

Traditional regulatory frameworks look for explicit collusion (active communication or cartel setups). However, Koulouris & Campajola demonstrate a far more subtle phenomenon: when independent Deep Reinforcement Learning (DRL) agents are equipped with memory—meaning they learn from rolling windows of historical price trajectories—they naturally converge toward supra-competitive outcomes. These are states where joint rewards remain artificially high, or execution parameters naturally align to mimic cooperation, without any explicit information exchange.

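To audit this behavior empirically, our engine models a symmetric duopoly market interaction. It maps the actual market execution path against two fundamental game-theoretic baselines: a cooperative TWAP (time-weighted average price) path, which serves as the Pareto-efficient benchmark, and a competitive Nash path, which serves as the non-cooperative equilibrium benchmark.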

2. Technical Stack & Environmental Setup

To build a production-grade, reproducible multi-agent simulation pipeline, we leverage a hybrid data-science and deep-learning toolkit within the R ecosystem:

# 1. ENVIRONMENT SETUP
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyquant, tidyverse, ggtext, glue, keras, tensorflow)

3. Building the Double Deep Q-Network Topology

Following the paper’s thesis on symmetric duopoly interactions, we construct two structurally identical execution agents: agent_A and agent_B. Both utilize a Dense Neural Network (Multilayer Perceptron) architecture to approximate the action-value space, denoted as Q(s, a).

The state space contains 3 features: Price Deviation, Asset Volatility (sigma), and Relative Time Horizon. The output layer projects to 3 discrete strategic action coordinates via a linear activation function.

# 2. SYMMETRIC AGENT ARCHITECTURE
build_strategic_agent <- function(state_size = 3, action_size = 3) {
  model <- keras_model_sequential() %>%
    layer_dense(units = 32, activation = "relu", input_shape = c(state_size)) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = action_size, activation = "linear")
  
  model %>% compile(
    optimizer = optimizer_adam(learning_rate = 0.001),
    loss = "mse"
  )
  return(model)
}

# Initialize the competing agents
agent_A <- build_strategic_agent()
agent_B <- build_strategic_agent()
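
For readers less familiar with the "Double" in Double DQN: the standard formulation lets the online network choose the next action while a separate target network evaluates it. The sketch below shows that target computation in base R for a single transition. The function name, the `discount` argument, and the example Q-values are hypothetical and are not taken from the listing above, which builds only the two online networks.

```r
# Double DQN target for one transition (base R, no keras required).
# ASSUMPTION: q_online_next and q_target_next are illustrative Q-value
# vectors for the next state; they do not come from agent_A or agent_B.
ddqn_target <- function(reward, q_online_next, q_target_next,
                        discount = 0.99, done = FALSE) {
  if (done) return(reward)                        # terminal step: no bootstrapping
  best_action <- which.max(q_online_next)         # online network selects the action
  reward + discount * q_target_next[best_action]  # target network evaluates it
}

q_online_next <- c(0.2, 1.5, 0.7)  # online network prefers action 2 (1-based)
q_target_next <- c(0.1, 0.9, 2.0)  # target network values that action at 0.9

y <- ddqn_target(reward = 1, q_online_next, q_target_next, discount = 0.9)
print(y)  # 1 + 0.9 * 0.9 = 1.81
```

Decoupling selection from evaluation is what distinguishes this target from the plain DQN target, which would use max(q_target_next) = 2.0 in the example.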

4. Parameterization & Historical Replay Buffer Ingestion

To anchor our agents in empirical reality, we pull two years of continuous daily settlement prices for Silver futures (SI=F). We define our microstructural parameters, namely the risk aversion parameter (gamma) and the permanent market impact coefficient (eta), alongside a fixed strategic execution memory window (T = 10).

# 3. STRATEGIC PARAMETERS
T_horizon <- 10      # Strategic episode length (Memory window)
gamma_param <- 0.0001 # Risk aversion
eta_param <- 0.0005   # Market impact

# 4. HISTORICAL REPLAY DATA (2-Year Training Set)
silver_full <- tq_get("SI=F", from = Sys.Date() - 730) %>%
  filter(!is.na(close)) %>%
  mutate(returns = close / lag(close) - 1) %>%
  drop_na()

# Recent window for the final audit visualization
silver_recent <- tail(silver_full, T_horizon)
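
To make the windowing concrete without a network call to Yahoo Finance, here is a minimal base-R sketch that computes simple returns and per-window volatility on a short synthetic price vector. The prices and the window length `T_demo` are made up for illustration; only the return formula mirrors the listing above.

```r
# ASSUMPTION: synthetic closes stand in for the SI=F series.
close <- c(30.0, 30.6, 30.3, 30.9, 31.2, 30.8)

# Simple returns, same formula as close / lag(close) - 1 with the first NA dropped
returns <- close[-1] / close[-length(close)] - 1

T_demo <- 3                                   # hypothetical short memory window
n_windows <- length(returns) - T_demo + 1     # number of full rolling windows

# Realized volatility of each rolling window
window_vol <- sapply(seq_len(n_windows), function(i) {
  sd(returns[i:(i + T_demo - 1)])
})

print(round(returns, 4))
print(n_windows)
print(round(window_vol, 4))
```

With five returns and a window of three, there are three full windows; the same arithmetic gives the number of training windows available for any series length and horizon.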

5. Dynamic Volatility Corridors

Rather than mapping market behavior against static thresholds, the audit engine computes a volatility-adaptive safety corridor. The boundaries dynamically expand and contract based on the asset’s realized standard deviation (sigma), isolating pure structural noise from intentional strategic maneuvers.

# 5. DYNAMIC SIGMA CORRIDORS
current_sigma <- sd(silver_recent$returns, na.rm = TRUE)
if(is.na(current_sigma)) current_sigma <- 0.01 

analysis_data <- silver_recent %>%
  mutate(
    twap_slope = current_sigma * 1.5, 
    nash_slope = current_sigma * 4.0,
    twap_path = first(close) * (1 - seq(0, first(twap_slope), length.out = n())),
    nash_path = first(close) * (1 - seq(0, first(nash_slope), length.out = n())),
    lower_safety_limit = nash_path * (1 - current_sigma)
  )
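
The corridor arithmetic can be checked in isolation. The following base-R sketch applies the same 1.5-sigma and 4-sigma slopes to a handful of made-up prices; the price values are illustrative assumptions, while the formulas follow the listing above.

```r
# ASSUMPTION: illustrative prices, not real SI=F data.
close <- c(32.0, 31.9, 31.7, 31.8, 31.5)
n <- length(close)
returns <- close[-1] / close[-n] - 1
sigma <- sd(returns)

# Both paths start at the first close and decline linearly over the window
twap_path <- close[1] * (1 - seq(0, 1.5 * sigma, length.out = n))
nash_path <- close[1] * (1 - seq(0, 4.0 * sigma, length.out = n))

# The lower limit sits one further sigma below the Nash path
lower_safety_limit <- nash_path * (1 - sigma)

print(data.frame(close, twap_path, nash_path, lower_safety_limit))
```

Because the Nash slope is larger than the TWAP slope, the Nash path lies at or below the TWAP path at every step, and the gap between them widens as sigma grows.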

6. The Joint Training Replay Engine & Payoff Matrix

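This section is the computational implementation of Koulouris & Campajola’s memory hypothesis. The two agents step through two years of rolling historical windows (window_data).

At each step, each agent independently picks the action with the highest predicted Q-value, and the pair is scored by a non-cooperative payoff rule: both agents receive 10 if both choose action 0, both receive 1 if they choose the same non-zero action, and otherwise the agent with the higher action index receives 5 while the other receives -5. Each network is then fitted for one epoch toward a target in which the chosen action's Q-value is replaced by the immediate reward: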

# 6. JOINT TRAINING ENGINE (Symmetric Memory Interaction)
message("Joint Training: Agent A & Agent B are learning Silver Market dynamics...")

for(i in 1:(nrow(silver_full) - T_horizon)) {
  window_data <- silver_full[i:(i + T_horizon - 1), ]
  vol <- sd(window_data$returns, na.rm = TRUE)
  if(is.na(vol)) vol <- 0.01
  
  state_vec <- matrix(c(1.0, vol, 0.5), nrow = 1)
  
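  # Greedy action selection: one forward pass per agent, reused below as the fit target
  q_A <- predict(agent_A, state_vec, verbose = 0)
  q_B <- predict(agent_B, state_vec, verbose = 0)
  act_A <- which.max(q_A) - 1   # 0-based action index
  act_B <- which.max(q_B) - 1
  
  # Payoff rule: joint action 0 pays 10 each, other matches pay 1 each,
  # and on a mismatch the higher action index takes +5 from the lower one
  rewards <- if(act_A == 0 && act_B == 0) {
    list(A = 10, B = 10) 
  } else if(act_A == act_B) {
    list(A = 1, B = 1)   
  } else {
    if(act_A > act_B) list(A = 5, B = -5) else list(A = -5, B = 5) 
  }
  
  # Overwrite only the chosen action's Q-value with the realized reward
  target_A <- q_A
  target_B <- q_B
  target_A[1, act_A + 1] <- rewards$A
  target_B[1, act_B + 1] <- rewards$B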
  
  agent_A %>% fit(state_vec, target_A, epochs = 1, verbose = 0)
  agent_B %>% fit(state_vec, target_B, epochs = 1, verbose = 0)
}
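
The payoff rule inside the loop can be pulled out and inspected on its own. The sketch below restates it as a standalone function (the name `payoff` is hypothetical), tabulates Agent A's payoff for every action pair, and lists A's best response to each action of B.

```r
# Standalone restatement of the payoff rule from the training loop.
# ASSUMPTION: the function name and matrix layout are illustrative only.
payoff <- function(act_A, act_B) {
  if (act_A == 0 && act_B == 0) return(c(A = 10, B = 10))  # both pick action 0
  if (act_A == act_B) return(c(A = 1, B = 1))              # same non-zero action
  if (act_A > act_B) c(A = 5, B = -5) else c(A = -5, B = 5)
}

actions <- 0:2

# Agent A's payoff: rows are A's action, columns are B's action
payoff_A <- outer(actions, actions,
                  Vectorize(function(a, b) payoff(a, b)[["A"]]))
dimnames(payoff_A) <- list(A = as.character(actions), B = as.character(actions))
print(payoff_A)

# A's best response (0-based action) to each action of B
best_response_A <- apply(payoff_A, 2, which.max) - 1
print(best_response_A)
```

Under this rule, action 0 is A's best response when B plays 0, and action 2 is the best response otherwise. Since the game is symmetric, both (0, 0) and (2, 2) are mutual best responses, which is worth keeping in mind when interpreting where greedy learners settle.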

7. Post-Convergence Audit Inference & Regime Selection

Once the networks stabilize, the engine takes the posture of an unbiased financial regulator. It runs both trained networks over the current execution window to obtain their Q-values, then classifies the market regime by comparing the last close against the TWAP and Nash paths.

# 7. FINAL AUDIT INFERENCE
analysis_data <- analysis_data %>%
  # Index the rows before rowwise(): inside rowwise() every row is its own
  # group, so row_number() would always return 1
  mutate(step = row_number()) %>%
  rowwise() %>%
  mutate(
    state_v = list(matrix(c(close/twap_path, current_sigma, (T_horizon - step)/T_horizon), nrow = 1)),
    q_A = list(predict(agent_A, state_v, verbose = 0)),
    q_B = list(predict(agent_B, state_v, verbose = 0)),
    joint_action = (which.max(q_A) + which.max(q_B)) / 2
  ) %>% ungroup()
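
The three state features can also be assembled as one matrix in base R, which makes the per-row time feature easy to verify. This is a sketch under assumed inputs: the helper name `build_states` and the example prices are hypothetical.

```r
# Build one state row per observation: price deviation, sigma, time remaining.
# ASSUMPTION: helper name and inputs are illustrative, not from the article.
build_states <- function(close, twap_path, sigma) {
  n <- length(close)
  step <- seq_len(n)                      # 1, 2, ..., n
  cbind(
    price_dev = close / twap_path,        # 1.0 means price sits on the TWAP path
    sigma     = rep(sigma, n),            # same window volatility for every row
    time_left = (n - step) / n            # shrinks to 0 at the final step
  )
}

close <- c(32.0, 31.9, 31.7, 31.8)
twap  <- c(32.0, 31.95, 31.9, 31.85)
states <- build_states(close, twap, sigma = 0.01)
print(states)
```

A matrix of this shape (one row per step, three columns) could be passed to a Keras model in a single batched predict call instead of one call per row.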

# 8. STATUS LOGIC (Professional Category Selection & Color Alignment)
last_row <- tail(analysis_data, 1)

# Plain if/else is used here: case_when() is vectorised and expects each
# right-hand side to match the length of its condition, so it is a poor fit
# for returning a three-field list from a single scalar test.
market_status <- if (last_row$close >= last_row$twap_path) {
  list(
    label = "**COOPERATIVE:** Pareto-Efficient Alignment", 
    bg    = "#E8F8F5",  
    color = "#27AE60"   
  )
} else if (last_row$close >= last_row$nash_path) {
  list(
    label = "**NORMAL:** Competitive Nash Equilibrium", 
    bg    = "#FEF5E7",  
    color = "#E67E22"   
  )
} else {
  list(
    label = "**LIQUIDITY SHOCK:** Strategic Deviation Detected", 
    bg    = "#FDEDEC",  
    color = "#C0392B"   
  )
}
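
The regime rule reduces to two threshold comparisons on the last close, which is easy to test on its own. The sketch below is a hypothetical helper (`classify_regime` is not part of the article's code) that returns only the regime name.

```r
# Classify the regime from the last close and the two benchmark paths.
# ASSUMPTION: helper name and example values are illustrative.
classify_regime <- function(close, twap, nash) {
  if (close >= twap) {
    "COOPERATIVE"        # at or above the TWAP path
  } else if (close >= nash) {
    "NORMAL"             # between the Nash path and the TWAP path
  } else {
    "LIQUIDITY SHOCK"    # below the Nash path
  }
}

print(classify_regime(close = 31.0, twap = 30.5, nash = 29.8))  # "COOPERATIVE"
print(classify_regime(close = 30.0, twap = 30.5, nash = 29.8))  # "NORMAL"
print(classify_regime(close = 29.0, twap = 30.5, nash = 29.8))  # "LIQUIDITY SHOCK"
```

Note that the rule depends only on price relative to the two paths; the agents' Q-values do not enter the classification.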

8. High-Fidelity Infographic Layer

To generate a publication-quality static vector infographic, we map our theme directly via ggplot2 and ggtext. By embedding the color palette directly into the HTML subtitle strings and forcing label formatting via scales::percent, we create a clean, high-contrast dashboard visualization.

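# 9. GGPLOT PRODUCTION VISUALIZATION (Static Mode with ggtext Integration)
# ASSUMPTION: actual_cost is used in the caption but never defined in the
# listings; here it is taken as the percentage price decline from the first
# to the last close of the audit window. Substitute your own shortfall measure.
actual_cost <- (first(analysis_data$close) - last(analysis_data$close)) /
  first(analysis_data$close) * 100

ggplot(analysis_data, aes(x = date)) +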
  geom_ribbon(aes(ymin = lower_safety_limit, ymax = twap_path), fill = "darkgray", alpha = 0.3) +
  
  # linewidth replaces the size argument for lines (ggplot2 >= 3.4.0)
  geom_line(aes(y = twap_path, color = "TWAP (Cooperative)"), linewidth = 1) +
  geom_line(aes(y = nash_path, color = "Nash (Competitive)"), linewidth = 1) +
  geom_line(aes(y = close, color = "Actual Price"), linewidth = 1.3) +
  scale_y_continuous(labels = scales::label_currency()) +
  
  geom_richtext(
    aes(x = median(date), y = max(close, twap_path) * 1.02, label = market_status$label),
    fill = market_status$bg, color = market_status$color, size = 4,
    family = "Roboto Slab" 
  ) +
  
  scale_color_manual(
    name = NULL,
    values = c("Actual Price" = "steelblue", "TWAP (Cooperative)" = "#27AE60", "Nash (Competitive)" = "#E67E22")
  ) +
  
  labs(
    title = "Silver Market Strategic Audit Engine",
    subtitle = paste0(
      "<span style='color:#27AE60;'>─── **Cooperative Zone**</span> | ",
      "<span style='color:#E67E22;'>─── **Competitive Zone**</span> | ",
      "<span style='color:steelblue;'>─── **Actual Execution**</span><br><br>",
      "<span style='color:darkgrey;'>**Strategic Corridor** (Supra-Competitive Margin Zone)</span>"
    ),
    x = NULL, y = NULL,
    caption = glue("Dynamic Sigma: {scales::percent(current_sigma, accuracy = 0.01)} | Shortfall: {round(actual_cost, 2)}%")
  ) +
  
  theme_minimal(base_family = "Roboto Slab") +
  theme(plot.title = element_text(face = "bold", size = 16),
        plot.subtitle = element_markdown(face = "bold"), 
        axis.text = element_text(face = "bold"),
        legend.position = "none")

9. Empirics & Compliance Conclusion

When we run the complete inference loop on the most recent Silver execution window, the picture is clear: Actual Execution (the blue trajectory) tracks downward, falls below the cooperative upper envelope (the TWAP path), and stays within the competitive boundaries.

The audit badge returns a status of NORMAL: Competitive Nash Equilibrium, and the chart caption reports an execution shortfall of 1.59%. Although the agents are neural networks capable of learning memory patterns, the price action during this specific ten-day horizon reflects a competitive regime: execution stays within the Nash boundaries rather than shifting into a supra-competitive zone.

For quantitative auditors and systemic risk monitors, this approach signals a paradigm shift. Static threshold tests are blind to multi-agent learning trends. By deploying neural simulation baselines, structural compliance teams can automatically audit execution algorithms, isolating algorithmic alignment from pure market variance.
