Why we still need {admiral} in an age of AI

Jeff Dickinson

23 hours ago

[This article was first published on pharmaverse blog, and kindly contributed to R-bloggers].

There is a version of the AI-in-pharma story that goes like this: LLMs are trained on vast amounts of R code, so they can write ADaM programs on demand. Packages like {admiral} become optional, a style preference rather than a requirement. Just describe what you need and let the model figure it out.

The benchmark data from pharma-skills tells a different story.

What an Unskilled Agent Actually Does

When an AI coding agent is asked to derive an ADAE dataset without access to {admiral} skill guidance, it does not reach for derive_var_trtemfl() or derive_vars_merged() with the correct parameters. Across multiple benchmark runs, unskilled agents fell into two consistent failure modes: either generating synthetic data rather than using pharmaverse reference datasets, or writing bespoke dplyr pipelines that incorrectly reimplemented logic {admiral} already provides.

One example from BDS benchmarking is particularly telling. Without skill guidance, agents consistently used derive_vars_merged() where derive_vars_merged_lookup() was required for parameter code assignment. Both functions exist in {admiral}. Both execute without error. But derive_vars_merged() drops unmatched records silently, producing a dataset with the wrong row count. No warning. No crash. Just wrong output that passes a casual review.
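The failure class described here, a merge that runs cleanly but changes the row count, can be caught with an explicit assertion. The sketch below uses base R merge() on made-up data rather than the {admiral} functions themselves; the datasets and the helper name merge_keep_rows() are hypothetical and serve only to illustrate the check.

```r
# Hypothetical vital-signs records; TEMP has no entry in the lookup below.
vs <- data.frame(
  USUBJID  = c("01", "01", "02", "02"),
  VSTESTCD = c("SYSBP", "TEMP", "SYSBP", "PULSE"),
  VSSTRESN = c(120, 36.6, 135, 72),
  stringsAsFactors = FALSE
)

# Hypothetical parameter lookup, deliberately incomplete.
param_lookup <- data.frame(
  VSTESTCD = c("SYSBP", "PULSE"),
  PARAMCD  = c("SYSBP", "PULSE"),
  stringsAsFactors = FALSE
)

# merge() defaults to an inner join: the TEMP record disappears without a warning.
inner <- merge(vs, param_lookup, by = "VSTESTCD")

# Hypothetical helper: left join plus an explicit row-count check.
merge_keep_rows <- function(dataset, lookup, by) {
  out <- merge(dataset, lookup, by = by, all.x = TRUE)
  stopifnot(nrow(out) == nrow(dataset))
  out
}

kept <- merge_keep_rows(vs, param_lookup, by = "VSTESTCD")

nrow(vs)                  # 4
nrow(inner)               # 3
nrow(kept)                # 4
sum(is.na(kept$PARAMCD))  # 1: the unmatched record is now visible as NA
```

Comparing row counts before and after every merge is cheap, and it turns a silent discrepancy into an immediate failure that a reviewer cannot miss.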

This is not a model quality problem. It is a knowledge problem. The model does not know what the pharmaverse community knows.

The Package as Specification

{admiral} is more than a collection of R functions. It is a community-maintained encoding of CDISC ADaM logic, accumulated through years of collaboration across sponsors, CROs, and regulators, tested against real submissions, and versioned for traceability. When a programmer calls derive_vars_dtm() with the correct imputation flags, they are not just writing R code. They are implementing a specification that has been reviewed, validated, and documented.
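As an illustration, a call along the following lines derives an analysis datetime from complete and partial ISO 8601 character dates. This is a sketch only: the input data and the choice of highest_imputation = "M" are illustrative assumptions, not taken from the benchmark, and argument names and defaults should be checked against the documentation of the installed {admiral} version.

```r
library(admiral)
library(tibble)

# Illustrative input: complete and partial ISO 8601 start dates.
ae <- tribble(
  ~USUBJID, ~AESTDTC,
  "01",     "2019-07-18T15:25:40",
  "01",     "2019-07-18",
  "02",     "2019-02",
  "02",     ""
)

# Derive ASTDTM. Month, day, and time components may be imputed
# (highest_imputation = "M"), using the first possible date and time.
ae_dtm <- derive_vars_dtm(
  ae,
  new_vars_prefix = "AST",
  dtc = AESTDTC,
  highest_imputation = "M",
  date_imputation = "first",
  time_imputation = "first"
)
```

The imputation rules are stated as arguments rather than buried in string handling, which is what makes the derivation reviewable.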

An LLM trained on general R code does not reliably inherit that specification. It has seen {admiral} in its training data, but not with the depth or precision needed to apply it correctly across the full range of ADaM derivation scenarios. The unskilled agent that wrote a custom parse_dtc_datetime() function using substr() and as.POSIXct(), rather than calling derive_vars_dtm(), was not being lazy. It was doing its best with what it knew. Its best was not good enough, and the errors it introduced were in the edge cases that matter most in a clinical submission.
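To see why such edge cases bite, consider a minimal sketch of a hand-rolled parser built from substr() and as.POSIXct(). The function parse_dtc_naive() is hypothetical and written only for illustration; it is not the agent's actual code, and the input values are made up.

```r
# Hypothetical naive parser, written only to illustrate the failure mode.
parse_dtc_naive <- function(dtc) {
  # Take the first 19 characters (YYYY-MM-DDThh:mm:ss) and parse them strictly.
  as.POSIXct(substr(dtc, 1, 19), format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}

dtc <- c(
  "2019-07-18T15:25:40",  # complete datetime
  "2019-07-18T15:25",     # seconds missing
  "2019-07-18",           # date only
  "2019-02"               # day missing
)

parsed <- parse_dtc_naive(dtc)

# Only the complete value parses; the partial ones become NA with no warning.
is.na(parsed)  # FALSE TRUE TRUE TRUE
```

Partial dates are routine in collected clinical data, so a parser that quietly returns NA for them discards exactly the records that need an explicit, documented imputation rule.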

What the Skill Does

The {admiral} skills in pharma-skills do not replace {admiral}. They connect the AI agent to it. A skill provides curated, domain-aware guidance: which functions to use for which derivations, how to structure the program for QC readability, which variables require special handling, and what assertions to include. The skill is the bridge between a capable general-purpose model and the specific, validated logic the pharmaverse community has built.

The benchmark results reflect this directly. Across ADSL, ADAE, ADVS, and ADLB:

With skill: 88–100% pass rates across domains
Without skill: 17–59% pass rates, with high variance

That variance in the unskilled condition matters as much as the mean. Inconsistent output is not a defensible process in a GxP context. A skill-guided agent produces consistent, traceable, {admiral}-anchored code. An unskilled agent produces something different every time.

The Accountability Anchor

There is a regulatory dimension here that goes beyond code quality. A clinical submission needs to trace its derivations to validated, versioned, documented methods. A bespoke LLM-generated pipeline, however functional, has no such anchor. {admiral} provides it. When a submission uses derive_var_trtemfl() from a pinned version of {admiral}, the derivation logic is documented, community-reviewed, and reproducible. The AI is most useful when it is writing code that inherits those properties, not when it is improvising around them.
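One way to make a version pin checkable at run time is a small guard at the top of each program. The sketch below is an assumption about how this could be done, not a pharma-skills or {admiral} feature: assert_pinned_version() is a hypothetical helper, and the version string in the usage comment is a placeholder. In practice a lockfile tool such as renv would usually manage the pin itself.

```r
# Hypothetical helper: stop unless the installed package version
# matches the version pinned for the study.
assert_pinned_version <- function(pkg, expected) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed.")
  }
  installed <- utils::packageVersion(pkg)
  if (installed != expected) {
    stop(
      "Package '", pkg, "' is version ", as.character(installed),
      " but the study pins ", expected, "."
    )
  }
  invisible(TRUE)
}

# Example usage; "1.0.0" is a placeholder, not a recommendation.
# assert_pinned_version("admiral", "1.0.0")
```

A program that refuses to run against an unexpected package version leaves a clear record of which derivation logic produced the dataset.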

This is why the pharma-skills project frames skills not as prompt templates, but as domain knowledge artifacts. The goal is not to make AI write more R code. It is to make AI write {admiral} code correctly, consistently, and in a form that a human reviewer can audit and a regulatory submission can defend.

{admiral} was built for exactly this moment. The community just needs to make sure AI knows how to use it.

Last updated

2026-06-14 18:52:36.863336

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{dickinson2026,
  author = {Dickinson, Jeff},
  title = {Why We Still Need \{Admiral\} in an Age of {AI}},
  date = {2026-06-14},
  url = {https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html},
  langid = {en}
}

For attribution, please cite this work as:

Dickinson, Jeff. 2026. “Why We Still Need {Admiral} in an Age of AI.” June 14, 2026. https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html.
