Why we still need {admiral} in an age of AI
There is a version of the AI-in-pharma story that goes like this: LLMs are trained on vast amounts of R code, so they can write ADaM programs on demand. Packages like {admiral} are therefore no longer necessary.
The benchmark data from pharma-skills tells a different story.
What an Unskilled Agent Actually Does
When an AI coding agent is asked to derive an ADAE dataset without skill guidance, it does not reliably reach for derive_var_trtemfl() or derive_vars_merged() with the correct parameters. Across multiple benchmark runs, unskilled agents fell into two consistent failure modes: either generating synthetic data rather than using pharmaverse reference datasets, or writing bespoke dplyr pipelines that reimplemented logic that {admiral} already provides.
One example from BDS benchmarking is particularly telling. Without skill guidance, agents consistently used derive_vars_merged() where derive_vars_merged_lookup() was required for parameter code assignment. Both functions exist in {admiral}, but in the benchmark derive_vars_merged() dropped unmatched records silently, producing a dataset with the wrong row count. No warning. No crash. Just wrong output that passes a casual review.
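As a rough illustration of this kind of failure, the sketch below uses base R's merge() as a stand-in. It does not call the {admiral} functions, and the toy data, the UNKNOWN test code, and the check_row_count() helper are all assumptions made for the example. It shows how a join that keeps only matched records changes the row count without any warning, and how a simple guard would surface the problem.

```r
# Toy vital-signs records; "UNKNOWN" has no entry in the lookup table.
vs <- data.frame(
  USUBJID  = c("01", "01", "02", "02"),
  VSTESTCD = c("SYSBP", "DIABP", "SYSBP", "UNKNOWN"),
  stringsAsFactors = FALSE
)
param_lookup <- data.frame(
  VSTESTCD = c("SYSBP", "DIABP"),
  PARAMCD  = c("SYSBP", "DIABP"),
  stringsAsFactors = FALSE
)

# Inner join (the merge() default): the unmatched record disappears silently.
inner <- merge(vs, param_lookup, by = "VSTESTCD")

# Left join: every input record is kept; PARAMCD is NA where nothing matched.
left <- merge(vs, param_lookup, by = "VSTESTCD", all.x = TRUE)

# Hypothetical guard: fail loudly if a merge changed the number of records.
check_row_count <- function(before, after) {
  if (nrow(before) != nrow(after)) {
    stop(sprintf("row count changed: %d -> %d", nrow(before), nrow(after)))
  }
  invisible(after)
}

check_row_count(vs, left)  # passes
result <- tryCatch(
  check_row_count(vs, inner),
  error = function(e) conditionMessage(e)
)
print(result)
```

Nothing in the inner join signals that a record was lost. Only the explicit row-count comparison turns the silent discrepancy into an error.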
This is not a model quality problem. It is a knowledge problem. The model does not know what the pharmaverse community knows.
The Package as Specification
When programmers call derive_vars_dtm() with the correct imputation flags, they are not just writing R code. They are implementing a specification that has been reviewed, validated, and documented.
An LLM trained on general R code does not reliably inherit that specification. The agent that wrote its own parse_dtc_datetime() function using substr() and as.POSIXct(), rather than calling derive_vars_dtm(), was not being lazy. It was doing its best with what it knew. Its best was not good enough, and the errors it introduced were in the edge cases that matter most in a clinical submission.
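To make the edge-case point concrete, here is a minimal sketch of what such a hand-rolled parser might look like. The function name comes from the text above, but the body and the sample values are assumptions for illustration, not the agent's actual code. It uses only base R.

```r
# Hypothetical hand-rolled parser: split an ISO 8601 --DTC string with
# substr() and convert it with as.POSIXct(). Illustrative only.
parse_dtc_datetime <- function(dtc) {
  date_part <- substr(dtc, 1, 10)
  time_part <- substr(dtc, 12, 19)
  # Treat a missing time as midnight.
  time_part[time_part == ""] <- "00:00:00"
  as.POSIXct(
    paste(date_part, time_part),
    format = "%Y-%m-%d %H:%M:%S",
    tz = "UTC"
  )
}

# Complete and partial values of the kind found in SDTM --DTC variables.
dtc <- c(
  "2019-07-18T15:25:40",  # complete datetime
  "2019-07-18T15:25",     # seconds missing
  "2019-07-18",           # date only
  "2019-07",              # day missing
  "2019"                  # month and day missing
)

parsed <- parse_dtc_datetime(dtc)
print(data.frame(dtc = dtc, parsed = parsed, failed = is.na(parsed)))
```

On these five inputs, only the complete datetime and the complete date parse. The other three become NA with no warning and no imputation flag. Handling partial values under explicit, documented imputation rules is the job derive_vars_dtm() exists to do.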
What the Skill Does
The
The benchmark results reflect this directly. Across ADSL, ADAE, ADVS, and ADLB:
- With skill: 88–100% pass rates across domains
- Without skill: 17–59% pass rates, with high variance
That variance in the unskilled condition matters as much as the mean. Inconsistent output is not a defensible process in a GxP context. A skill-guided agent produces consistent, traceable,
The Accountability Anchor
There is a regulatory dimension here that goes beyond code quality. A clinical submission needs to trace its derivations to validated, versioned, documented methods. A bespoke LLM-generated pipeline, however functional, has no such anchor. A call to derive_var_trtemfl() from a pinned version of {admiral} does.
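One small piece of that anchor is checking that the installed package matches the pinned version before a derivation runs. It might be sketched as follows. The helper assert_pinned_version() is hypothetical, and the demonstration uses the base package stats only so that the snippet runs anywhere. In a study, the package would be admiral and the pinned version would come from a lockfile (for example, one managed by renv).

```r
# Hypothetical helper: stop if the installed version differs from the pin.
assert_pinned_version <- function(pkg, pinned) {
  installed <- utils::packageVersion(pkg)
  if (installed != package_version(pinned)) {
    stop(sprintf(
      "%s %s is installed, but %s is pinned",
      pkg, as.character(installed), pinned
    ))
  }
  invisible(installed)
}

# Demonstration on a base package; the pin is read from the running session
# here purely so that the first check passes.
pinned <- as.character(utils::packageVersion("stats"))
ok <- assert_pinned_version("stats", pinned)

# A deliberately wrong pin (placeholder value) triggers the error path.
mismatch <- tryCatch(
  assert_pinned_version("stats", "0.0.1"),
  error = function(e) conditionMessage(e)
)
print(mismatch)
```

The check itself is trivial. What matters is that the version a derivation ran against is stated and enforced rather than assumed.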
This is why the pharma-skills project frames skills not as prompt templates, but as domain knowledge artifacts. The goal is not to make AI write more R code. It is to make AI write
Last updated: 2026-06-14 18:52:36
Citation
@online{dickinson2026,
author = {Dickinson, Jeff},
title = {Why We Still Need \{Admiral\} in an Age of {AI}},
date = {2026-06-14},
url = {https://pharmaverse.github.io/blog/posts/2026-06-14_admiral_in_age_of_ai/admiral_in_age_of_ai.html},
langid = {en}
}