{metacore} and {metatools} have a new package maintainer
Hi, everyone! I’m Liam and I’m excited to announce that I have taken over as package maintainer for both {metacore} and {metatools} from Christina Fillmore. I work at GSK as a clinical programmer and I am coming to the end of my second year in the industry. This is my first experience working within the open-source world, but I am a regular user of pharmaverse packages and am keen to get more involved with the community.
Christina remains on hand as a mentor, and I’d like to thank both her and Ben Straub for their continued support before we dive into the details of {metacore}/{metatools} 0.2.0.
What’s new in metacore?
The goal of version 0.2.0 was to clarify the distinction between an imported Metacore spec, containing information about multiple datasets, and a subsetted spec containing information about just a single dataset (as achieved via metacore::select_dataset()).
We received a number of questions and issues from users who were attempting to pass a Metacore object containing metadata for multiple datasets to {metatools} functions that were designed to take a single, subsetted specification. When developing datasets, the typical workflow is to work on a single dataset at a time, so subsetting the Metacore object is the logical thing to do. The issue was that the approach taken by {metatools} functions was inconsistent, with some functions permitting metadata for multiple datasets and others not.
Now, a Metacore object holding multiple datasets and one holding a single dataset are programmatically distinct, with the single-dataset object implemented as a subclass of Metacore called “DatasetMeta”.
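To make the idea of a subclass concrete, here is a minimal base R sketch. It is not the metacore implementation, which this post does not show: the helpers `new_spec()`, `subset_spec()` and `require_single_dataset()` are hypothetical stand-ins that only mimic the relationship between the two classes.

```r
# Hypothetical stand-ins to illustrate the class relationship.
# This is NOT how metacore itself is implemented.
new_spec <- function(datasets) {
  structure(list(datasets = datasets), class = "Metacore")
}

# Subsetting returns an object that is both a DatasetMeta and a Metacore.
subset_spec <- function(spec, dataset) {
  stopifnot(inherits(spec, "Metacore"), dataset %in% spec$datasets)
  structure(list(datasets = dataset), class = c("DatasetMeta", "Metacore"))
}

# A downstream function can refuse anything that has not been subsetted.
require_single_dataset <- function(spec) {
  if (!inherits(spec, "DatasetMeta")) {
    stop("Not a subsetted object; select a single dataset first.", call. = FALSE)
  }
  invisible(spec)
}

full_spec <- new_spec(c("ADSL", "ADAE"))
adsl_only <- subset_spec(full_spec, "ADSL")

inherits(adsl_only, "Metacore")     # the subset is still a Metacore
inherits(full_spec, "DatasetMeta")  # the full spec is not a subset
```

Because the subset inherits from the parent class, anything written for a Metacore object can still accept it, while functions that need exactly one dataset can check for the subclass.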
From the users’ perspective there is one key change: a metadata object describing a single dataset is now required to work with {metatools} functions, whose API has been harmonised to accept only subsetted Metacore objects (created via metacore::select_dataset()).
The print statements of both combined and subsetted Metacore objects have been refined to better illustrate the differences between them and provide more helpful information to the user.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats': filter, lag
The following objects are masked from 'package:base': intersect, setdiff, setequal, union
library(metacore)
library(metatools)
library(tibble)
library(haven)
load(metacore_example("pilot_ADaM.rda"))
metacore
── Metacore object contains metadata for 5 datasets ────────────────────────────
→ ADSL (Subject-Level Analysis Dataset)
→ ADADAS (ADAS-Cog Analysis)
→ ADLBC (Analysis Dataset Lab Blood Chemistry)
→ ADTTE (AE Time To 1st Derm. Event Analysis)
→ ADAE (Adverse Events Analysis Dataset)
ℹ To use the Metacore object with metatools package, first subset a dataset using `metacore::select_dataset()`
The metacore::select_dataset() function is now explicit about what is being selected:
adsl_spec <- select_dataset(metacore, "ADSL", quiet = TRUE)
✔ ADSL dataset successfully selected
ℹ Dataset metadata specification subsetted with suppressed warnings
Printing the subsetted object now provides more detailed information:
adsl_spec
── Dataset specification object for ADSL (Subject-Level Analysis Dataset) ──────
The dataset contains 51 variables
Dataset key: USUBJID
The structure of the specification object is:
→ codelist: character [16 x 4] code_id, name, type, codes
→ derivations: character [50 x 2] derivation_id derivation
→ ds_spec: character [1 x 3] dataset, structure, label
→ ds_vars: character [51 x 7] dataset, variable, key_seq, order, keep, core, supp_flag
→ supp: character [0 x 4] dataset, variable, idvar, qeval
→ value_spec: character [51 x 8] dataset, variable, code_id, derivation_id, type, origin, where, sig_dig
→ var_spec: character [51 x 6] variable, type, length, label, format, common
To inspect the specification object use `View()` in the console.
Functions that take a Metacore object as input will emit a helpful message if a subsetted object is not supplied.
ds_list <- list(DM = read_xpt(metatools_example("dm.xpt")))
create_var_from_codelist(data.frame(), metacore)
Error in `verify_DatasetMeta()`:
! The object supplied to the argument `metacore` is not a subsetted Metacore object. Use `metacore::select_dataset()` to subset metadata for the required dataset.
Related: soft deprecation of dataset_name in metatools
Additionally, the argument dataset_name has been soft-deprecated across all functions in {metatools}. While the argument is still available and will not break existing code, using it will now issue a warning. This change encourages users to adopt the preferred workflow, creating a subsetted Metacore object, and improves performance by avoiding repeated subsetting operations each time these functions are called.
The full list of affected functions is included below. The dataset_name argument will remain available for at least one year from the release date of 0.2.0 before being fully removed.
build_from_derived, check_variables, check_unique_keys, make_supp_qual, drop_unspec_vars, add_variables, order_cols, sort_by_key.
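The general shape of a soft deprecation can be sketched in base R as follows. This is an illustration only, not the metatools source: `order_cols_sketch()` and the list-based spec are hypothetical, chosen to show an argument that still works but warns, next to the preferred subset-once workflow.

```r
# Hypothetical sketch of a soft-deprecated argument; not metatools code.
order_cols_sketch <- function(data, spec, dataset_name = NULL) {
  if (!is.null(dataset_name)) {
    warning("`dataset_name` is soft-deprecated; supply a subsetted spec instead.",
            call. = FALSE)
    spec <- spec[[dataset_name]]  # subsetting is repeated on every call
  }
  data[, spec$variables, drop = FALSE]
}

# Toy stand-ins for a multi-dataset spec and a dataset.
full_spec <- list(ADSL = list(variables = c("USUBJID", "AGE")))
adsl <- data.frame(AGE = c(63, 64), USUBJID = c("01", "02"))

# Old style: still works, but emits a warning (suppressed here).
old <- suppressWarnings(order_cols_sketch(adsl, full_spec, dataset_name = "ADSL"))

# Preferred style: subset once, then reuse the subsetted spec.
adsl_spec <- full_spec[["ADSL"]]
new <- order_cols_sketch(adsl, adsl_spec)

identical(old, new)
```

Both calls return the same result; the difference is that the preferred style performs the subsetting once rather than inside every function call.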
create_var_from_codelist()
metatools::create_var_from_codelist() now optionally allows the user to specify a codelist from which the new column should be generated. This is useful in situations like the one below where the user is trying to derive PARAM from PARAMCD but the codelist for the out_var (PARAM) does not contain the values of PARAMCD.
ID | Order | Code | Decode |
---|---|---|---|
PARAM | 1 | Alanine Aminotransferase | Alanine Aminotransferase |
PARAM | 2 | Bilirubin | Bilirubin |
PARAM | 3 | Creatine | Creatine |
Example of default usage not providing the correct result:
adlbc_spec <- suppressMessages(select_dataset(metacore, "ADLBC", quiet = TRUE))
data <- tibble(PARAMCD = c("ALB", "ALP", "ALT"))
create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, strict = FALSE)
# A tibble: 3 × 2
  PARAMCD PARAM
  <chr>   <chr>
1 ALB     <NA>
2 ALP     <NA>
3 ALT     <NA>
By default, metatools::create_var_from_codelist() takes the codelist of the out_var as input. The user can now override this default with a specific codelist (in this case the PARAMCD codelist below) to achieve the desired result.
ID | Order | Code | Decode |
---|---|---|---|
PARAMCD | 1 | ALT | Alanine Aminotransferase |
PARAMCD | 2 | BILI | Bilirubin |
PARAMCD | 3 | CREAT | Creatine |
create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, codelist = get_control_term(adlbc_spec, PARAMCD), decode_to_code = FALSE)
# A tibble: 3 × 2
  PARAMCD PARAM
  <chr>   <chr>
1 ALB     Albumin (g/L)
2 ALP     Alkaline Phosphatase (U/L)
3 ALT     Alanine Aminotransferase (U/L)
This function also provides a new option, strict, which when set to TRUE (the default) issues a warning listing any values in your input column that do not appear in the codelist.
data <- tibble(PARAMCD = c("ALB", "ALP", "ALT", "DUMMY1", "DUMMY2"))
x <- create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, codelist = get_control_term(adlbc_spec, PARAMCD), decode_to_code = FALSE, strict = TRUE)
Warning: In `create_var_from_codelist()`: The following values present in the input dataset are not present in the codelist: DUMMY1 and DUMMY2
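For readers who want to see the underlying idea without the packages, the lookup and the strict check can be approximated in base R. This is a rough sketch under stated assumptions, not the metatools implementation: `decode_sketch()` is a hypothetical helper, and the codelist is typed in by hand from the illustrative PARAMCD table above.

```r
# Hand-typed copy of the illustrative PARAMCD codelist shown earlier.
codelist <- data.frame(
  code = c("ALT", "BILI", "CREAT"),
  decode = c("Alanine Aminotransferase", "Bilirubin", "Creatine"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map codes to decodes, optionally warning about
# input values that have no entry in the codelist.
decode_sketch <- function(x, codelist, strict = TRUE) {
  unmatched <- setdiff(unique(x), codelist$code)
  if (strict && length(unmatched) > 0) {
    warning("Values not present in the codelist: ",
            paste(unmatched, collapse = ", "), call. = FALSE)
  }
  # match() returns NA for unmatched values, so they decode to NA.
  codelist$decode[match(x, codelist$code)]
}

decode_sketch(c("ALT", "BILI", "DUMMY1"), codelist, strict = FALSE)
```

Unmatched values come back as NA either way; strict only controls whether the user is told about them.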
create_cat_var()
metatools::create_cat_var() has been updated so that users can now specify to create a new variable from either the code or decode column of the controlled terminology. Previously, a codelist set-up like the one below would be evaluated from the code column only, leaving out the “years” text from the new variable.
ID | Name | Data Type | Order | Code | Decode |
---|---|---|---|---|---|
AGEGR2 | Pooled Age Group 2 | text | 1 | <35 | <35 years |
AGEGR2 | Pooled Age Group 2 | text | 2 | 35-49 | 35-49 years |
AGEGR2 | Pooled Age Group 2 | text | 3 | >= 50 | >= 50 years |
Now, specifying the option create_from_decode = TRUE will allow you to create the variable based on the text in the decode column. If you are also using this option to create a numeric coded variable (in this case AGEGR2N), ensure your controlled terminology (CT) is set up so that the decode columns match.
dm <- read_xpt(metatools_example("dm.xpt"))
create_cat_var(dm, adsl_spec, AGE, AGEGR2, AGEGR2N, create_from_decode = TRUE) %>%
  select(USUBJID, AGE, AGEGR2, AGEGR2N) %>%
  head(5)
# A tibble: 5 × 4
  USUBJID       AGE AGEGR2      AGEGR2N
  <chr>       <dbl> <chr>         <dbl>
1 01-701-1015    63 18-64 years       1
2 01-701-1023    64 18-64 years       1
3 01-701-1028    71 65-80 years       2
4 01-701-1033    74 65-80 years       2
5 01-701-1034    77 65-80 years       2
This function now also provides a strict option, TRUE by default, that issues a warning if there are values in the reference column that do not fit into the categories in the controlled terminology. This can be disabled with strict = FALSE.
dm2 <- dm |>
  tibble::add_row(AGE = 15) |>
  tibble::add_row(AGE = 16)
x <- create_cat_var(dm2, adsl_spec, AGE, AGEGR2, create_from_decode = TRUE)
Warning: There are 2 observations in AGE that do not fit into the provided categories for AGEGR2. Please check your controlled terminology.
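The same behaviour can be approximated in base R with cut(). This is only a sketch, not how create_cat_var() works internally: `cat_sketch()` is a hypothetical helper, and the age boundaries are an assumption read off the "18-64 years" and "65-80 years" labels in the printed output above.

```r
# Hypothetical helper: bin ages into labelled groups and optionally warn
# about observations that fall outside every category.
# The breaks are assumed from the labels, not taken from the real CT.
cat_sketch <- function(age, strict = TRUE) {
  grp <- cut(age,
             breaks = c(18, 65, 81),  # [18, 65) and [65, 81)
             right = FALSE,
             labels = c("18-64 years", "65-80 years"))
  n_unfit <- sum(is.na(grp) & !is.na(age))
  if (strict && n_unfit > 0) {
    warning(sprintf("There are %d observations that do not fit into the provided categories.",
                    n_unfit), call. = FALSE)
  }
  as.character(grp)
}

cat_sketch(c(63, 71, 15), strict = FALSE)
```

An age of 15 falls outside both groups, so it is returned as NA, and with strict = TRUE the call would also warn.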
Summary of Other Changes
- Fixed a bug where the presence of variables with VLM in the value_spec table would prevent variables of the same name in different datasets being populated in the value_spec table.
- metatools::build_from_derived() adds new options for the keep parameter that allow users to derive either all or only prerequisite columns from source datasets. Thanks to Matt Bearham for this amendment!
- metatools::combine_supp() now adds the label found in QLABEL to the QNAM columns that are derived from supplementary datasets. Thanks to Bill Denney for this amendment!
- metatools::check_variables() now provides a strict option that will issue a warning rather than throw an error when strict = FALSE.
- {cli} output is now used across both packages and messaging for various functions has been improved.
What’s next?
The next step for both packages will be working through and closing out issues from the backlog, updating the examples and vignettes, and improving the user experience via more informative messaging.
For {metacore}, there has been some interest in a UI to help users write custom specification readers for specs not in the standard P21 format, so this will be explored as well.
I hope to release the next update towards the end of the year, looking at an approximately 6-monthly release schedule going forward. Until then I encourage people to explore some of the new features and provide feedback on the changes through each package's GitHub repository.
Thanks for reading!
Last updated: 2025-08-07 01:11:34.287895
Citation
@online{hobby2025,
  author = {Hobby, Liam},
  title = {Metacore and {Metatools} 0.2.0},
  date = {2025-08-04},
  url = {https://pharmaverse.github.io/blog/posts/2025-08-04_metacore_0.2.0/metacore_0.2.0.html},
  langid = {en}
}