
Metacore and Metatools 0.2.0

[This article was first published on pharmaverse blog, and kindly contributed to R-bloggers].

{metacore} and {metatools} have a new package maintainer

Hi, everyone! I’m Liam and I’m excited to announce that I have taken over as package maintainer for both {metacore} and {metatools} from Christina Fillmore. I work at GSK as a clinical programmer and I am coming to the end of my second year in the industry. This is my first experience working within the open-source world, but I am a regular user of pharmaverse packages and am keen to get more involved with the community.

Christina remains on hand as a mentor and I’d like to thank both her and Ben Straub for the continued support before we dive into the details of {metacore}/{metatools} 0.2.0.


What’s new in metacore?

The goal of version 0.2.0 was to clarify the distinction between an imported Metacore spec, containing information about multiple datasets, and a subsetted spec containing information about just a single dataset (as achieved via metacore::select_dataset()).

We received a number of questions and issues from users who were attempting to use a Metacore object containing metadata for multiple datasets in {metatools} functions that were designed to take a single, subsetted specification. When developing datasets, the typical workflow is to work on a single dataset at a time, so subsetting the Metacore object is the logical thing to do. The issue was that the approach to functions in {metatools} was inconsistent, with some functions permitting multi-dataset specification metadata and others not.

Now, a Metacore object containing multiple datasets and one containing a single dataset are programmatically distinct, with the single-dataset object implemented as a subclass of Metacore called “DatasetMeta”.

From the users’ perspective there is one key change. {metatools} functions now require a metadata object describing a single dataset: their API has been harmonised to accept only subsetted Metacore objects (created via metacore::select_dataset()).

The print statements of both combined and subsetted Metacore objects have been refined to better illustrate the differences between them and provide more helpful information to the user.

library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(metacore)
library(metatools)
library(tibble)
library(haven)

load(metacore_example("pilot_ADaM.rda"))
metacore
── Metacore object contains metadata for 5 datasets ────────────────────────────
→ ADSL (Subject-Level Analysis Dataset)
→ ADADAS (ADAS-Cog Analysis)
→ ADLBC (Analysis Dataset Lab Blood Chemistry)
→ ADTTE (AE Time To 1st Derm. Event Analysis)
→ ADAE (Adverse Events Analysis Dataset)
ℹ To use the Metacore object with metatools package, first subset a dataset using `metacore::select_dataset()`

The metacore::select_dataset() function is now explicit about what is being selected:

adsl_spec <- select_dataset(metacore, "ADSL", quiet = TRUE)
✔ ADSL dataset successfully selected
ℹ Dataset metadata specification subsetted with suppressed warnings

Printing the subsetted object now provides more detailed information:

adsl_spec
── Dataset specification object for ADSL (Subject-Level Analysis Dataset) ──────
The dataset contains 51 variables
Dataset key: USUBJID
The structure of the specification object is:
→ codelist: character [16 x 4] code_id, name, type, codes
→ derivations: character [50 x 2] derivation_id derivation
→ ds_spec: character [1 x 3] dataset, structure, label
→ ds_vars: character [51 x 7] dataset, variable, key_seq, order, keep, core,
  supp_flag
→ supp: character [0 x 4] dataset, variable, idvar, qeval
→ value_spec: character [51 x 8] dataset, variable, code_id, derivation_id,
  type, origin, where, sig_dig
→ var_spec: character [51 x 6] variable, type, length, label, format, common
To inspect the specification object use `View()` in the console.
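
The difference between the two kinds of object can also be checked programmatically instead of being read off the print output. The following is a minimal sketch, assuming the subsetted object carries the class name "DatasetMeta" as described above; the variable names are only illustrative.

```r
library(metacore)

# Load the example multi-dataset specification shipped with metacore;
# this creates an object called `metacore` in the current environment.
load(metacore_example("pilot_ADaM.rda"))

# Subset to a single dataset, as in the examples in this post.
adsl_spec <- select_dataset(metacore, "ADSL", quiet = TRUE)

# Assumption: the subsetted object has "DatasetMeta" in its class vector,
# while the full multi-dataset object does not.
is_subsetted <- inherits(adsl_spec, "DatasetMeta")
full_is_subsetted <- inherits(metacore, "DatasetMeta")

# Print the class vectors to compare the two objects.
class(metacore)
class(adsl_spec)
```

A check like this may be handy in your own helper functions if they should only ever receive a single-dataset specification.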

Functions that take a Metacore object as input will emit a helpful message if a subsetted object is not supplied.

ds_list <- list(DM = read_xpt(metatools_example("dm.xpt")))
create_var_from_codelist(data.frame(), metacore)
Error in `verify_DatasetMeta()`:
! The object supplied to the argument `metacore` is not a subsetted
  Metacore object. Use `metacore::select_dataset()` to subset metadata for the
  required dataset.
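
Because this is raised as an ordinary R error, it can be caught like any other condition, which may be useful in scripts that loop over several specifications. A small sketch under that assumption, reusing the call from the example above:

```r
library(metacore)
library(metatools)

# Creates the multi-dataset `metacore` object in the current environment.
load(metacore_example("pilot_ADaM.rda"))

# Pass the full (non-subsetted) object on purpose and capture the error
# message instead of letting the script stop.
msg <- tryCatch(
  {
    create_var_from_codelist(data.frame(), metacore)
    NA_character_ # only reached if no error is raised
  },
  error = function(e) conditionMessage(e)
)

# Show the captured message text.
cat(msg, "\n")
```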

Related: soft deprecation of dataset_name in metatools

Additionally, the argument dataset_name has been soft-deprecated across all functions in {metatools}. While the argument is still available and will not break existing code, using it will now issue a warning. This change encourages users to adopt the preferred workflow of creating a subsetted Metacore object, and improves performance by avoiding a repeated subsetting operation each time these functions are called.

The full list of affected functions is included below. The dataset_name argument will remain available for at least one year from the release date of 0.2.0 before being fully removed.

build_from_derived, check_variables, check_unique_keys, make_supp_qual, drop_unspec_vars, add_variables, order_cols, sort_by_key.
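
Migrating away from dataset_name mostly means subsetting once and reusing the result. The sketch below shows the idea with two of the affected functions; it assumes that `adsl.xpt` is among the example files bundled with {metatools}, and the object names are illustrative.

```r
library(metacore)
library(metatools)
library(haven)

# Creates the multi-dataset `metacore` object in the current environment.
load(metacore_example("pilot_ADaM.rda"))

# Assumption: `adsl.xpt` is one of the example files shipped with metatools.
adsl <- read_xpt(metatools_example("adsl.xpt"))

# Old pattern (soft-deprecated, issues a warning in 0.2.0):
# order_cols(adsl, metacore, dataset_name = "ADSL")

# Preferred pattern: subset the specification once...
adsl_spec <- select_dataset(metacore, "ADSL", quiet = TRUE)

# ...then pass the subsetted spec to each function that needs it.
adsl_ordered <- order_cols(adsl, adsl_spec)
adsl_sorted <- sort_by_key(adsl_ordered, adsl_spec)
```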


create_var_from_codelist()

metatools::create_var_from_codelist() now optionally allows the user to specify a codelist from which the new column should be generated. This is useful in situations like the one below, where the user is trying to derive PARAM from PARAMCD but the codelist for the out_var (PARAM) does not contain the values of PARAMCD.

ID     Order  Code                      Decode
PARAM  1      Alanine Aminotransferase  Alanine Aminotransferase
PARAM  2      Bilirubin                 Bilirubin
PARAM  3      Creatine                  Creatine

Example of default usage not providing the correct result:

adlbc_spec <- suppressMessages(select_dataset(metacore, "ADLBC", quiet = TRUE))
data <- tibble(PARAMCD = c("ALB", "ALP", "ALT"))
create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, strict = FALSE)
# A tibble: 3 × 2
  PARAMCD PARAM
  <chr>   <chr>
1 ALB     <NA> 
2 ALP     <NA> 
3 ALT     <NA> 

By default, metatools::create_var_from_codelist() takes the codelist of the out_var as input. The user can now override this default with a specific codelist (in this case the PARAMCD codelist below) to achieve the desired result.

ID       Order  Code   Decode
PARAMCD  1      ALT    Alanine Aminotransferase
PARAMCD  2      BILI   Bilirubin
PARAMCD  3      CREAT  Creatine

create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, codelist = get_control_term(adlbc_spec, PARAMCD), decode_to_code = FALSE)
# A tibble: 3 × 2
  PARAMCD PARAM                         
  <chr>   <chr>                         
1 ALB     Albumin (g/L)                 
2 ALP     Alkaline Phosphatase (U/L)    
3 ALT     Alanine Aminotransferase (U/L)
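
Before passing a codelist explicitly, it can help to look at what get_control_term() returns for the variable in question. A short sketch using the same calls as above; the name `paramcd_ct` is illustrative, and the exact shape of the returned object depends on how the codelist is defined in the specification.

```r
library(metacore)

# Creates the multi-dataset `metacore` object in the current environment.
load(metacore_example("pilot_ADaM.rda"))
adlbc_spec <- select_dataset(metacore, "ADLBC", quiet = TRUE)

# Retrieve the controlled terminology attached to PARAMCD.
paramcd_ct <- get_control_term(adlbc_spec, PARAMCD)

# Inspect the first few entries before using it as the `codelist` argument.
head(paramcd_ct)
```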

This function also provides a new option strict, which when set to TRUE (default) will issue a warning indicating any values in your input column that do not appear in the codelist.

data <- tibble(PARAMCD = c("ALB", "ALP", "ALT", "DUMMY1", "DUMMY2"))
x <- create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, codelist = get_control_term(adlbc_spec, PARAMCD), decode_to_code = FALSE, strict = TRUE)
Warning: In `create_var_from_codelist()`: The following values present in the input
dataset are not present in the codelist: DUMMY1 and DUMMY2
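
To make the lookup and the strict check more concrete, here is a toy base R sketch of the general idea. It is not the {metatools} implementation: `lookup_decode()` is a hypothetical helper, and the codelist is a small hand-written table in the style of the PARAMCD example above.

```r
# Toy code/decode table in the style of the PARAMCD codelist shown above.
codelist <- data.frame(
  code = c("ALT", "BILI", "CREAT"),
  decode = c("Alanine Aminotransferase", "Bilirubin", "Creatine"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of metatools): map codes to decodes and,
# when `strict = TRUE`, warn about input values missing from the codelist.
lookup_decode <- function(values, codelist, strict = TRUE) {
  idx <- match(values, codelist$code)
  unmatched <- unique(values[is.na(idx) & !is.na(values)])
  if (strict && length(unmatched) > 0) {
    warning(
      "Values not present in the codelist: ",
      paste(unmatched, collapse = ", "),
      call. = FALSE
    )
  }
  # Unmatched values index with NA and therefore come back as NA.
  codelist$decode[idx]
}

# strict = FALSE silently returns NA for values that have no match.
decoded <- lookup_decode(c("ALT", "BILI", "DUMMY1"), codelist, strict = FALSE)
decoded
```

With strict = TRUE the same call would additionally raise a warning naming DUMMY1, which is the behaviour the new option is designed to surface.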

create_cat_var()

metatools::create_cat_var() has been updated so that users can now specify whether to create a new variable from the code or the decode column of the controlled terminology. Previously, a codelist set-up like the one below would be evaluated from the code column only, leaving out the “years” text from the new variable.

Example of a codelist for AGEGR2
ID      Name                Data Type  Order  Code   Decode
AGEGR2  Pooled Age Group 2  text       1      <35    <35 years
AGEGR2  Pooled Age Group 2  text       2      35-49  35-49 years
AGEGR2  Pooled Age Group 2  text       3      >= 50  >= 50 years

Now, specifying the option create_from_decode = TRUE will allow you to create the variable based on the text in the decode column. If you are using this option to also create a numeric coded variable (in this case AGEGR2N), ensure your controlled terminology (CT) is set up so that the decode columns of the two codelists match.

dm <- read_xpt(metatools_example("dm.xpt"))
create_cat_var(dm, adsl_spec, AGE, AGEGR2, AGEGR2N, create_from_decode = TRUE) %>%
  select(USUBJID, AGE, AGEGR2, AGEGR2N) %>%
  head(5)
# A tibble: 5 × 4
  USUBJID       AGE AGEGR2      AGEGR2N
  <chr>       <dbl> <chr>         <dbl>
1 01-701-1015    63 18-64 years       1
2 01-701-1023    64 18-64 years       1
3 01-701-1028    71 65-80 years       2
4 01-701-1033    74 65-80 years       2
5 01-701-1034    77 65-80 years       2

This function now also provides a strict option, TRUE by default, which issues a warning message if there are values in the reference column that do not fit into the categories in the controlled terminology. This can be disabled with strict = FALSE.

dm2 <- dm |>
  tibble::add_row(AGE = 15) |>
  tibble::add_row(AGE = 16)
x <- create_cat_var(dm2, adsl_spec, AGE, AGEGR2, create_from_decode = TRUE)
Warning: There are 2 observations in AGE that do not fit into the provided categories
for AGEGR2. Please check your controlled terminology.
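
The kind of situation that triggers this warning can be reproduced with plain base R. The sketch below uses cut() as a stand-in for the range matching; it is not how {metatools} parses codelists, and the age categories are hypothetical, loosely modelled on the output above.

```r
# Illustrative ages: three fit a category, two are below the lowest range.
ages <- c(63, 71, 85, 15, 16)

# Hypothetical categories; intervals are closed on the left,
# i.e. [18, 65), [65, 81) and [81, Inf).
age_group <- cut(
  ages,
  breaks = c(18, 65, 81, Inf),
  labels = c("18-64 years", "65-80 years", ">80 years"),
  right = FALSE
)

# Values outside every interval come back as NA.
n_unmatched <- sum(is.na(age_group))

# Report the unmatched count, similar in spirit to the strict check.
if (n_unmatched > 0) {
  message(
    "There are ", n_unmatched,
    " observations that do not fit into the provided categories."
  )
}
```

Counting the NA values after categorisation is a quick way to confirm whether your controlled terminology covers the full range of the reference variable.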

Summary of Other Changes


What’s next?

The next step for both packages will be working through and closing out issues from the backlog, updating the examples and vignettes, and improving the user experience via more informative messaging.

For {metacore}, there has been some interest in a UI to help users write custom specification readers for specs not in the standard P21 format, so this will be explored as well.

I hope to release the next update towards the end of the year, looking at an approximately 6-monthly release schedule going forward. Until then I encourage people to explore some of the new features and provide feedback on the changes through the packages’ GitHub repositories.

Thanks for reading!


Last updated

2025-08-07 01:11:34.287895


Reuse

CC BY 4.0

Citation

BibTeX citation:
@online{hobby2025,
  author = {Hobby, Liam},
  title = {Metacore and {Metatools} 0.2.0},
  date = {2025-08-04},
  url = {https://pharmaverse.github.io/blog/posts/2025-08-04_metacore_0.2.0/metacore_0.2.0.html},
  langid = {en}
}
For attribution, please cite this work as:
Hobby, Liam. 2025. “Metacore and Metatools 0.2.0.” August 4, 2025. https://pharmaverse.github.io/blog/posts/2025-08-04_metacore_0.2.0/metacore_0.2.0.html.