Metacore and Metatools 0.2.0

[This article was first published on pharmaverse blog, and kindly contributed to R-bloggers.]

{metacore} and {metatools} have a new package maintainer

Hi, everyone! I’m Liam and I’m excited to announce that I have taken over as package maintainer for both {metacore} and {metatools} from Christina Fillmore. I work at GSK as a clinical programmer and I am coming to the end of my second year in the industry. This is my first experience working within the open-source world, but I am a regular user of pharmaverse packages and am keen to get more involved with the community.

Christina remains on hand as a mentor, and I’d like to thank both her and Ben Straub for their continued support. Now let’s dive into the details of {metacore} / {metatools} 0.2.0.

What’s new in metacore?

The goal of version 0.2.0 was to clarify the distinction between an imported Metacore spec, which contains information about multiple datasets, and a subsetted spec containing information about a single dataset (as produced by metacore::select_dataset()).

We received a number of questions and issues from users who were passing a Metacore object containing metadata for multiple datasets to {metatools} functions that were designed to take a single, subsetted specification. When developing datasets, the typical workflow is to work on one dataset at a time, so subsetting the Metacore object is the logical thing to do. The problem was that {metatools} functions were inconsistent: some accepted metadata for multiple datasets and others did not.

Metacore objects holding multiple datasets and those holding a single dataset have now been redesigned to be programmatically distinct, with the single-dataset object implemented as a subclass of Metacore called “DatasetMeta”.

From the user’s perspective there is one key change: {metatools} functions now require a metadata object for a single dataset. Their API has been harmonised to accept only subsetted Metacore objects, created via metacore::select_dataset().
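To show what “programmatically distinct” can mean in practice, here is a minimal base-R sketch using S3 classes. It illustrates the idea under stated assumptions and is not the packages’ actual implementation. The sketch models a specification as a plain list of data frames that each carry a dataset column. The helpers new_spec(), subset_spec() and require_dataset_meta() are hypothetical. They loosely mirror metacore::select_dataset() and the input check that {metatools} now performs.

```r
# Minimal base-R sketch (S3 classes); NOT how metacore is actually implemented.
# A spec is modelled as a list of data frames that each carry a `dataset` column.
new_spec <- function(...) {
  structure(list(...), class = "Metacore")
}

# Hypothetical helper: keep only one dataset's rows in every table and tag
# the result with a subclass, mirroring the Metacore -> DatasetMeta idea.
subset_spec <- function(spec, dataset) {
  stopifnot(inherits(spec, "Metacore"))
  if (!dataset %in% spec$ds_spec$dataset) {
    stop("Dataset ", dataset, " not found in the specification.", call. = FALSE)
  }
  tables <- lapply(unclass(spec), function(tbl) tbl[tbl$dataset == dataset, , drop = FALSE])
  structure(tables, class = c("DatasetMeta", "Metacore"))
}

# Hypothetical guard: downstream functions accept only the subclass.
require_dataset_meta <- function(spec) {
  if (!inherits(spec, "DatasetMeta")) {
    stop("Supply a specification subsetted to a single dataset.", call. = FALSE)
  }
  invisible(spec)
}

spec <- new_spec(
  ds_spec = data.frame(dataset = c("ADSL", "ADAE"),
                       label = c("Subject-Level Analysis Dataset", "Adverse Events Analysis Dataset"),
                       stringsAsFactors = FALSE),
  ds_vars = data.frame(dataset = c("ADSL", "ADSL", "ADAE"),
                       variable = c("USUBJID", "AGE", "AESEQ"),
                       stringsAsFactors = FALSE)
)
adsl <- subset_spec(spec, "ADSL")
class(adsl)         # "DatasetMeta" "Metacore"
nrow(adsl$ds_vars)  # 2
tryCatch(require_dataset_meta(spec), error = conditionMessage)
```

In this sketch, the subsetted object still inherits from Metacore, so code written for the parent class keeps working. A single inherits() check is then enough to reject an object that has not been subsetted.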

The print output of both combined and subsetted Metacore objects has been refined to better illustrate the differences between them and to give the user more helpful information.

library(dplyr)
library(metacore)
library(metatools)
library(tibble)
library(haven)

load(metacore_example("pilot_ADaM.rda"))
metacore
── Metacore object contains metadata for 5 datasets ────────────────────────────
→ ADSL (Subject-Level Analysis Dataset)
→ ADADAS (ADAS-Cog Analysis)
→ ADLBC (Analysis Dataset Lab Blood Chemistry)
→ ADTTE (AE Time To 1st Derm. Event Analysis)
→ ADAE (Adverse Events Analysis Dataset)
ℹ To use the Metacore object with metatools package, first subset a dataset using `metacore::select_dataset()`

The metacore::select_dataset() function is now explicit about what is being selected:

adsl_spec <- select_dataset(metacore, "ADSL", quiet = TRUE)
✔ ADSL dataset successfully selected
ℹ Dataset metadata specification subsetted with suppressed warnings

Printing the subsetted object now provides more detailed information:

adsl_spec
── Dataset specification object for ADSL (Subject-Level Analysis Dataset) ──────
The dataset contains 51 variables
Dataset key: USUBJID
The structure of the specification object is:
→ codelist: character [16 x 4] code_id, name, type, codes
→ derivations: character [50 x 2] derivation_id derivation
→ ds_spec: character [1 x 3] dataset, structure, label
→ ds_vars: character [51 x 7] dataset, variable, key_seq, order, keep, core,
  supp_flag
→ supp: character [0 x 4] dataset, variable, idvar, qeval
→ value_spec: character [51 x 8] dataset, variable, code_id, derivation_id,
  type, origin, where, sig_dig
→ var_spec: character [51 x 6] variable, type, length, label, format, common
To inspect the specification object use `View()` in the console.

Functions that take a Metacore object as input will throw an informative error if a subsetted object is not supplied.

ds_list <- list(DM = read_xpt(metatools_example("dm.xpt")))
create_var_from_codelist(data.frame(), metacore)
Error in `verify_DatasetMeta()`:
! The object supplied to the argument `metacore` is not a subsetted
  Metacore object. Use `metacore::select_dataset()` to subset metadata for the
  required dataset.

create_var_from_codelist()

metatools::create_var_from_codelist() now optionally allows the user to specify a codelist from which the new column should be generated. This is useful in situations like the one below where the user is trying to derive PARAM from PARAMCD but the codelist for the out_var (PARAM) does not contain the values of PARAMCD.

| ID | Order | Code | Decode |
|---|---|---|---|
| PARAM | 1 | Alanine Aminotransferase | Alanine Aminotransferase |
| PARAM | 2 | Bilirubin | Bilirubin |
| PARAM | 3 | Creatine | Creatine |

An example of the default usage, which does not give the desired result:

adlbc_spec <- suppressMessages(select_dataset(metacore, "ADLBC", quiet = TRUE))
data <- tibble(PARAMCD = c("ALB", "ALP", "ALT"))
create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, strict = FALSE)
# A tibble: 3 × 2
  PARAMCD PARAM
  <chr>   <chr>
1 ALB     <NA> 
2 ALP     <NA> 
3 ALT     <NA> 

By default, metatools::create_var_from_codelist() uses the codelist of out_var. The user can now override this default with a specific codelist (in this case, the PARAMCD codelist below) to achieve the desired result.

| ID | Order | Code | Decode |
|---|---|---|---|
| PARAMCD | 1 | ALT | Alanine Aminotransferase |
| PARAMCD | 2 | BILI | Bilirubin |
| PARAMCD | 3 | CREAT | Creatine |

create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, codelist = get_control_term(adlbc_spec, PARAMCD), decode_to_code = FALSE)
# A tibble: 3 × 2
  PARAMCD PARAM                         
  <chr>   <chr>                         
1 ALB     Albumin (g/L)                 
2 ALP     Alkaline Phosphatase (U/L)    
3 ALT     Alanine Aminotransferase (U/L)

The function also provides a new strict argument. When strict = TRUE (the default), it issues a warning listing any values in the input column that do not appear in the codelist.

data <- tibble(PARAMCD = c("ALB", "ALP", "ALT", "DUMMY1", "DUMMY2"))
x <- create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, codelist = get_control_term(adlbc_spec, PARAMCD), decode_to_code = FALSE, strict = TRUE)
Warning: In `create_var_from_codelist()`: The following values present in the input
dataset are not present in the codelist: DUMMY1 and DUMMY2
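The lookup behind this option can be sketched in base R. This is an illustrative approximation, not the {metatools} source. decode_from_codelist() is a hypothetical helper, and the two small codelists are made-up data shaped like the tables above, with code and decode columns only. The sketch shows why looking up PARAMCD values in PARAM’s own codelist returns NA. It also shows how passing the PARAMCD codelist instead, together with a strict check, decodes the known values and reports the rest.

```r
# Hypothetical helper approximating a codelist lookup; not the metatools source.
decode_from_codelist <- function(values, codelist, decode_to_code = FALSE, strict = TRUE) {
  # decode_to_code = FALSE: match against codes and return decodes (and vice versa).
  from <- if (decode_to_code) codelist$decode else codelist$code
  to   <- if (decode_to_code) codelist$code else codelist$decode
  unknown <- setdiff(unique(values[!is.na(values)]), from)
  if (strict && length(unknown) > 0) {
    warning("Values not present in the codelist: ",
            paste(unknown, collapse = ", "), call. = FALSE)
  }
  to[match(values, from)]  # NA wherever a value has no entry
}

# Made-up codelists: PARAM's codelist stores the long names as both code and decode.
paramcd_cl <- data.frame(code = c("ALT", "BILI"),
                         decode = c("Alanine Aminotransferase", "Bilirubin"),
                         stringsAsFactors = FALSE)
param_cl <- data.frame(code = paramcd_cl$decode, decode = paramcd_cl$decode,
                       stringsAsFactors = FALSE)

paramcd <- c("ALT", "BILI", "DUMMY1")
out_default  <- decode_from_codelist(paramcd, param_cl, strict = FALSE)  # all NA
out_override <- decode_from_codelist(paramcd, paramcd_cl)                # warns about DUMMY1
out_override  # "Alanine Aminotransferase" "Bilirubin" NA
```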

create_cat_var()

metatools::create_cat_var() has been updated so that users can choose to create the new variable from either the code or the decode column of the controlled terminology. Previously, a codelist set up like the one below would be evaluated from the code column only, leaving the “years” text out of the new variable.

Example of a codelist for AGEGR2

| ID | Name | Data Type | Order | Code | Decode |
|---|---|---|---|---|---|
| AGEGR2 | Pooled Age Group 2 | text | 1 | <35 | <35 years |
| AGEGR2 | Pooled Age Group 2 | text | 2 | 35-49 | 35-49 years |
| AGEGR2 | Pooled Age Group 2 | text | 3 | >= 50 | >= 50 years |

Now, specifying create_from_decode = TRUE lets you create the variable from the text in the decode column. If you also use this option to create a numeric coded variable (in this case AGEGR2N), make sure your controlled terminology (CT) is set up so that the decode columns match.

dm <- read_xpt(metatools_example("dm.xpt"))
create_cat_var(dm, adsl_spec, AGE, AGEGR2, AGEGR2N, create_from_decode = TRUE) %>%
  select(USUBJID, AGE, AGEGR2, AGEGR2N) %>%
  head(5)
# A tibble: 5 × 4
  USUBJID       AGE AGEGR2      AGEGR2N
  <chr>       <dbl> <chr>         <dbl>
1 01-701-1015    63 18-64 years       1
2 01-701-1023    64 18-64 years       1
3 01-701-1028    71 65-80 years       2
4 01-701-1033    74 65-80 years       2
5 01-701-1034    77 65-80 years       2

The function also provides a strict argument, defaulting to TRUE, which issues a warning if any values in the reference column do not fit into the categories defined in the controlled terminology. This check can be disabled with strict = FALSE.

dm2 <- dm |>
  tibble::add_row(AGE = 15) |>
  tibble::add_row(AGE = 16)
x <- create_cat_var(dm2, adsl_spec, AGE, AGEGR2, create_from_decode = TRUE)
Warning: There are 2 observations in AGE that do not fit into the provided categories
for AGEGR2. Please check your controlled terminology.
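To illustrate what the range matching involves, here is a base-R sketch. It rests on several assumptions and is not the {metatools} implementation. in_category() and categorise() are hypothetical helpers that only understand labels of the form <x, <=x, >x, >=x and a-b. They always parse the ranges from the code column and use create_from_decode only to choose which column supplies the output text. The strict check counts non-missing values that fall into no category, in the spirit of the warning above.

```r
# Hypothetical range parser for labels such as "<35", "35-49" or ">= 50".
in_category <- function(x, category) {
  cat_str <- gsub("\\s+", "", category)  # ">= 50" -> ">=50"
  op_pattern <- "^(<=|>=|<|>)"
  if (grepl(op_pattern, cat_str)) {
    op <- regmatches(cat_str, regexpr(op_pattern, cat_str))
    bound <- as.numeric(sub(op_pattern, "", cat_str))
    switch(op, "<" = x < bound, "<=" = x <= bound, ">" = x > bound, ">=" = x >= bound)
  } else if (grepl("^[0-9.]+-[0-9.]+$", cat_str)) {
    bounds <- as.numeric(strsplit(cat_str, "-", fixed = TRUE)[[1]])
    x >= bounds[1] & x <= bounds[2]  # inclusive range
  } else {
    stop("Unrecognised category: ", category, call. = FALSE)
  }
}

# Hypothetical categoriser: ranges come from the code column; create_from_decode
# only decides whether the code or the decode text is written to the result.
categorise <- function(x, codelist, create_from_decode = FALSE, strict = TRUE) {
  labels <- if (create_from_decode) codelist$decode else codelist$code
  out <- rep(NA_character_, length(x))
  for (i in seq_len(nrow(codelist))) {
    hit <- in_category(x, codelist$code[i]) & is.na(out)  # first matching category wins
    hit[is.na(hit)] <- FALSE                               # missing x never matches
    out[hit] <- labels[i]
  }
  n_unmatched <- sum(is.na(out) & !is.na(x))
  if (strict && n_unmatched > 0) {
    warning("There are ", n_unmatched,
            " observations that do not fit into the provided categories.", call. = FALSE)
  }
  out
}

agegr2 <- data.frame(code = c("<35", "35-49", ">= 50"),
                     decode = c("<35 years", "35-49 years", ">= 50 years"),
                     stringsAsFactors = FALSE)
ages <- c(20, 40, 63)
by_code   <- categorise(ages, agegr2)                             # "<35" "35-49" ">= 50"
by_decode <- categorise(ages, agegr2, create_from_decode = TRUE)  # "<35 years" ...

# Made-up adult-only codelist: ages 15 and 16 fall outside every category.
adult <- data.frame(code = c("18-64", ">=65"), decode = c("18-64 years", ">=65 years"),
                    stringsAsFactors = FALSE)
flagged <- categorise(c(15, 16, 70), adult)  # warns about 2 observations
```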

Summary of Other Changes

  • Fixed a bug where the presence of variables with value-level metadata (VLM) in the value_spec table would prevent variables of the same name in other datasets from being populated in the value_spec table.

  • metatools::build_from_derived() adds new options for the keep parameter that allow users to derive either all or only prerequisite columns from source datasets. Thanks to Matt Bearham for this amendment!

  • metatools::combine_supp() now adds the label found in QLABEL to the QNAM columns that are derived from supplementary datasets. Thanks to Bill Denney for this amendment!

  • metatools::check_variables() now has a strict argument: when strict = FALSE, it issues a warning instead of throwing an error.

  • {cli} output is now used across both packages and messaging for various functions has been improved.
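A simplified base-R sketch helps picture the combine_supp() change. It is not how combine_supp() is written. widen_supp() is a hypothetical helper that assumes subject-level supplementary records: it joins on USUBJID only and ignores IDVAR/IDVARVAL. The records are made-up data. In the sketch, each QNAM becomes a column, and its QLABEL is stored in the column’s "label" attribute, which is the attribute haven uses for variable labels.

```r
# Hypothetical helper, simplified: joins on USUBJID only and ignores IDVAR/IDVARVAL.
widen_supp <- function(parent, supp, by = "USUBJID") {
  for (qnam in unique(supp$QNAM)) {
    rows <- supp[supp$QNAM == qnam, , drop = FALSE]
    col <- rows$QVAL[match(parent[[by]], rows[[by]])]  # NA where a subject has no record
    attr(col, "label") <- rows$QLABEL[1]               # carry QLABEL across as the label
    parent[[qnam]] <- col
  }
  parent
}

dm_demo <- data.frame(USUBJID = c("01-701-1015", "01-701-1023", "01-701-1028"),
                      stringsAsFactors = FALSE)
# Made-up supplementary records in a SUPPDM-like long format.
suppdm <- data.frame(USUBJID = c("01-701-1015", "01-701-1023"),
                     QNAM = c("RANDFL", "RANDFL"),
                     QLABEL = c("Randomized Flag", "Randomized Flag"),
                     QVAL = c("Y", "Y"),
                     stringsAsFactors = FALSE)
dm_supp <- widen_supp(dm_demo, suppdm)
attr(dm_supp$RANDFL, "label")  # "Randomized Flag"
```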

What’s next?

The next step for both packages will be working through and closing out issues from the backlog, updating the examples and vignettes, and improving the user experience via more informative messaging.

For {metacore}, there has been some interest in a UI to help users write custom readers for specifications that are not in the standard P21 format, so this will be explored as well.

I hope to release the next update towards the end of the year, and to move to a release roughly every six months after that. Until then, I encourage you to explore the new features and provide feedback on the changes through GitHub.

Thanks for reading!

Last updated

2025-08-07 01:11:34.287895


Citation

BibTeX citation:
@online{hobby2025,
  author = {Hobby, Liam},
  title = {Metacore and {Metatools} 0.2.0},
  date = {2025-08-04},
  url = {https://pharmaverse.github.io/blog/posts/2025-08-04_metacore_0.2.0/metacore_0.2.0.html},
  langid = {en}
}
For attribution, please cite this work as:
Hobby, Liam. 2025. “Metacore and Metatools 0.2.0.” August 4, 2025. https://pharmaverse.github.io/blog/posts/2025-08-04_metacore_0.2.0/metacore_0.2.0.html.