Metacore and Metatools 0.2.0
Hi, everyone! I’m Liam and I’m excited to announce that I have taken over as package maintainer for both {metacore} and {metatools}.
Christina remains on hand as a mentor, and I’d like to thank both her and Ben Straub for their continued support before we dive into the details of version 0.2.0.
What’s new in metacore?
The goal of version 0.2.0 was to clarify the distinction between an imported Metacore spec, containing information about multiple datasets, and a subsetted spec containing information about just a single dataset (as achieved via `metacore::select_dataset()`).
We received a number of questions and issues from users who were attempting to use a Metacore object containing metadata for multiple datasets in functions from {metatools}.
Now, a Metacore object with multiple datasets and one with a single dataset are programmatically distinct: the single-dataset object is implemented as a subclass of Metacore called “DatasetMeta”.
From the users’ perspective there is one key change: a metadata object for a single dataset is now required to work with {metatools} functions.
The print statements of both combined and subsetted Metacore objects have been refined to better illustrate the differences between them and provide more helpful information to the user.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats': filter, lag
The following objects are masked from 'package:base': intersect, setdiff, setequal, union
library(metacore)
library(metatools)
library(tibble)
library(haven)
load(metacore_example("pilot_ADaM.rda"))
metacore
── Metacore object contains metadata for 5 datasets ────────────────────────────
→ ADSL (Subject-Level Analysis Dataset)
→ ADADAS (ADAS-Cog Analysis)
→ ADLBC (Analysis Dataset Lab Blood Chemistry)
→ ADTTE (AE Time To 1st Derm. Event Analysis)
→ ADAE (Adverse Events Analysis Dataset)
ℹ To use the Metacore object with metatools package, first subset a dataset using `metacore::select_dataset()`
The `select_dataset()` function subsets the Metacore object to a single dataset:
adsl_spec <- select_dataset(metacore, "ADSL", quiet = TRUE)
✔ ADSL dataset successfully selected
ℹ Dataset metadata specification subsetted with suppressed warnings
Printing the subsetted object now provides more detailed information:
adsl_spec
── Dataset specification object for ADSL (Subject-Level Analysis Dataset) ──────
The dataset contains 51 variables
Dataset key: USUBJID
The structure of the specification object is:
→ codelist: character [16 x 4] code_id, name, type, codes
→ derivations: character [50 x 2] derivation_id derivation
→ ds_spec: character [1 x 3] dataset, structure, label
→ ds_vars: character [51 x 7] dataset, variable, key_seq, order, keep, core, supp_flag
→ supp: character [0 x 4] dataset, variable, idvar, qeval
→ value_spec: character [51 x 8] dataset, variable, code_id, derivation_id, type, origin, where, sig_dig
→ var_spec: character [51 x 6] variable, type, length, label, format, common
To inspect the specification object use `View()` in the console.
Functions that take a Metacore object as input will stop with a helpful error message if a subsetted object is not supplied.
ds_list <- list(DM = read_xpt(metatools_example("dm.xpt")))
create_var_from_codelist(data.frame(), metacore)
Error in `verify_DatasetMeta()`:
! The object supplied to the argument `metacore` is not a subsetted Metacore object. Use `metacore::select_dataset()` to subset metadata for the required dataset.
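To make the class distinction concrete, here is a minimal base R sketch of the idea. It uses S3 classes purely for illustration; the names `new_spec()`, `subset_spec()` and `require_single_dataset()` are hypothetical and are not the packages' actual implementation.

```r
# Hypothetical S3 stand-ins; the real packages define their own classes.
new_spec <- function(datasets) {
  structure(list(datasets = datasets), class = "Metacore")
}

# Subsetting returns an object that carries the extra "DatasetMeta" class.
subset_spec <- function(spec, dataset) {
  stopifnot(inherits(spec, "Metacore"), dataset %in% spec$datasets)
  structure(list(datasets = dataset), class = c("DatasetMeta", "Metacore"))
}

# Guard placed at the top of functions that need single-dataset metadata.
require_single_dataset <- function(spec) {
  if (!inherits(spec, "DatasetMeta")) {
    stop("Expected a subsetted specification; subset it first.", call. = FALSE)
  }
  invisible(spec)
}

full_spec <- new_spec(c("ADSL", "ADAE"))
adsl_only <- subset_spec(full_spec, "ADSL")

inherits(adsl_only, "Metacore")     # TRUE: the subclass is still a Metacore
inherits(full_spec, "DatasetMeta")  # FALSE: the full spec is not subsetted

# Capture the error message rather than halting the script.
guard_result <- tryCatch(
  require_single_dataset(full_spec),
  error = function(e) conditionMessage(e)
)
```

Because the subsetted object keeps the parent class, code written for the full specification can still accept it, while functions that need one dataset can reject anything that lacks the subclass.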
`create_var_from_codelist()` has been updated to handle cases such as creating `PARAM` from `PARAMCD` where the codelist for the `out_var` (`PARAM`) does not contain the values of `PARAMCD`.
ID | Order | Code | Decode |
---|---|---|---|
PARAM | 1 | Alanine Aminotransferase | Alanine Aminotransferase |
PARAM | 2 | Bilirubin | Bilirubin |
PARAM | 3 | Creatine | Creatine |
Example of default usage not providing the correct result:
adlbc_spec <- suppressMessages(select_dataset(metacore, "ADLBC", quiet = TRUE))
data <- tibble(PARAMCD = c("ALB", "ALP", "ALT"))
create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, strict = FALSE)
# A tibble: 3 × 2
  PARAMCD PARAM
  <chr>   <chr>
1 ALB     <NA>
2 ALP     <NA>
3 ALT     <NA>
By default, the function uses the codelist of `out_var` as input. The user can now override this default with a specific codelist (in this case the `PARAMCD` codelist below) to achieve the desired result.
ID | Order | Code | Decode |
---|---|---|---|
PARAMCD | 1 | ALT | Alanine Aminotransferase |
PARAMCD | 2 | BILI | Bilirubin |
PARAMCD | 3 | CREAT | Creatine |
create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, codelist = get_control_term(adlbc_spec, PARAMCD), decode_to_code = FALSE)
# A tibble: 3 × 2
  PARAMCD PARAM
  <chr>   <chr>
1 ALB     Albumin (g/L)
2 ALP     Alkaline Phosphatase (U/L)
3 ALT     Alanine Aminotransferase (U/L)
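The difference between the two results comes down to which codelist the lookup runs against. The following base R sketch shows that idea with two small, hypothetical codelists shaped like the tables above; `decode_values()` is an illustrative helper, not a {metatools} function.

```r
# Hypothetical codelists shaped like the two tables above.
param_codelist <- data.frame(
  code   = c("Alanine Aminotransferase", "Bilirubin", "Creatine"),
  decode = c("Alanine Aminotransferase", "Bilirubin", "Creatine"),
  stringsAsFactors = FALSE
)
paramcd_codelist <- data.frame(
  code   = c("ALT", "BILI", "CREAT"),
  decode = c("Alanine Aminotransferase", "Bilirubin", "Creatine"),
  stringsAsFactors = FALSE
)

# Translate codes to decodes; values missing from the codelist become NA.
decode_values <- function(values, codelist) {
  codelist$decode[match(values, codelist$code)]
}

paramcd <- c("ALT", "BILI", "CREAT")

# The PARAM codelist has no short codes, so nothing matches.
from_param <- decode_values(paramcd, param_codelist)

# The PARAMCD codelist pairs each short code with its text.
from_paramcd <- decode_values(paramcd, paramcd_codelist)
```

Here `from_param` is entirely `NA`, mirroring the default behaviour shown earlier, while `from_paramcd` holds the decoded text.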
This function also provides a new option, `strict`, which when set to `TRUE` (the default) issues a warning listing any values in your input column that do not appear in the codelist.
data <- tibble(PARAMCD = c("ALB", "ALP", "ALT", "DUMMY1", "DUMMY2"))
x <- create_var_from_codelist(data, adlbc_spec, input_var = PARAMCD, out_var = PARAM, codelist = get_control_term(adlbc_spec, PARAMCD), decode_to_code = FALSE, strict = TRUE)
Warning: In `create_var_from_codelist()`: The following values present in the input dataset are not present in the codelist: DUMMY1 and DUMMY2
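A check of this kind can be pictured as a set difference between the input values and the codelist. The sketch below is a base R approximation under that assumption; `warn_unmatched()` and its message text are hypothetical and do not reproduce the package's internals.

```r
# Hypothetical helper: warn about input values that the codelist does not cover.
warn_unmatched <- function(values, codes, strict = TRUE) {
  unmatched <- setdiff(unique(values[!is.na(values)]), codes)
  if (strict && length(unmatched) > 0) {
    warning(
      "Values not present in the codelist: ",
      paste(unmatched, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(unmatched)
}

input_values <- c("ALB", "ALP", "ALT", "DUMMY1", "DUMMY2")
known_codes  <- c("ALB", "ALP", "ALT")

# Capture the warning text instead of printing it.
captured <- tryCatch(
  warn_unmatched(input_values, known_codes),
  warning = function(w) conditionMessage(w)
)

# With strict = FALSE the unmatched values are returned silently.
quiet <- warn_unmatched(input_values, known_codes, strict = FALSE)
```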
`create_cat_var()` can now create the new variable from either the `code` or `decode` column of the controlled terminology. Previously, a codelist set up like the one below would be evaluated from the `code` column only, leaving out the “years” text from the new variable.
ID | Name | Data Type | Order | Code | Decode |
---|---|---|---|---|---|
AGEGR2 | Pooled Age Group 2 | text | 1 | <35 | <35 years |
AGEGR2 | Pooled Age Group 2 | text | 2 | 35-49 | 35-49 years |
AGEGR2 | Pooled Age Group 2 | text | 3 | >= 50 | >= 50 years |
Now, specifying the option `create_from_decode = TRUE` will allow you to create the variable based on the text in the `decode` column. If you are also using this option to create a numeric coded variable (in this case `AGEGR2N`), ensure your controlled terminology is set up so that the `decode` columns match.
dm <- read_xpt(metatools_example("dm.xpt"))
create_cat_var(dm, adsl_spec, AGE, AGEGR2, AGEGR2N, create_from_decode = TRUE) %>%
  select(USUBJID, AGE, AGEGR2, AGEGR2N) %>%
  head(5)
# A tibble: 5 × 4
  USUBJID       AGE AGEGR2      AGEGR2N
  <chr>       <dbl> <chr>         <dbl>
1 01-701-1015    63 18-64 years       1
2 01-701-1023    64 18-64 years       1
3 01-701-1028    71 65-80 years       2
4 01-701-1033    74 65-80 years       2
5 01-701-1034    77 65-80 years       2
This function now also provides a `strict` option (default `TRUE`) that issues a warning if there are values in the reference column that do not fit into the categories in the controlled terminology. This can be disabled with `strict = FALSE`.
dm2 <- dm |>
  tibble::add_row(AGE = 15) |>
  tibble::add_row(AGE = 16)
x <- create_cat_var(dm2, adsl_spec, AGE, AGEGR2, create_from_decode = TRUE)
Warning: There are 2 observations in AGE that do not fit into the provided categories for AGEGR2. Please check your controlled terminology.
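For readers who want to see the mechanics, here is a rough base R sketch of grouping a numeric column by range, taking the text from a decode column, and counting observations that fit no category. The bounds in `age_ct` are assumed for illustration (the real function works from the controlled terminology in the specification), and `match_category()` is a hypothetical helper.

```r
# Hypothetical controlled terminology: bounds, text decodes, numeric codes.
age_ct <- data.frame(
  lower  = c(18, 65),
  upper  = c(64, 80),
  decode = c("18-64 years", "65-80 years"),
  code_n = c(1, 2),
  stringsAsFactors = FALSE
)

# Return the row of `ct` whose inclusive range contains each value (NA if none).
match_category <- function(x, ct) {
  vapply(x, function(value) {
    hit <- which(value >= ct$lower & value <= ct$upper)
    if (length(hit) == 1) hit else NA_integer_
  }, integer(1))
}

age <- c(63, 64, 71, 74, 77, 15, 16)
idx <- match_category(age, age_ct)

agegr2      <- age_ct$decode[idx]  # text taken from the decode column
agegr2n     <- age_ct$code_n[idx]  # numeric code from the same row
n_unmatched <- sum(is.na(idx))     # observations outside every category
```

Taking both the text and the numeric code from the same matched row is what keeps the two derived variables consistent; the two ages below 18 match no row and are the ones a strict check would report.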
Summary of Other Changes
- Fixed a bug where the presence of variables with VLM in the `value_spec` table would prevent variables of the same name in different datasets from being populated in the `value_spec` table.
- `metatools::build_from_derived()` adds new options for the `keep` parameter that allow users to derive either `all` or only `prerequisite` columns from source datasets. Thanks to Matt Bearham for this amendment!
- `metatools::combine_supp()` now adds the label found in `QLABEL` to the `QNAM` columns that are derived from supplementary datasets. Thanks to Bill Denney for this amendment!
- `metatools::check_variables()` now provides a `strict` option that will issue a warning rather than throw an error when `strict = FALSE`.
- {cli} output is now used across both packages and messaging for various functions has been improved.
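As an illustration of the `combine_supp()` change, the base R sketch below turns supplemental qualifier records into a column on the parent dataset and carries the `QLABEL` text across as a `label` attribute. The data are made up and the loop is only an approximation of the idea, not the function's implementation.

```r
# Hypothetical parent domain and supplemental qualifier records.
dm <- data.frame(USUBJID = c("01", "02"), stringsAsFactors = FALSE)
suppdm <- data.frame(
  USUBJID = c("01", "02"),
  QNAM    = c("ITT", "ITT"),
  QLABEL  = c("Intent-to-Treat Flag", "Intent-to-Treat Flag"),
  QVAL    = c("Y", "N"),
  stringsAsFactors = FALSE
)

for (qnam in unique(suppdm$QNAM)) {
  rows <- suppdm[suppdm$QNAM == qnam, ]
  # One new column per QNAM, aligned to the parent rows by USUBJID.
  dm[[qnam]] <- rows$QVAL[match(dm$USUBJID, rows$USUBJID)]
  # Carry the QLABEL text over as the column's "label" attribute.
  attr(dm[[qnam]], "label") <- rows$QLABEL[1]
}

attr(dm$ITT, "label")
```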
What’s next?
The next step for both packages will be working through and closing out issues from the backlog, updating the examples and vignettes, and improving the user experience via more informative messaging.
I hope to release the next update towards the end of the year, with an approximately six-monthly release schedule going forward. Until then, I encourage people to explore the new features and provide feedback on the changes through GitHub.
Thanks for reading!
Last updated
2025-08-07 01:11:34.287895
Citation
@online{hobby2025, author = {Hobby, Liam}, title = {Metacore and {Metatools} 0.2.0}, date = {2025-08-04}, url = {https://pharmaverse.github.io/blog/posts/2025-08-04_metacore_0.2.0/metacore_0.2.0.html}, langid = {en} }