FAIR standards for the creation of research materials, with examples

Posted on November 1, 2023 by R on Pablo Bernabeu in R bloggers

[This article was first published on R on Pablo Bernabeu, and kindly contributed to R-bloggers.]

Notwithstanding the need for speed in most scientific projects, what should be the minimum acceptable standards in the creation of research materials such as stimuli, custom software for data collection (e.g., experiments in jsPsych, OpenSesame or PsychoPy), or scripts for statistical analysis?

The answer to this question is contingent upon the field of research, the purpose and the duration of the project, and many other contextual factors. So, to narrow down the scope and arrive at a general answer, let’s suppose we asked a researcher in the cognitive sciences (e.g., a linguist, a psychologist or a neuroscientist) who values open science. Perhaps such a researcher would be satisfied with a method for the creation of materials that allows the creators of the materials, as well as their collaborators and any other stakeholders (e.g., any fellow scientists working in the same field), to explore, understand, reproduce, modify, and reuse the materials following their completion and thereafter. Let’s review some of the practices that can help fulfil these standards.

FAIRness

The FAIR Guiding Principles for scientific data management and stewardship exhaustively describe a protocol for making materials Findable, Accessible, Interoperable and Reusable. These terms cover the five allowances listed above, along with other important aspects.

Let’s look at some instantiations of the FAIR principles.

Reproducibility

It is convenient to allow others, and our future selves, to reproduce the materials throughout their preparation and at any time thereafter. For this purpose, R can be used to register in scripts as many as possible of the steps followed throughout the preparation of the materials. Far from being only software for data analysis, R allows the preparation of texts, images, audio files, etc. Humans err, and that can be counted on. Conveniently, registering the steps followed during weeks or months of preparation allows us to offload part of the documentation efforts and to trace back any errors. It’s a way of video-recording, as it were, all the additions, subtractions, replacements, transformations and calculations performed with the raw materials.
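
As a minimal sketch of what registering the steps in a script can look like, the following R code builds a small stimulus list from raw words, shuffles it with a fixed seed, and saves the result. The words, the column names and the file name are hypothetical, chosen only for illustration.

```r
# Hypothetical raw materials (not taken from any real study)
raw_words <- c("house", "river", "cloud", "apple", "stone", "chair")

# Fix the random seed so that the shuffling can be reproduced exactly
set.seed(123)

# Step 1: shuffle the words
shuffled <- sample(raw_words)

# Step 2: assign alternate items to two lists, 'A' and 'B'
stimuli <- data.frame(
  word = shuffled,
  list = rep(c("A", "B"), length.out = length(shuffled)),
  stringsAsFactors = FALSE
)

# Step 3: record the number of letters in each word
stimuli$n_letters <- nchar(stimuli$word)

# Step 4: save the final list. A temporary folder is used here;
# in a real project, this would be a folder inside the project.
out_file <- file.path(tempdir(), "stimuli.csv")
write.csv(stimuli, out_file, row.names = FALSE)
```

Because every step is written down, and the seed is fixed, rerunning the script should yield the same stimulus file, and any step can be inspected or modified later.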

Generous documentation

Under the curse of knowledge, the creators of research materials may believe that their materials are self-explanatory. Often they are more obscure than they think. To allow any other stakeholders, including their future selves, to exercise the five allowances listed above (i.e., explore, understand, reproduce, modify, and reuse the materials), the preparation process and the end materials should be documented in enough detail. This can be done using README.txt files throughout the project. Using the .txt format/extension is recommended because other formats, such as Microsoft Word, may not be (fully) available on some computers. To exemplify the format and the content of readme files, below is an excerpt from a longitudinal study on which I’ve been working.

— Post-training test —

In Sessions 2, 3, 4 and 6, if the test is failed in the first attempt, the training and the test are 
repeated (following González Alonso et al., 2020). In such cases, the result is shown at the end 
of the second attempt. The session advances if the accuracy achieved in the second attempt exceeds 
80%, whereas the session stops if the accuracy is lower. In the latter situation, an ‘End of session’ 
message is presented, flanked by two orange circles, and followed by an acknowledgement for the 
participant. Once the participant has read this screen, the experimenter quits the session by 
pressing ‘ESC’ and then ‘Q’.


== Stimuli ==

The stimulus lists are described in the R functions that were used to create the stimuli, as well as
in the ‘list’ column in the stimulus files.


== Participant-specific parameters for lab-based sessions ==

Each participant was assigned certain parameters in advance, including the mini-language, the order 
of the resting-state parts, and the stimulus lists. The code that was used to create this assignment 
is available in the ‘stimulus_preparation’ folder. 

Due to the pre-assignment of the parameters, there is a fixed set of participant IDs that can be 
used in OpenSesame. These identification numbers range between 1 and 144. If an ID outside of this 
range is used, the OpenSesame session does not run.


== General procedure for lab-based sessions ==

At the beginning of Sessions 2, 3, 4 and 6, the experimenter starts OpenSesame by opening the 
program directly (not by opening the session-specific file), and then opens the appropriate session 
within OpenSesame. This procedure helps prevent the opening of a standalone Python window, the 
closing of which would result in the closing of OpenSesame. Next, the experimenter opens BrainVision 
Recorder.

Next, the experimenter fits the participant with the EEG cap, which they will wear throughout the 
session. To prevent them from being pulled down, please attach the splitter box neatly to the 
towel on the participant’s back. 

Next, the experimenter returns to OpenSesame and runs the session in full screen by clicking on 
the full green triangle at the top left. Next, the experimenter selects a folder to store the 
logfile. It is important to select the folder corresponding to each session to avoid overwriting 
existing logfiles. Any prompts to overwrite a logfile must always be refused.

In the first screen, the experimenter can disable some of the tasks. This option can be used if a 
session has ended abruptly, in which case the session can be resumed from a near checkpoint. In 
such a case, the experimenter must first note this incident in their logbook, and rename the log 
file that was produced on the first run, by appending ‘_first_run’ to the name. This prevents 
overwriting the file on the second run. Next, they must open a new session, enter the same 
participant ID, and select the appropriate part from which to begin. This part must be the part 
immediately following the last part that was completed in full. For instance, if a session ended
abruptly during the experiment, the beginning selected on the second run would be the experiment. 
Once the session has finished completely, the first log file and the second log file must be 
safely merged into a single file, keeping only the fully completed tasks.

In the first instructional screen, participants are asked to refrain from asking any questions 
unless it is necessary, so that all participants can receive the same instructions.

At the beginning of the Resting-state part (present in Sessions 2 and 4) and at the beginning of 
the Experiment part, instructions are presented on the screen that ask participants to stay as 
still as possible during the following task. The screen contains an orange-coloured square with 
the letters ‘i.s.r’, which remind the experimenter to check the impedance and the signal, and 
finally to begin recording the EEG signal. If the impedance of any electrodes is poor, the 
experimenter may enter the booth to lower the impedance of the electrodes affected. Otherwise, 
after validating the signal and the impedance, the experimenter can begin the recording in 
BrainVision, and press the letter ‘C’ twice in the stimulus computer. At that point, a green 
circle will appear, along with instructions for the participant. 

Similarly, at the end of the Resting-state part and at the end of the Experiment part, a screen
with a crossed-out R appears to remind the experimenter to stop recording the EEG. 

Notice that, at some important stages during the sessions, the letter ‘C’ must be pressed twice 
by the experimenter to let the session continue. This protocol provides the experimenter with 
control when necessary. These moments are signalled by a ‘wait a moment’ notice for the 
participant, and by two orange-coloured stripes on the screen. The experimenter should be aware 
of the use of the letter ‘C’ at these points, as the requirement is not signalled on the screen 
to prevent participants from pressing the letter themselves. 

During the experiment, it is important to monitor the EEG signal. If it ever becomes very noisy, 
the experiment must be paused by pressing the ESC key, and the problem must be resolved. If the 
noise is due to movement by the participant, they should be asked again to please stay as still 
as possible. If the noise is due to an increase in the impedance of some electrodes, the impedance 
of those electrodes should be checked.

The Experiment part in each session contains a break every 40 trials. During these breaks, the 
number of the current trial appears in grey on the bottom right corner of the screen.


== Definition of items in OpenSesame (only for programming purposes, not for in-session use) ==

  — Each major part of the session is contained in a sequence item that is named in capital 
     letters (e.g., ‘PRETRAINING’, ‘TRAINING’, ‘TEST’, ‘EXPERIMENT’).

  — ‘continue_space’: allows proceeding to the following screen after pressing the space bar, 
     which should be done by the participant. In most cases, two presses are required, as 
     detailed on the screen.

  — ‘continue_c’: allows proceeding to the following screen after pressing the letter ‘C’, 
     which should be done by the experimenter. In most cases, two presses are required, as 
     detailed on the screen.


== Variables in the OpenSesame log files ==

In the log files produced by OpenSesame, each part of the session (e.g., Test, Experiment) is 
identified in the variable ‘session_part’. The names of the response variables are ‘response’,
‘response_time’ and ‘correct’. Item-specific response variables follow the formats of 
‘response_[item_name]’, ‘response_time_[item_name]’ and ‘correct_[item_name]’ 
(see https://osdoc.cogsci.nl/3.3/manual/variables/#response-variables).

The output is verbose and requires preprocessing of the data. For instance, the last response 
in each loop may appear twice in the output, due to the processing of the response. These 
duplicates can, and must, be cleaned up by discarding the rows that have the same trial number
as the preceding row.


== EEG triggers ==

Triggers are sent to the EEG recorder throughout the experiment. The system for sending 
triggers is set up in the OpenSesame script, within the inline script ‘EEG_trigger_setup’.

The key to the triggers is provided below.

  0: reset trigger port in BrainVision Recorder. This trigger is integrated in the 
     trigger-sending function.

  — Resting-state EEG part —

    10: beginning of eyes-open resting-state EEG

    11: end of eyes-open resting-state EEG

    12: beginning of eyes-closed resting-state EEG

    13: end of eyes-closed resting-state EEG

  — Experiment part —

    5: fixation mark

    — ID of each target sentence (only applicable to target trials) —

        110–253: triggers ranging between 110 and 253, time-locked to the onset of the 
          word of interest in each trial.
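
Stepping outside the readme excerpt, the cleaning step described in it (discarding the rows that have the same trial number as the preceding row) could be sketched in R as follows. The data and the column name ‘trial’ are hypothetical, as the excerpt does not state the name of the trial variable in the log files.

```r
# Hypothetical excerpt of a log file. The column name 'trial' is an
# assumption, and the values are made up for illustration.
log_data <- data.frame(
  trial         = c(1, 2, 3, 3, 4, 5, 5),
  response      = c("f", "j", "j", "j", "f", "f", "f"),
  response_time = c(512, 634, 580, 580, 701, 455, 455)
)

# Flag the rows whose trial number equals that of the preceding row.
# The first row has no preceding row, so it is never flagged.
same_as_previous <- c(
  FALSE,
  log_data$trial[-1] == log_data$trial[-nrow(log_data)]
)

# Keep only the rows that were not flagged
clean_data <- log_data[!same_as_previous, ]
```

Comparing each row with the preceding one, rather than removing all repeated trial numbers across the file, keeps the rows from different parts of a session in which the trial counter may start again.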

Comments in code scripts

It is helpful for our future selves, for our collaborators, and for any other stakeholders associated with a project (which includes any fellow researchers worldwide) to include comments in code scripts. These comments should introduce the purpose of the script at the top, and explain the purpose of the main components of the code. Some excerpts are shown below as examples.
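
As a minimal illustration of this commenting style, the following hypothetical script (written for this purpose, and not taken from the project described above) states its purpose at the top and then explains each component. The data and the variable names are made up.

```r
# Purpose: compute the mean response time per condition.
# Input:   a data frame with one row per trial (hypothetical data below).
# Output:  a data frame with one row per condition.

# Hypothetical trial-level data
trials <- data.frame(
  condition     = c("match", "match", "mismatch", "mismatch"),
  response_time = c(520, 560, 610, 650),
  correct       = c(1, 1, 0, 1)
)

# Keep correct responses only (a choice made for this example)
correct_trials <- trials[trials$correct == 1, ]

# Average the response times within each condition
mean_rt <- aggregate(
  response_time ~ condition,
  data = correct_trials,
  FUN = mean
)
```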

Open-source software

Where possible, open-source software should be used. Open-source software is free, and hence more accessible. Open-source software can be classified along various dimensions, such as the size of the user base. The more users, the greater the support, because the core developers have more resources, and the users will often help each other in public forums such as Stack Exchange. For instance, a programming language such as R boasts millions of users worldwide who can count on support in public forums and in R-specific forums such as the Posit Community.

Other software communities are not as large. For instance, the communities around open-source software for psychological research (e.g., OpenSesame, PsychoPy) are far smaller than that of R. Yet, these programs too can count on substantial support. For the more basic uses, most of the way has already been paved, and the existing documentation suffices. For more advanced uses, the smaller size of the community can become more obvious, as one needs to spend more time researching solutions for one’s needs.

Regardless of the size of the community, all else being equal, open-source software is the right choice to ensure access to one’s work for all (potential) stakeholders in the future. The other option, proprietary software, entails dependence on the services of a private company.

Tidiness and parsimony in computer code

Code scripts should be as tidy and parsimonious as possible. For instance, to prevent overly long scripts that would impair the comprehension of the materials, it is useful to break down large projects into nested scripts, and source (i.e., run) the smaller scripts in the larger scripts.

# Compose all stimuli for Sessions 2, 3, 4 and 6

# Create participant-specific parameters
source('stimulus_preparation/participant_parameters.R')

# Frame base images
source('stimulus_preparation/base_images.R')

# Session 2
source('stimulus_preparation/Session 2/Session2_compile_all_stimuli.R')

# Session 3
source('stimulus_preparation/Session 3/Session3_compile_all_stimuli.R')

# Session 4
source('stimulus_preparation/Session 4/Session4_compile_all_stimuli.R')

# Session 6
source('stimulus_preparation/Session 6/Session6_compile_all_stimuli.R')
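
When the session-specific scripts follow a regular naming pattern, as they do above, the paths could also be built programmatically. The following is only a sketch of that alternative, not the approach used in the project. It assumes that the scripts are located relative to the working directory, and it skips any script that is not found.

```r
# Sessions for which stimuli are compiled
sessions <- c(2, 3, 4, 6)

# Build the path to each session-specific script, following the
# naming pattern used in the master script above
session_scripts <- file.path(
  'stimulus_preparation',
  paste('Session', sessions),
  paste0('Session', sessions, '_compile_all_stimuli.R')
)

# Run each script, reporting any that cannot be found
for (script in session_scripts) {
  if (file.exists(script)) {
    source(script)
  } else {
    message('Script not found: ', script)
  }
}
```

The explicit version shown earlier is arguably easier to read at a glance, whereas the loop avoids repetition when the number of sessions grows.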

Tidiness and parsimony in project directories

A directory tree is useful to display all the folders in a project. The tree can be produced in the RStudio ‘Terminal’ console (in a Unix-like shell such as Bash) using the following one-line command.

find . -type d | sed -e "s/[^-][^\/]*\//  |/g" -e "s/|\([^ ]\)/| - \1/"

The output will look like the following (excerpt from https://osf.io/gt5uf/wiki).

.
  | - bayesian_priors
  |  | - plots
  | - semanticpriming
  |  | - analysis_with_visualsimilarity
  |  |  | - model_diagnostics
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - results
  |  |  | - plots
  |  |  | - correlations
  |  |  |  | - plots
  |  | - frequentist_bayesian_plots
  |  |  | - plots
  |  | - frequentist_analysis
  |  |  | - model_diagnostics
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - lexical_covariates_selection
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - results
  |  |  | - plots
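
Where a Unix-like shell is not available, a similar tree could be produced within R itself. The following is a minimal sketch that uses base R only. The example project and its folder names are hypothetical, and are created in a temporary folder for the sake of illustration.

```r
# Create a small example project in a temporary folder
# (the folder names are hypothetical)
project <- file.path(tempdir(), "example_project")
dir.create(file.path(project, "analysis", "plots"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "analysis", "results"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "stimuli"),
           recursive = TRUE, showWarnings = FALSE)

# List all folders relative to the project root
dirs <- sort(list.dirs(project, full.names = FALSE, recursive = TRUE))
dirs <- dirs[dirs != ""]  # drop the entry for the root itself

# Work out how deep each folder sits in the hierarchy
depth <- lengths(strsplit(dirs, "/", fixed = TRUE))

# Indent each folder name according to its depth, imitating the
# layout produced by the shell command above
tree <- paste0(strrep("  |", depth), " - ", basename(dirs))

# Print the tree, starting with the root
cat(".", tree, sep = "\n")
```

To apply this to a real project, the path in ‘project’ would be replaced with the project folder, and the three calls to dir.create() would be removed.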
