FAIR standards for the creation of research materials, with examples

[This article was first published on R on Pablo Bernabeu, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Notwithstanding the need for speed in most scientific projects, what should be the minimum acceptable standards in the creation of research materials such as stimuli, custom software for data collection (e.g., experiment in jsPsych, OpenSesame or psychoPy), or scripts for statistical analysis?

The answer to this question is contingent upon the field of research, the purpose and the duration of the project, and many other contextual factors. So, to narrow down the scope and come at a general answer, let’s suppose we asked a researcher in the cognitive sciences (e.g., a linguist, a psychologist or a neuroscientist) who values open science. Perhaps, such a researchers would be satisfied with a method for the creation of materials that allows the creators of the materials, as well as their collaborators and any other stakeholders (e.g., any fellow scientists working in the same field), to explore, understand, reproduce, modify, and reuse the materials following their completion and thereafter. Let’s review some of the implements that can help fulfil these standards.

FAIRness

The FAIR Guiding Principles for scientific data management and stewardship exhaustively describe a protocol for making materials Findable, Accessible, Interoperable and Reusable. These terms cover the five allowances listed above, along with other important aspects.

Let’s look at some instantiations of the FAIR principles.

Sharing the materials

A condition sine qua non is to share the materials publicly online, as far as possible. Repositories on servers such as OSF or GitHub are often sufficient to this end. Unfortunately, most studies in the cognitive sciences still do not share the complete materials.

One of the reasons why sharing is so important is to prevent wrong assumptions by the audience that will consider the research. That is, when the materials of a study are not publicly shared online, the readers of the papers are left with two options: to assume that there were no errors or to assume that were some errors of an uncertain degree. The method followed in the creation of the materials should free the readers of this tribulation, by allowing them to consult the materials and their preparation in full, or as completely as possible.

Reproducibility

It is convenient to allow others, and our future selves, to reproduce the materials throughout their preparation and at any time thereafter. For this purpose, R can be used to register in scripts as many as possible of the steps followed throughout the preparation of the materials. Far from being only a software for data analysis, R allows the preparation of texts, images, audios, etc. Humans err, by definition. That can be counted on. Conveniently, registering the steps followed during weeks or months of preparation allows us to offload part of the documentation efforts. It’s a way of video-recording, as it were, all the additions, subtractions, replacements, transformations and calculations performed with the raw materials.

Generous documentation

Under the curse of knowledge, the creators of research materials may believe that their materials are self-explanatory. Often they are more obscure than they think. To allow any other stakeholders, including their future selves, to exercise the five allowances listed above—i.e., explore, understand, reproduce, modify, and reuse the materials—, the preparation process and the end materials should be documented with enough detail. This can be done using README.txt files throughout the project. Using the .txt format/extension is recommended because other formats, such as Microsoft Word, may not be (fully) available in some computers. To exemplify the format and the content of readme files, below is an excerpt from a longitudinal study on which I’ve been working.

Comments in code scripts

It is helpful for our future selves, for our collaborators, and for any other stakeholders associated with a project—which includes any fellow researchers worldwide—to include comments in code scripts. These comments should introduce the purpose of the script at the top, and the purpose of various components of the code. Some excerpts are shown below as examples.

Open-source software

Where possible, open-source software should be used. Open-source software is free, and hence more accessible. Open-source software can be classified in various dimensions, such as the size of the user base. The more users, the greater the support, because the core developers have more resources, and the users will often help each other in public forums such as Stack Exchange. For instance, a programming language such as R boasts millions of users worldwide who count on support in public forums and in R-specific forums such as the Posit Community.

Other software are not as large. For instance, open-source software for psychological research (e.g., OpenSesame, psychoPy) are far smaller than R in terms of community. Yet, these software too can count on substantial support. For the more basic uses, most of the way has already been paved, and the existing documentation suffices. For more advanced uses, the smaller size of the community can become more obvious, as one needs to spend more time researching on solutions for their needs.

Regardless of the size of the community, all else being equal, open-source software is the right choice to ensure access to one’s work for all (potential) stakeholders in the future. The other option, proprietory software, entails dependence on the services of a private company.

Tidiness and parsimony in computer code

Code scripts should be as tidy and parsimonious as possible. For instance, to prevent overly long scripts that would impair the comprehension of the materials, it is useful to break down large projects into nested scripts, and source (i.e., run) the smaller scripts in the larger scripts.

# Compose all stimuli for Sessions 2, 3, 4 and 6

# Create participant-specific parameters
source('stimulus_preparation/participant_parameters.R')

# Frame base images
source('stimulus_preparation/base_images.R')

# Session 2
source('stimulus_preparation/Session 2/Session2_compile_all_stimuli.R')

# Session 3
source('stimulus_preparation/Session 3/Session3_compile_all_stimuli.R')

# Session 4
source('stimulus_preparation/Session 4/Session4_compile_all_stimuli.R')

# Session 6
source('stimulus_preparation/Session 6/Session6_compile_all_stimuli.R')

Tidiness and parsimony in project directories

A directory tree is useful to display all the folders in a project. The tree can be produced in the RStudio ‘Terminal’ console using the following one-line command.

find . -type d | sed -e "s/[^-][^\/]*\//  |/g" -e "s/|\([^ ]\)/| - \1/"

The output will look like the following (excerpt from https://osf.io/gt5uf/wiki).

.
  | - bayesian_priors
  |  | - plots
  | - semanticpriming
  |  | - analysis_with_visualsimilarity
  |  |  | - model_diagnostics
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - results
  |  |  | - plots
  |  |  | - correlations
  |  |  |  | - plots
  |  | - frequentist_bayesian_plots
  |  |  | - plots
  |  | - frequentist_analysis
  |  |  | - model_diagnostics
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - lexical_covariates_selection
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - results
  |  |  | - plots
To leave a comment for the author, please follow the link and comment on their blog: R on Pablo Bernabeu.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)