Notes on Citing R and R Packages

This article was first published on Higher Order Functions and kindly contributed to R-bloggers.

Our group has started using a new knowledge base system, so I have been writing up and revisiting some of my documentation. Here I am going to share a guide I wrote about citing R packages in academic writing.

Which software to cite

Let’s make a distinction here between reporting (or summarizing) an analysis and reproducing (or carrying out) an analysis.

Our main manuscript document is for reporting. We want to report which tools and which versions of those tools we used to get our statistical results. We don’t need to include every computational detail. We will save that level of detail for a supplemental document that shows the exact modeling code and sessioninfo::session_info() for reproducing our results. Moreover, journals sometimes limit the number of references in a manuscript, and a full R analysis might draw on 15 packages, so we generally cannot cite everything that helped us get our results. Instead, we can think about citation priorities.
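As a minimal sketch of that supplemental record, the snippet below writes the session details to a plain-text file. It uses base R's sessionInfo() so that it runs without extra packages; sessioninfo::session_info() could be swapped in where that package is installed. The file name is a placeholder, and a temporary directory stands in for the project folder.

```r
# Sketch: save the session details for a supplemental document.
# The output path is a hypothetical placeholder; a temp directory is used here.
out_file <- file.path(tempdir(), "session-info.txt")

# capture.output() collects the printed report as a character vector
info_lines <- utils::capture.output(print(utils::sessionInfo()))
writeLines(info_lines, out_file)

# Confirm that the file was written
file.exists(out_file)
```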

For an analysis carried out in R, we need to cite and version:

  • R (the programming language / analysis environment).
  • Third party packages that carried out the analyses.
    • For example, nlme, lme4, ordinal, rms, brms.
  • If a package calls on another language or analysis tool, cite that tool as well.
    • For example, brms and rstanarm fit models using the Stan programming language, so we need to cite and version Stan as well.
  • Packages that performed additional computation on analysis results.
    • For example, emmeans to get marginal means from a fitted model.
  • Packages that visualized analysis results automatically.
    • For example, see or interactions.

The following items would have the lowest priority for citations:

  • RStudio: It’s just an interface to the language. (Ideally, an analysis could be run without touching RStudio.)
  • The built-in stats package.
  • knitr/quarto/rmarkdown: These performed R computations for us and stored the results in a document.
  • Siloed off parts of a main package.
    • For example, the gamlss package fits GAMLSS models but the distributions for model families are stored in the package gamlss.dist. gamlss needs gamlss.dist to work, but gamlss is the main important thing to cite.
  • Data storage formats.

If space and the publication venue permit, we can also cite and version the key R packages that manipulated or visualized the data such as tidyverse, ggplot2, broom, tidybayes/ggdist, etc. Be generous. We do want to credit the tools we used to get our results after all!

Where to get citation information

Creators of scientific software will often tell users how to cite their software. Scientific software tools often have an associated article that announces the software and describes how to use it, so authors will ask users to cite that publication so they can obtain academic credit for their software work.

For R and R packages, the citation() function will tell users how to cite their software. lme4 is one of those packages that directs users to a publication.

citation("lme4")
#> To cite lme4 in publications use:
#> 
#>   Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).
#>   Fitting Linear Mixed-Effects Models Using lme4. Journal of
#>   Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {Fitting Linear Mixed-Effects Models Using {lme4}},
#>     author = {Douglas Bates and Martin M{\"a}chler and Ben Bolker and Steve Walker},
#>     journal = {Journal of Statistical Software},
#>     year = {2015},
#>     volume = {67},
#>     number = {1},
#>     pages = {1--48},
#>     doi = {10.18637/jss.v067.i01},
#>   }

Notice in the BibTeX entry at the bottom how {lme4} is put in braces. These braces tell LaTeX not to change the capitalization of that word when printing the title. Some journals or formats have different preferences for how to capitalize titles, but as a general rule of thumb, software names need to be printed verbatim, that is, as the user would type them. (library(Lme4) will not load the lme4 package.) When creating bibliography entries, take care to follow the capitalization so that the software name is accurate. Take care also to differentiate between statistical methods and software names: “We fit GAMLSS models with the gamlss package”.

For CRAN packages, the output of citation() is also provided online in HTML. The CRAN package description page (e.g., lme4) includes a Citation entry which generates a formatted version of the citation information (e.g., lme4 citation info).

When the software doesn’t have a publication, R will generate a citation for you. The ordinal package is one such example.

citation("ordinal")
#> To cite 'ordinal' in publications use:
#> 
#>   Christensen R (2023). _ordinal-Regression Models for Ordinal Data_. R
#>   package version 2023.12-4,
#>   <https://CRAN.R-project.org/package=ordinal>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {ordinal---Regression Models for Ordinal Data},
#>     author = {Rune H. B. Christensen},
#>     year = {2023},
#>     note = {R package version 2023.12-4},
#>     url = {https://CRAN.R-project.org/package=ordinal},
#>   }

The underscores _ in the title indicate that the title would be italicized when the citation is viewed on CRAN.
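The BibTeX part of the citation() output can also be produced programmatically with utils::toBibtex(). Here is a rough sketch that writes an entry to a .bib file. The helper name cite_with_key(), the key "rstats" and the file name are made up for illustration, and only the citation for R itself is used so that the code runs without any third-party package installed.

```r
# Sketch: turn citation() output into a keyed BibTeX entry.
# cite_with_key() is a hypothetical helper, not part of any package.
cite_with_key <- function(pkg, key) {
  entry <- utils::citation(pkg)
  # Set the BibTeX key. If a package provides several entries,
  # the same key is recycled to all of them and needs editing by hand.
  entry$key <- key
  as.character(utils::toBibtex(entry))
}

# citation("base") is the citation for R itself
bib_lines <- cite_with_key("base", "rstats")

# The file name is a placeholder; a temp directory is used here
bib_file <- file.path(tempdir(), "packages.bib")
writeLines(bib_lines, bib_file)

cat(bib_lines, sep = "\n")
```

For an installed package, the same helper could be called as cite_with_key("lme4", "lme4") and the result appended to bib_lines before writing.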

How to cite and version R and R packages

As a rule of thumb, any citation of any resource should answer these questions:

  • Who (authors)
  • What (title and sometimes format)
  • When (year)
  • Where (journal, URL, book, DOI)

Then for software, we can add the following:

  • Which (version)

The citation() function will answer these questions for you.
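To make the mapping concrete, here is a small sketch that pulls those pieces out of the citation for R itself. The list names (who, what, and so on) are just labels for this illustration. Note that the version is not one of the citation's own fields here, so it is taken from getRversion().

```r
# Sketch: read the who/what/when/where fields from a citation object.
cit <- utils::citation()  # citation for R itself

answers <- list(
  who   = format(cit$author, include = c("given", "family")),
  what  = cit$title,
  when  = cit$year,
  where = cit$url,
  which = as.character(getRversion())
)

str(answers)
```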

There are a couple of other functions to know when it comes to package versions. utils::packageVersion() returns the package version as a package_version object, which prints like a string (use as.character() if you need an actual string):

utils::packageVersion("lme4")
#> [1] '1.1.35.3'
utils::packageVersion("ordinal")
#> [1] '2023.12.4'
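The value returned by utils::packageVersion() is a package_version object rather than a plain character string, which matters when versions are compared. The sketch below uses made-up version numbers and the built-in stats package to show the difference.

```r
# Sketch: version objects compare component by component.
v_old <- package_version("1.9.0")
v_new <- package_version("1.10.0")
v_new > v_old

# Plain strings are compared as text, so this comparison
# does not follow version order
"1.10.0" > "1.9.0"

# Convert to a character string for printing in a manuscript
v_string <- as.character(utils::packageVersion("stats"))
v_string

# Version objects can be compared against strings directly
utils::packageVersion("stats") >= "4.0.0"
```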

For the current R version, a bunch of built-in functions can tell you everything you need to know. I can never remember which of these functions I want (it’s getRversion()), so I will sometimes use utils::packageVersion("base") to get a simple version number.

R.version.string
#> [1] "R version 4.3.3 (2024-02-29 ucrt)"
R.version
#>                _                                
#> platform       x86_64-w64-mingw32               
#> arch           x86_64                           
#> os             mingw32                          
#> crt            ucrt                             
#> system         x86_64, mingw32                  
#> status                                          
#> major          4                                
#> minor          3.3                              
#> year           2024                             
#> month          02                               
#> day            29                               
#> svn rev        86002                            
#> language       R                                
#> version.string R version 4.3.3 (2024-02-29 ucrt)
#> nickname       Angel Food Cake
getRversion()
#> [1] '4.3.3'

utils::packageVersion("base")
#> [1] '4.3.3'

For Stan, depending on the backend used, the software version is available via:

# rstanarm and default brms
rstan::stan_version()
#> [1] "2.32.2"

# non-default for brms
cmdstanr::cmdstan_version()
#> [1] "2.34.1"
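Because either backend may be missing on a given machine, it can help to wrap the two calls in a function that does not fail. This is only a sketch: stan_versions() is a hypothetical helper name, and it reports NA for a backend that is not installed or cannot report a version.

```r
# Sketch: collect Stan versions from whichever backends are available.
# stan_versions() is a made-up helper, not part of rstan or cmdstanr.
stan_versions <- function() {
  out <- c(rstan = NA_character_, cmdstanr = NA_character_)

  if (requireNamespace("rstan", quietly = TRUE)) {
    out["rstan"] <- tryCatch(
      as.character(rstan::stan_version()),
      error = function(e) NA_character_
    )
  }

  if (requireNamespace("cmdstanr", quietly = TRUE)) {
    # cmdstan_version() can fail if CmdStan itself is not set up
    out["cmdstanr"] <- tryCatch(
      as.character(cmdstanr::cmdstan_version()),
      error = function(e) NA_character_
    )
  }

  out
}

stan_versions()
```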

Examples

A simple example of R, a modeling R package and a helper R package:

Analyses were carried out in the R programming language (vers. 4.2.0, R Core Team, 2021). Mixed models were estimated using the lme4 package (vers. 1.1.28, Bates et al., 2015). We estimated marginal means and contrasts using the emmeans package (vers. 1.7.2, Lenth, 2021).

Below is the actual RMarkdown content, so that version numbers and citations are inlined automatically. (We’re omitting details on creating .bib files or using pandoc’s @ citations.)

```{r}
v_lme4 <- packageVersion("lme4")
v_r <- packageVersion("base")
v_emmeans <- packageVersion("emmeans")
```

Analyses were carried out in the R programming language [vers. `r v_r`,
@rstats]. Mixed models were estimated using the lme4 package
[vers. `r v_lme4`, @lme4]. We estimated marginal means and contrasts
using the emmeans package [vers. `r v_emmeans`, @emmeans].

Here is a more involved example involving an additional language and an R package that interfaces to that language:

We estimated the models using Stan (vers. 2.27.0, Carpenter et al., 2017) via the brms package (vers. 2.16.1, Bürkner, 2017) and tidybayes package (vers. 3.0.4, Kay, 2021) in R (vers. 4.3.0, R Core Team, 2021).

Behind the scenes, I had written the following RMarkdown:

```{r}
model <- targets::tar_read(model_random_slope)
v_stan <- model$version$cmdstan
v_brms <- model$version$brms
v_tidybayes <- packageVersion("tidybayes")
v_r <- getRversion()
```

We estimated the models using Stan [vers. `r v_stan`, @stan] via the
brms package [vers. `r v_brms`, @brms-jss] and tidybayes package
[vers. `r v_tidybayes`, @R-tidybayes] in R [vers. `r v_r`, @r-base].

Notice that I am reading in a cached model object (targets::tar_read()) and reading the software versions from that object. This arrangement avoids problems where models are fitted with one version of a package but utils::packageVersion() returns a different, more recent package version. brms stored these versions automatically for me. In general, when I cache a model like this, I store the package version in the model object.
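For models that do not record versions on their own, one way to do this by hand is to bundle the fitted model and the versions into a single object before saving it. The following is a sketch under simple assumptions: an lm() fit on the built-in mtcars data stands in for a real model, saveRDS() stands in for the caching system, and the list layout and file name are invented for illustration.

```r
# Sketch: store software versions alongside a cached model.
fit <- stats::lm(mpg ~ wt, data = datasets::mtcars)

cached <- list(
  model = fit,
  # Record versions at fitting time, not at reporting time
  versions = list(
    r     = as.character(getRversion()),
    stats = as.character(utils::packageVersion("stats"))
  )
)

# The cache location is a placeholder; a temp directory is used here
cache_file <- file.path(tempdir(), "model-cache.rds")
saveRDS(cached, cache_file)

# Later, when writing the manuscript, read the versions back
restored <- readRDS(cache_file)
v_r <- restored$versions$r
v_stats <- restored$versions$stats
```

The restored v_r and v_stats values can then be inlined in the text in the same way as in the RMarkdown examples above.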


Last knitted on 2024-05-03. Source code on GitHub. [1]

  1. .session_info
    #> ─ Session info ───────────────────────────────────────────────────────────────
    #>  setting         value
    #>  version         R version 4.3.3 (2024-02-29 ucrt)
    #>  os              Windows 11 x64 (build 22631)
    #>  system          x86_64, mingw32
    #>  ui              RTerm
    #>  language        (EN)
    #>  collate         English_United States.utf8
    #>  ctype           English_United States.utf8
    #>  tz              America/Chicago
    #>  date            2024-05-03
    #>  pandoc          NA
    #>  stan (rstan)    2.32.2
    #>  stan (cmdstanr) 2.34.1
    #> 
    #> ─ Packages ───────────────────────────────────────────────────────────────────
    #>  ! package        * version  date (UTC) lib source
    #>    abind            1.4-5    2016-07-21 [1] CRAN (R 4.3.0)
    #>    backports        1.4.1    2021-12-13 [1] CRAN (R 4.3.0)
    #>    cachem           1.0.8    2023-05-01 [1] CRAN (R 4.3.0)
    #>    checkmate        2.3.1    2023-12-04 [1] CRAN (R 4.3.3)
    #>    cli              3.6.2    2023-12-11 [1] CRAN (R 4.3.3)
    #>    cmdstanr         0.7.1    2024-03-29 [1] local
    #>    codetools        0.2-19   2023-02-01 [2] CRAN (R 4.3.3)
    #>    colorspace       2.1-0    2023-01-23 [1] CRAN (R 4.3.0)
    #>    curl             5.2.1    2024-03-01 [1] CRAN (R 4.3.3)
    #>    distributional   0.4.0    2024-02-07 [1] CRAN (R 4.3.3)
    #>    downlit          0.4.3    2023-06-29 [1] CRAN (R 4.3.2)
    #>    dplyr          * 1.1.4    2023-11-17 [1] CRAN (R 4.3.2)
    #>    evaluate         0.23     2023-11-01 [1] CRAN (R 4.3.2)
    #>    fansi            1.0.6    2023-12-08 [1] CRAN (R 4.3.3)
    #>    fastmap          1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
    #>    forcats        * 1.0.0    2023-01-29 [1] CRAN (R 4.3.0)
    #>    generics         0.1.3    2022-07-05 [1] CRAN (R 4.3.0)
    #>    ggplot2        * 3.5.1    2024-04-23 [1] CRAN (R 4.3.3)
    #>    git2r            0.33.0   2023-11-26 [1] CRAN (R 4.3.2)
    #>    glue             1.7.0    2024-01-09 [1] CRAN (R 4.3.3)
    #>    gridExtra        2.3      2017-09-09 [1] CRAN (R 4.3.0)
    #>    gtable           0.3.5    2024-04-22 [1] CRAN (R 4.3.3)
    #>    here             1.0.1    2020-12-13 [1] CRAN (R 4.3.0)
    #>    hms              1.1.3    2023-03-21 [1] CRAN (R 4.3.0)
    #>    inline           0.3.19   2021-05-31 [1] CRAN (R 4.3.0)
    #>    jsonlite         1.8.8    2023-12-04 [1] CRAN (R 4.3.3)
    #>    knitr          * 1.46     2024-04-06 [1] CRAN (R 4.3.3)
    #>    lifecycle        1.0.4    2023-11-07 [1] CRAN (R 4.3.2)
    #>    loo              2.7.0    2024-02-24 [1] CRAN (R 4.3.3)
    #>    lubridate      * 1.9.3    2023-09-27 [1] CRAN (R 4.3.1)
    #>    magrittr         2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
    #>    matrixStats      1.3.0    2024-04-11 [1] CRAN (R 4.3.3)
    #>    memoise          2.0.1    2021-11-26 [1] CRAN (R 4.3.0)
    #>    munsell          0.5.1    2024-04-01 [1] CRAN (R 4.3.3)
    #>    pillar           1.9.0    2023-03-22 [1] CRAN (R 4.3.0)
    #>    pkgbuild         1.4.4    2024-03-17 [1] CRAN (R 4.3.3)
    #>    pkgconfig        2.0.3    2019-09-22 [1] CRAN (R 4.3.0)
    #>    posterior        1.5.0    2023-10-31 [1] CRAN (R 4.3.2)
    #>    processx         3.8.4    2024-03-16 [1] CRAN (R 4.3.3)
    #>    ps               1.7.6    2024-01-18 [1] CRAN (R 4.3.3)
    #>    purrr          * 1.0.2    2023-08-10 [1] CRAN (R 4.3.1)
    #>    QuickJSR         1.1.3    2024-01-31 [1] CRAN (R 4.3.3)
    #>    R6               2.5.1    2021-08-19 [1] CRAN (R 4.3.0)
    #>    ragg             1.3.0    2024-03-13 [1] CRAN (R 4.3.3)
    #>    Rcpp             1.0.12   2024-01-09 [1] CRAN (R 4.3.3)
    #>  D RcppParallel     5.1.7    2023-02-27 [1] CRAN (R 4.3.0)
    #>    readr          * 2.1.5    2024-01-10 [1] CRAN (R 4.3.3)
    #>    rlang            1.1.3    2024-01-10 [1] CRAN (R 4.3.3)
    #>    rprojroot        2.0.4    2023-11-05 [1] CRAN (R 4.3.2)
    #>    rstan            2.32.6   2024-03-05 [1] CRAN (R 4.3.3)
    #>    rstudioapi       0.16.0   2024-03-24 [1] CRAN (R 4.3.3)
    #>    scales           1.3.0    2023-11-28 [1] CRAN (R 4.3.2)
    #>    sessioninfo      1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
    #>    StanHeaders      2.32.6   2024-03-01 [1] CRAN (R 4.3.3)
    #>    stringi          1.8.3    2023-12-11 [1] CRAN (R 4.3.2)
    #>    stringr        * 1.5.1    2023-11-14 [1] CRAN (R 4.3.2)
    #>    systemfonts      1.0.6    2024-03-07 [1] CRAN (R 4.3.3)
    #>    tensorA          0.36.2.1 2023-12-13 [1] CRAN (R 4.3.2)
    #>    textshaping      0.3.7    2023-10-09 [1] CRAN (R 4.3.1)
    #>    tibble         * 3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
    #>    tidyr          * 1.3.1    2024-01-24 [1] CRAN (R 4.3.3)
    #>    tidyselect       1.2.1    2024-03-11 [1] CRAN (R 4.3.3)
    #>    tidyverse      * 2.0.0    2023-02-22 [1] CRAN (R 4.3.0)
    #>    timechange       0.3.0    2024-01-18 [1] CRAN (R 4.3.3)
    #>    tzdb             0.4.0    2023-05-12 [1] CRAN (R 4.3.0)
    #>    utf8             1.2.4    2023-10-22 [1] CRAN (R 4.3.1)
    #>    V8               4.4.2    2024-02-15 [1] CRAN (R 4.3.3)
    #>    vctrs            0.6.5    2023-12-01 [1] CRAN (R 4.3.3)
    #>    withr            3.0.0    2024-01-16 [1] CRAN (R 4.3.2)
    #>    xfun             0.43     2024-03-25 [1] CRAN (R 4.3.3)
    #>    yaml             2.3.8    2023-12-11 [1] CRAN (R 4.3.2)
    #> 
    #>  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.3
    #>  [2] C:/Program Files/R/R-4.3.3/library
    #> 
    #>  D ── DLL MD5 mismatch, broken installation.
    #> 
    #> ──────────────────────────────────────────────────────────────────────────────
    
