Non-standard files/directories, Rbuildignore and inst

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Paraphrasing Writing R Extensions, an R package is a “directory of files which extend R”. These files have to follow a standard structure: you can’t store everything that suits your fancy in a tarball you submit to CRAN. In this post we shall go through what directories and files can go on CRAN, and how to navigate this while shipping everything you want to CRAN and keeping some things in the package source only.

Standard, known directories and files

At the time of writing, what a built package can contain is given by this list called known¹, defined in the R source in tools/R/check.R.

known <- c("DESCRIPTION", "INDEX", "LICENCE", "LICENSE",
           "LICENCE.note", "LICENSE.note",
           "MD5", "NAMESPACE", "NEWS", "PORTING",
           "COPYING", "COPYING.LIB", "GPL-2", "GPL-3",
           "BUGS", "Bugs",
           "ChangeLog", "Changelog", "CHANGELOG", "CHANGES", "Changes",
           "INSTALL", "README", "THANKS", "TODO", "ToDo",
           "INSTALL.windows",
           "README.md", "NEWS.md",
           "configure", "configure.win", "cleanup", "cleanup.win",
           "configure.ac", "configure.in",
           "datafiles",
           "R", "data", "demo", "exec", "inst", "man",
           "po", "src", "tests", "vignettes",
           "build",       # used by R CMD build
           ".aspell",     # used for spell checking packages
           "java", "tools", "noweb") # common dirs in packages.

In this post, we won’t go into what these directories and files can contain and how they should be formatted, which is another standard. We’ll focus on their mere existence.

Non-standard files

Now, in a package folder, you might have all sorts of different things.

If they ended up at the root of the bundled package, R CMD check would complain and tell you

Non-standard files/directories found at top level:

Note that sometimes R CMD check could complain about files you don’t see in the source because they are created by the checking process. In that case, take a step back and try to fix your code, e.g. by cleaning up after yourself if examples create files.
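To get a feel for that complaint, here is a rough sketch that compares the top level of a folder with the known list quoted earlier. The helper name nonstandard_top_level() and the demo folder are made up for illustration; the real check runs on the built package and has more nuance than a plain set difference.

```r
# Names allowed at the top level of a built package, copied from the
# `known` vector quoted earlier in this post.
known <- c("DESCRIPTION", "INDEX", "LICENCE", "LICENSE",
           "LICENCE.note", "LICENSE.note",
           "MD5", "NAMESPACE", "NEWS", "PORTING",
           "COPYING", "COPYING.LIB", "GPL-2", "GPL-3",
           "BUGS", "Bugs",
           "ChangeLog", "Changelog", "CHANGELOG", "CHANGES", "Changes",
           "INSTALL", "README", "THANKS", "TODO", "ToDo",
           "INSTALL.windows",
           "README.md", "NEWS.md",
           "configure", "configure.win", "cleanup", "cleanup.win",
           "configure.ac", "configure.in",
           "datafiles",
           "R", "data", "demo", "exec", "inst", "man",
           "po", "src", "tests", "vignettes",
           "build", ".aspell",
           "java", "tools", "noweb")

# Hypothetical helper (not part of R): top-level entries of `pkgdir`
# whose names are not in `known`.
nonstandard_top_level <- function(pkgdir) {
  entries <- list.files(pkgdir, all.files = TRUE, no.. = TRUE)
  setdiff(entries, known)
}

# Demo on a throwaway folder that mimics a small package source
pkgdir <- tempfile("demo-pkg")
dir.create(file.path(pkgdir, "R"), recursive = TRUE)
invisible(file.create(file.path(
  pkgdir, c("DESCRIPTION", "NAMESPACE", "cran-comments.md", "_pkgdown.yml")
)))

# Returns the two files that are not in `known`
nonstandard_top_level(pkgdir)
```

Run on a package source rather than on a built package, such a helper will also flag things that R CMD build would have dropped anyway (.git, .Rproj.user, etc.), which is the topic of the next section.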

Now, how do you keep the items that spark joy in the bundled package and in your package source, without endangering R CMD check passing?

Excluding files from the bundled package

There are files that don’t need to make it into a built package (e.g., your CRAN comments, your pkgdown config).

R CMD build filtering

To prevent files and folders from making it from the package source to the bundled package, we need to understand how things work: how do files and directories end up, or not, in the tarball/bundled package, from the source package? That’s one job of R CMD build, possibly via a wrapper like devtools::build(). It will copy your whole package source and then remove files in three “steps”²:

## Check for files listed in .Rbuildignore or get_exclude_patterns()
inRbuildignore <- function(files, pkgdir) {
    exclude <- rep.int(FALSE, length(files))
    ignore <- get_exclude_patterns()
    ## handle .Rbuildignore:
    ## 'These patterns should be Perl regexps, one per line,
    ##  to be matched against the file names relative to
    ##  the top-level source directory.'
    ignore_file <- file.path(pkgdir, ".Rbuildignore")
    if (file.exists(ignore_file))
        ignore <- c(ignore, readLines(ignore_file, warn = FALSE))
    for(e in ignore[nzchar(ignore)])
        exclude <- exclude | grepl(e, files, perl = TRUE,
                                   ignore.case = TRUE)
    exclude
}
get_exclude_patterns <- function()
    c("^\\.Rbuildignore$",
      "(^|/)\\.DS_Store$",
      "^\\.(RData|Rhistory)$",
      "~$", "\\.bak$", "\\.swp$",
      "(^|/)\\.#[^/]*$", "(^|/)#[^/]*#$",
      ## Outdated ...
      "^TITLE$", "^data/00Index$",
      "^inst/doc/00Index\\.dcf$",
      ## Autoconf
      "^config\\.(cache|log|status)$",
      "(^|/)autom4te\\.cache$", # ncdf4 had this in subdirectory 'tools'
      ## Windows dependency files
      "^src/.*\\.d$", "^src/Makedeps$",
      ## IRIX, of some vintage
      "^src/so_locations$",
      ## Sweave detrius
      "^inst/doc/Rplots\\.(ps|pdf)$"
      )
exclude <- inRbuildignore(allfiles, pkgdir)

isdir <- dir.exists(allfiles)
## old (pre-2.10.0) dirnames
exclude <- exclude | (isdir & (bases %in%
                               c("check", "chm", .vc_dir_names)))
exclude <- exclude | (isdir & grepl("([Oo]ld|\\.Rcheck)$", bases))
## FIXME: GNU make uses GNUmakefile (note capitalization)
exclude <- exclude | bases %in% c("Read-and-delete-me", "GNUMakefile")
## Mac resource forks
exclude <- exclude | startsWith(bases, "._")
exclude <- exclude | (isdir & grepl("^src.*/[.]deps$", allfiles))
## Windows DLL resource file
exclude <- exclude | (allfiles == paste0("src/", pkgname, "_res.rc"))
## inst/doc/.Rinstignore is a mistake
exclude <- exclude | endsWith(allfiles, "inst/doc/.Rinstignore") |
    endsWith(allfiles, "inst/doc/.build.timestamp") |
    endsWith(allfiles, "vignettes/.Rinstignore")
## leftovers
exclude <- exclude | grepl("^.Rbuildindex[.]", allfiles)
## or simply?  exclude <- exclude | startsWith(allfiles, ".Rbuildindex.")
exclude <- exclude | (bases %in% .hidden_file_exclusions)

Of particular interest is .vc_dir_names: had you noticed your .git folder was magically not included in the bundled package?³

## Version control directory names: CVS, .svn (Subversion), .arch-ids
## (arch), .bzr, .git, .hg (mercurial) and _darcs (Darcs)
## And it seems .metadata (eclipse) is in the same category.

.vc_dir_names <-
    c("CVS", ".svn", ".arch-ids", ".bzr", ".git", ".hg", "_darcs", ".metadata")

And .hidden_file_exclusions:

## We are told
## .Rproj.user is Rstudio
## .cproject .project .settings are Eclipse
## .exrc is for vi
## .tm_properties is Mac's TextMate
.hidden_file_exclusions <-
    c(".Renviron", ".Rprofile", ".Rproj.user",
      ".Rhistory", ".Rapp.history",
      ".tex", ".log", ".aux", ".pdf", ".png",
      ".backups", ".cvsignore", ".cproject", ".directory",
      ".dropbox", ".exrc", ".gdb.history",
      ".gitattributes", ".gitignore", ".gitmodules",
      ".hgignore", ".hgtags",
      ".htaccess",
      ".latex2html-init",
      ".project", ".seed", ".settings", ".tm_properties")

Note that R CMD build will silently remove files from the bundled package, which is a source of weird errors. For instance, if you wrote a wrong pattern in .Rbuildignore that ends up removing one of your R files, R CMD check will complain about a function not existing and you might be a bit puzzled.
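Here is a small sketch of how such an accident can happen. It reuses the matching rule from inRbuildignore() quoted earlier (Perl regular expressions, case-insensitive, matched anywhere in the relative path); the helper name is_ignored() and the file names are invented for the example.

```r
# Hypothetical helper mirroring the loop in inRbuildignore():
# a file is excluded as soon as one pattern matches its relative path.
is_ignored <- function(files, patterns) {
  exclude <- rep(FALSE, length(files))
  for (p in patterns[nzchar(patterns)]) {
    exclude <- exclude | grepl(p, files, perl = TRUE, ignore.case = TRUE)
  }
  exclude
}

files <- c("README.Rmd", "R/readme.R", "R/utils.R", "DESCRIPTION")

# Unanchored pattern: it also catches R/readme.R, because matching is
# case-insensitive and not tied to the start of the path.
files[is_ignored(files, "README")]
# "README.Rmd" "R/readme.R"

# Anchored and escaped pattern: only the intended file is excluded.
files[is_ignored(files, "^README\\.Rmd$")]
# "README.Rmd"
```

In a real .Rbuildignore file the second pattern would be written ^README\.Rmd$ with a single backslash, since the doubling is only needed inside an R string.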

.Rbuildignore

So, if your package source features any file or directory that is not known, not standard, and also not listed in the common exclusions, then you need to add it to .Rbuildignore.

As written in “Writing R Extensions”, “To exclude files from being put into the package, one can specify a list of exclude patterns in file .Rbuildignore in the top-level source directory. These patterns should be Perl-like regular expressions (see the help for regexp in R for the precise details), one per line, to be matched case-insensitively against the file and directory names relative to the top-level package source directory.”

Below is knitr’s .Rbuildignore:

.gitignore
tikzDictionary$
aux$
log$
out$
inst/examples/knitr-.*.pdf
inst/examples/child/knitr-.*.pdf
inst/examples/child/knitr-.*\.md
inst/examples/figure
inst/examples/cache
knitr-minimal.md
knitr-spin.md
png$
^\.Rproj\.user$
^.*\.Rproj$
^\.travis\.yml$
FAQ.md
Makefile
^knitr-examples$
^\.github$
^docs$
^README-ES\.md$
^README-PT\.md$
^codecov\.yml$
^NEWS\.md$

How to edit .Rbuildignore?

You could edit .Rbuildignore by hand, from the command line, or using usethis::use_build_ignore(), which escapes paths by default. There is also the usethis::edit_r_buildignore() function for creating or opening the project’s .Rbuildignore.
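What “escaping” a path means here can be sketched in base R. The helper below is hypothetical and only illustrates the idea (make dots literal, anchor both ends); it is not the code usethis runs.

```r
# Hypothetical helper, written for illustration only:
# turn a literal path into an anchored regular expression.
escape_for_buildignore <- function(path) {
  path <- sub("/$", "", path)         # drop a trailing slash on directories
  path <- gsub("\\.", "\\\\.", path)  # make every dot a literal dot
  paste0("^", path, "$")              # match the whole relative path only
}

escape_for_buildignore("cran-comments.md")
escape_for_buildignore("docs/")

# The anchored pattern matches the top-level folder but not a nested one
grepl(escape_for_buildignore("docs/"), c("docs", "inst/docs"), perl = TRUE)
```

The anchors are what make such patterns safe: without them, a line like docs would also hit any path that merely contains those letters.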

When to edit .Rbuildignore?

You could edit .Rbuildignore when R CMD check complains, or when creating non-standard files. This is where workflow tools can help. If you e.g. use usethis::use_cran_comments() to create cran-comments.md, it will also add it to .Rbuildignore.

Keeping non-standard things in the bundled package

Now you might wonder, how do I package up a Shiny app, a raw data file, etc. if they’re not allowed at the root of a bundled package? Well, easy, keep them but not at the root, ah! More seriously, a good idea is to look at existing practice in recent CRAN packages. Often, you’ll see stuff is stored in inst/: classic elements such as citation information in inst/CITATION⁴, raw data in inst/extdata/ but also more modern or exotic elements such as RStudio addins.
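Once installed, the contents of inst/ sit at the top level of the installed package, and system.file() is the usual way to find them. A minimal sketch follows; the package name mypkg and the file raw.csv are hypothetical, so that call is left commented out and the runnable part uses a package that ships with R.

```r
# inst/extdata/raw.csv in the source ends up as <library>/mypkg/extdata/raw.csv.
# "mypkg" and "raw.csv" are hypothetical names:
# path <- system.file("extdata", "raw.csv", package = "mypkg", mustWork = TRUE)

# Runnable lookup in a package that ships with R. DESCRIPTION does not come
# from inst/, but the lookup mechanism is the same.
desc_path <- system.file("DESCRIPTION", package = "stats")
file.exists(desc_path)

# A file that is not there yields an empty string,
# or an error if mustWork = TRUE is set.
missing_path <- system.file("extdata", "no-such-file.csv", package = "stats")
nzchar(missing_path)
```

Using mustWork = TRUE in package code makes a misspelled path fail loudly instead of passing an empty string on to the next function.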

What about .Rinstignore?

.Rbuildignore has a sibling called .Rinstignore for another use case: “The contents of the inst subdirectory will be copied recursively to the installation directory. Subdirectories of inst should not interfere with those used by R (currently, R, data, demo, exec, libs, man, help, html and Meta, and earlier versions used latex, R-ex). The copying of the inst happens after src is built so its Makefile can create files to be installed. To exclude files from being installed, one can specify a list of exclude patterns in file .Rinstignore in the top-level source directory. These patterns should be Perl-like regular expressions⁵ (see the help for regexp in R for the precise details), one per line, to be matched case-insensitively against the file and directory paths, e.g. doc/.*[.]png$ will exclude all PNG files in inst/doc based on the extension.”

See for instance future.apply’s .Rinstignore:

# Certain LaTeX files (e.g. bib, bst, sty) must be part of the build 
# such that they are available for R CMD check.  These are excluded
# from the install using .Rinstignore in the top-level directory
# such as this one.
doc/.*[.](bib|bst|sty)$
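To see what that single pattern does, one can try it on a few made-up paths written relative to inst/, as in the Writing R Extensions example quoted earlier. This is only a sketch of the matching, under the stated rule (Perl-like, case-insensitive); the file names are invented.

```r
# Pattern taken from the .Rinstignore quoted in this post
pattern <- "doc/.*[.](bib|bst|sty)$"

# Invented paths, relative to inst/
paths <- c("doc/refs.bib", "doc/style.sty", "doc/intro.pdf", "extdata/raw.csv")

# TRUE means "in the bundled package, but not installed"
skipped <- grepl(pattern, paths, perl = TRUE, ignore.case = TRUE)

paths[!skipped]
# "doc/intro.pdf"   "extdata/raw.csv"
```

So the LaTeX helper files travel in the tarball, where R CMD check needs them, and are left out of the installed package.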

Conclusion

In this post we explained what files and directories can be present in a bundled package. We also explained how to prevent non-standard things from making it from the package source into the bundled package, using .Rbuildignore, and how to let non-standard things make it into the bundled package, using inst/ (but don’t make it your junk drawer, of course). Let’s end with a quote from Marie Kondo’s The Life-Changing Magic of Tidying Up:

“Keep only those things that speak to your heart.”

… that we need to amend…

“Keep only those things that speak to your R CMD check.”

  1. I might have entered a rabbit hole looking through THANKS files on the R-hub mirror of CRAN source code. I sure like reading acknowledgements. ↩︎

  2. That procedure can make R CMD build very slow when you have huge hidden directories; refer to this excellent R-package-devel thread. ↩︎

  3. I am fascinated by common exclusions, that reflect what is accepted as common practice. ↩︎

  4. That citation will be found by the citation() function when a user calls it, e.g. citation("stplanr"), and by pkgdown when building the website; see the stplanr CITATION page that is linked from its homepage. ↩︎

  5. Another file full of Perl regex that is out of scope for this post is .install_extras that influences what makes it (rather than what doesn’t make it) from the vignettes to inst/doc when building the package. ↩︎