Minimum R version dependency in R packages

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers].

There has been much talk, and there have been many blog posts, about R package dependencies. Yet one special dependency is mentioned more rarely, even though all packages include it: the dependency on R itself. Just as you can specify a dependency on a package, optionally with a minimum version, you can declare a dependency on a minimum R version in the DESCRIPTION file of your package. In this post we explain why and how.

How & why to declare a dependency on a minimum R version?

Although the R project is in a stable state, and prides itself on its solid backward compatibility, it is far from being a dead project. Many exciting new features are regularly added to R and its base packages. As a package developer, you may want to use one of these newly added features (such as startsWith(), introduced in R 3.3.0).

In this situation, you should inform users (as well as CRAN's automated checks) that your package only works with R versions equal to or more recent than a given version [1].

To do so, add the required version number to the Depends field of your DESCRIPTION file [2]:

  Depends:
    R (>= 3.5.0)
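If you want to read this declaration back programmatically (for instance to feed it to other checks), a minimal sketch using only base R could look like the following. `declared_min_r()` is a hypothetical helper, and it assumes the R entry uses the usual `R (>= x.y.z)` or `R (> x.y.z)` form.

```r
# Hypothetical helper: return the minimum R version declared in the Depends
# field of a DESCRIPTION file, or NA if no R version is declared.
declared_min_r <- function(path = "DESCRIPTION") {
  depends <- read.dcf(path, fields = "Depends")[1, "Depends"]
  if (is.na(depends)) return(NA_character_)
  # Split the comma-separated field and drop surrounding whitespace/newlines
  entries <- trimws(strsplit(depends, ",")[[1]])
  r_entry <- grep("^R\\s*\\(", entries, value = TRUE)
  if (length(r_entry) == 0) return(NA_character_)
  # Keep only the version number inside the parentheses
  sub("^R\\s*\\(\\s*>=?\\s*([^)[:space:]]+)\\s*\\)$", "\\1", r_entry[1])
}

# Example with a throwaway DESCRIPTION file
desc <- tempfile()
writeLines(c(
  "Package: mypkg",
  "Version: 0.1.0",
  "Depends: R (>= 3.5.0),",
  "    methods"
), desc)
declared_min_r(desc)  # "3.5.0"
```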

Which minimum R version should your package depend on?

There are different strategies for choosing which R version your package should depend on:

Conservative approach

Some projects prefer to limit the minimum R version by design, rather than by necessity. This means that their packages might work with older R versions, but because they don't or can't test them, they'd rather not take the risk and limit themselves to versions for which they are sure the package works. This is, for example, the policy of the tidyverse, which supports the five latest minor releases of R.
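For a policy like "support the five latest minor releases of R", the oldest supported version can be derived from a list of R releases, such as the `version` column returned by `rversions::r_versions()`. The sketch below is a hypothetical helper, `oldest_supported_minor()`; to keep it self-contained it uses a short, hand-picked (and incomplete) vector of R versions instead of querying rversions.

```r
# Hypothetical helper: given a vector of R release versions, return the
# oldest minor version covered by a "support the n latest minor releases" policy.
oldest_supported_minor <- function(releases, n = 5) {
  # Keep only "major.minor", e.g. "4.2.3" -> "4.2"
  minors <- unique(sub("^([0-9]+\\.[0-9]+).*$", "\\1", releases))
  minors <- sort(package_version(minors))
  # The n most recent minor versions, oldest first
  latest <- minors[seq(max(1, length(minors) - n + 1), length(minors))]
  as.character(latest[1])
}

releases <- c("3.6.0", "3.6.3", "4.0.0", "4.0.5", "4.1.0",
              "4.1.3", "4.2.0", "4.2.3", "4.3.0", "4.3.1")
oldest_supported_minor(releases)         # "3.6"
oldest_supported_minor(releases, n = 3)  # "4.1"
```

With real data, you would pass `rversions::r_versions()$version` instead of the hand-picked vector.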

‘Wide net’ approach

At the other end of the spectrum, other projects consider that packages are compatible with all R versions by default, until they explicitly add a feature associated with a new R version, or until tests prove otherwise. This is the new policy of usethis (and therefore of all packages created with usethis): by default, new packages don't have any constraint on the R version. It is the responsibility of the developer to add a minimum required version if necessary.

Transitive approach

Another approach is to look at your package's dependencies. If your package already depends, indirectly via one of its recursive dependencies, on a recent R version, there is no point in going the extra mile to keep working with older versions. So one strategy could be to compute your package's transitive minimum R version with the following function and decide that you can use base R features up to this version:

find_transitive_minR <- function(package) {
  
  db <- tools::CRAN_package_db()
  
  recursive_deps <- tools::package_dependencies(
    package, 
    recursive = TRUE, 
    db = db
  )[[1]]
  
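  # These code chunks are adapted from the 'Minimum R dependencies in CRAN
  # packages' section below. Entries are trimmed so that an R requirement
  # that is not listed first in Depends is still detected.
  r_deps <- db |> 
    dplyr::filter(Package %in% recursive_deps) |> 
    # We exclude recommended pkgs as they're always shown as depending on R-devel
    dplyr::filter(is.na(Priority) | Priority != "recommended") |>  
    dplyr::pull(Depends) |> 
    strsplit(split = ",") |> 
    purrr::map(~ grep("^R\\s*\\(", trimws(.x), value = TRUE)) |> 
    unlist()
  
  # No recursive dependency declares a minimum R version
  if (length(r_deps) == 0) return(NA)
  
  # Tolerate missing or extra spaces, e.g. "R(>=3.4)"
  r_vers <- trimws(gsub("^R\\s*\\(>=?\\s*(.+)\\)", "\\1", r_deps))
  
  return(max(package_version(r_vers)))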
}

Let's try this on ggplot2, which depends on R >= 3.3:

find_transitive_minR("ggplot2")
[1] '3.4'

This means that ggplot2 developers could, at no cost, start using features from R 3.4.

However, treat this as a guideline only: do not declare the transitive minimum R version as your package's own minimum R version unless you actually use a feature specific to that version. The minimum R version stated in your package should reflect the version required by the code in your package, not by one of its dependencies.

Which approach should you choose?

There is no intrinsically better choice among these approaches. It is more a matter of world-view and of the project's relationship with its users.

However, you should always keep in mind that it may be difficult for users to install or update any piece of software and you should not force them to upgrade to very recent R versions. A good philosophy is to consider that users cannot upgrade their R version and that you should bump the required R version only when you are sure that all active users are already using this R version or a newer one.

Minimum R dependencies in CRAN packages

Whenever you are unsure about a completely subjective choice for an R package, or any project in general, it is often good practice to look at what is done in your community.

Let's start by grabbing a snapshot of the current CRAN package database:

db <- tools::CRAN_package_db()

We can then isolate the R version dependency declaration:

r_deps <- db |> 
  # We exclude recommended pkgs as they're always shown as depending on R-devel
  dplyr::filter(is.na(Priority) | Priority != "recommended") |> 
  dplyr::pull(Depends) |> 
  strsplit(split = ",") |> 
  purrr::map(~ grep("^R ", .x, value = TRUE)) |> 
  unlist()
length(r_deps)
[1] 11542
tail(r_deps)
[1] "R (>= 3.5)"    "R (>= 3.1.0)"  "R (>= 2.4.0)"  "R (>= 3.2)"   
[5] "R (>= 3.0.0)"  "R (>= 2.13.0)"

A first result of our analysis is that 62% of CRAN packages specify a minimum R version.

As mentioned earlier, the minimum required version can be specified with a loose or strict inequality:

(r_deps_strict <- sum(grepl("^R \\(>\\s(.+)\\)", r_deps)))
[1] 10
(r_deps_loose  <- sum(grepl("^R \\(>=\\s(.+)\\)", r_deps)))
[1] 11532

You can see that using a strict inequality is indeed very uncommon (0.09% of the cases).
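To see what the strict inequality changes in practice, here is a minimal base R sketch. `meets_requirement()` is a hypothetical helper that assumes the requirement is written as `R (>= x.y.z)` or `R (> x.y.z)`; it simply compares versions with `package_version()`.

```r
# Hypothetical helper: does a given R version satisfy a Depends entry such as
# "R (>= 3.5.0)" (loose inequality) or "R (> 3.5.0)" (strict inequality)?
meets_requirement <- function(r_version, requirement) {
  m <- regmatches(requirement,
                  regexec("^R\\s*\\((>=?)\\s*([0-9.-]+)\\)$", requirement))[[1]]
  if (length(m) == 0) stop("Unrecognised requirement: ", requirement)
  op <- m[2]
  required <- package_version(m[3])
  current <- package_version(r_version)
  if (op == ">=") current >= required else current > required
}

meets_requirement("3.5.0", "R (>= 3.5.0)")  # TRUE
meets_requirement("3.5.0", "R (> 3.5.0)")   # FALSE: 3.5.0 itself is excluded
meets_requirement("3.5.1", "R (> 3.5.0)")   # TRUE
```

With `>`, the version written in DESCRIPTION is itself excluded, which is rarely what authors mean.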

We can now continue our analysis and extract the version number itself:

r_deps_ver <- trimws(gsub("^R \\(>=?\\s(.+)\\)", "\\1", r_deps))
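The regular expression above keeps whatever follows the operator and a single whitespace character. A quick check on a few illustrative entries (made up for this example, not taken from CRAN) shows how it behaves, including its limitation: an entry written without a space after the operator does not match and is returned unchanged, so it would make `package_version()` fail later on.

```r
# Illustrative Depends entries (not taken from CRAN)
examples <- c("R (>= 3.5)", "R (> 3.1.0)", "R (>= 2.4.0)", "R (>=3.5)")
versions <- trimws(gsub("^R \\(>=?\\s(.+)\\)", "\\1", examples))
versions  # "3.5" "3.1.0" "2.4.0" "R (>=3.5)"
```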

r_deps_ver |> 
  table()
r_deps_ver
       0.65        0.99         1.1      1.14.0         1.4       1.4.0 
          1           2           1           1           7           2 
      1.4.1       1.5.0       1.6.0       1.6.1       1.6.2         1.7 
          1           6           1           1           1           1 
      1.7.0       1.8.0       1.9.0       1.9.1         2.0       2.0.0 
          3          31          11           1          18          59 
      2.0.1        2.01         2.1       2.1.0       2.1.1      2.1.14 
         13           9           2          16           4           1 
      2.1.4       2.1.5        2.10      2.10.0      2.10.1        2.11 
          1           1        1578         136          22           1 
     2.11.0      2.11.1        2.12      2.12.0      2.12.1        2.13 
         13          10           6          45           1           8 
     2.13.0      2.13.1      2.13.2        2.14      2.14.0      2.14.1 
         37           4           1          38         105          22 
     2.14.2        2.15      2.15.0      2.15.1      2.15.2      2.15.3 
         13          47          87          60           9           9 
       2.16         2.2       2.2.0       2.2.1       2.2.4        2.20 
          1           2          23           9           1           1 
        2.3       2.3.0       2.3.1      2.3.12       2.3.2         2.4 
          2          13           3           1           1           4 
      2.4.0       2.4.1         2.5       2.5.0       2.5.1       2.5.3 
         24           2           3          30           1           1 
       2.50         2.6       2.6.0       2.6.1       2.6.2         2.7 
          2           9          40           3           2           5 
      2.7.0       2.7.2         2.8       2.8.0       2.8.1         2.9 
         27           1           1          23           2           2 
      2.9.0       2.9.1       2.9.2         3.0       3.0-0       3.0-2 
         28           4           3         236           4           1 
      3.0.0       3.0.1       3.0.2       3.0.3       3.0.4        3.00 
        750          94         231          33           1          19 
     3.00.0         3.1       3.1-0       3.1.0       3.1.1       3.1.2 
          1         182           2         583          97         144 
      3.1.3        3.10      3.10.0         3.2       3.2.0       3.2.1 
         33           2           1         134         397          50 
      3.2.2       3.2.3       3.2.4       3.2.5       3.2.6         3.3 
         95         110          27          29           1         141 
      3.3.0       3.3.1       3.3.2       3.3.3         3.4       3.4.0 
        461          63          39          23         181         566 
      3.4.1       3.4.2       3.4.3       3.4.4         3.5       3.5-0 
         12           5           2           9         339           1 
      3.5.0 3.5.0-4.0.2      3.5.00       3.5.1     3.5.1.0       3.5.2 
       2207           1           1           5           1           2 
      3.5.3        3.50         3.6       3.6.0       3.6.2       3.6.3 
          3           7         191         485           3           3 
       3.60       3.7.0         4.0       4.0.0       4.0.3       4.0.4 
          1           1         194         375           1           1 
      4.0.5        4.00         4.1       4.1-0       4.1.0         4.2 
          1           3          54           1         142           8 
      4.2.0 
         31 

Interestingly, you can notice that some of these version numbers don’t match any actual R release. To confirm this, we can use the rversions package, from R-hub:

setdiff(unique(r_deps_ver), rversions::r_versions()$version)
 [1] "2.10"        "3.0"         "3.6"         "3.5"         "3.4"        
 [6] "3.2"         "3.00"        "4.1"         "2.14"        "3.1"        
[11] "4.0"         "3.3"         "2.13"        "2.3.2"       "3.1-0"      
[16] "2.0"         "2.5"         "2.15"        "4.00"        "3.0-0"      
[21] "1.7"         "2.7"         "2.01"        "2.6"         "2.20"       
[26] "2.2"         "2.2.4"       "3.50"        "4.2"         "3.10.0"     
[31] "2.11"        "2.9"         "3.7.0"       "3.10"        "2.3"        
[36] "1.4.0"       "2.5.3"       "3.60"        "2.50"        "2.1.4"      
[41] "2.4"         "3.0.4"       "2.1"         "2.12"        "3.5.0-4.0.2"
[46] "3.5.1.0"     "2.8"         "3.00.0"      "2.3.12"      "4.1-0"      
[51] "2.16"        "1.14.0"      "2.1.14"      "3.5-0"       "3.5.00"     
[56] "3.0-2"       "2.1.5"       "3.2.6"      

We can infer the reason for the mismatch for some examples in this list:

  • missing . between version components (for instance 2.01, 2.50, 3.00, 3.60, 4.00)
  • . replaced by - in the patch version number (for instance 3.0-0, 3.0-2, 3.1-0, 3.5-0, 4.1-0) [3]
  • missing patch version number (for instance 2.0, 2.2, 4.2)
  • extra patch version number (for instance 1.4.0)
  • recommended packages depend on a yet-to-be-released R version (which is why they are excluded from this analysis)

Note that these values are not syntactically wrong, and they may in some cases be intentional. They can be read and understood by the relevant functions in base R (in particular, install.packages()), but they may not correspond to what the package author was expecting, or trying to communicate. For example, in the case of R (>= 3.60), even if the author really intended to depend on R 3.6.0, as we assume here, the package cannot be installed on versions earlier than 4.0.0, because 3.60 is read as major version 3, minor version 60.
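A few comparisons with base R's `package_version()` illustrate how some of these unusual strings are interpreted. Each component is read as an integer, and `.` and `-` are both accepted as separators.

```r
# How base R reads some of the unusual version strings listed above
package_version("3.60") > package_version("3.6.3")    # TRUE: minor version 60 > 6
package_version("3.60") < package_version("4.0.0")    # TRUE
package_version("2.01") == package_version("2.1")     # TRUE: "01" is read as 1
package_version("3.5-0") == package_version("3.5.0")  # TRUE: "-" and "." are equivalent
```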

To visualise the actual minimum R version corresponding to the declared R dependency, we can do the following:

# Sort the releases so that match() below returns the earliest matching one
r_vers <- sort(package_version(rversions::r_versions()$version))

normalised_r_deps <- vapply(r_deps_ver, function(ver) {
  
  ver <- package_version(ver)
  
  # Here, we rely on a somewhat uncommon use of `match()`. When `match()`ing
  # `TRUE` against a logical vector, the index of the first `TRUE` value is
  # returned. In other words, here we return the first R version that is
  # greater than or equal to the stated R version dependency
  min_r_ver <- r_vers[match(TRUE, ver <= r_vers)]
  
  return(min_r_ver)
  
}, package_version("0.0.0"))
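To make the `match(TRUE, ...)` trick more concrete, here is a small sketch on a hand-picked, incomplete list of R releases (not the full rversions output). `first_release_satisfying()` is a hypothetical helper that mirrors the body of the anonymous function above.

```r
# Hand-picked subset of R releases, for illustration only
releases <- package_version(c("3.5.0", "3.5.1", "3.6.0", "3.6.3", "4.0.0", "4.1.0"))

# Hypothetical helper: first release greater than or equal to a declared version
first_release_satisfying <- function(ver, releases) {
  releases <- sort(releases)
  # match(TRUE, <logical vector>) gives the position of the first TRUE
  releases[match(TRUE, package_version(ver) <= releases)]
}

first_release_satisfying("3.5.1", releases)  # 3.5.1
first_release_satisfying("3.60", releases)   # 4.0.0
first_release_satisfying("3.5-0", releases)  # 3.5.0
```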
library(ggplot2)

do.call(rbind, normalised_r_deps) |> 
  as.data.frame() |> 
  dplyr::rename(
    major = V1,
    minor = V2,
    patch = V3
  ) |> 
  dplyr::mutate(majorminor = paste(major, minor, sep = ".")) |> 
  ggplot(aes(y = majorminor)) +
    geom_bar() +
    labs(
      x = "Number of CRAN packages",
      y = "Minimum required R version",
      title = "Minimum required R version in CRAN packages"
    ) +
    theme_minimal()
     

The peak at R 2.10 might be related to the fact that this dependency is automatically added when developers embed data in their packages with usethis::use_data(). You can also notice a peak at R 3.5.0. It is possible that this is linked to the change in the serialization format used by R: serialization format version 3, introduced in R 3.5.0 and used by default since R 3.6.0, can only be read by R >= 3.5.0, so data objects saved with default settings by a recent R version are only compatible with R >= 3.5.0. However, these are nothing more than educated guesses, and only a proper, in-depth analysis could confirm what made developers switch to a newer R version. Such an analysis could look at diffs between package versions to see which new R features packages start using when they bump their R version dependency.
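If the R >= 3.5.0 requirement only comes from embedded data, one way to avoid it is to write the data with serialization format version 2. The sketch below uses only base R; `serialization_version()` is a hypothetical helper that reads the format version from the header of a binary (XDR) serialization, assuming the layout of the two bytes "X\n" followed by a 4-byte big-endian version number. Both save() and saveRDS() accept a `version` argument, and, as far as we know, usethis::use_data() exposes one as well.

```r
x <- data.frame(a = 1:3)

# Hypothetical helper: format version stored right after the "X\n" magic bytes
serialization_version <- function(obj, version) {
  bytes <- serialize(obj, connection = NULL, version = version)
  readBin(bytes[3:6], what = "integer", size = 4, endian = "big")
}

serialization_version(x, version = 2)  # 2: readable by R >= 1.4.0
serialization_version(x, version = 3)  # 3: requires R >= 3.5.0 to be read

# Saving package data in format version 2 avoids the R >= 3.5.0 requirement
path <- tempfile(fileext = ".rda")
save(x, file = path, version = 2)
```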

How to avoid depending on a new version?

For the various reasons presented above, it might not always be desirable to depend on a very recent R version. In this kind of situation, you may want to use the backports package. It reimplements many of the new features from recent R versions. This way, instead of having to depend on a newer R version, you can simply add a dependency on backports, which is easier to install than a newer R version for users in highly controlled environments.
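A common way to wire this up, following the pattern described in the backports documentation, is to call backports::import() from your package's .onLoad() hook. This is a sketch of an R/zzz.R file: it assumes backports is listed in the Imports field of your DESCRIPTION, and it only does something useful on R versions that lack the backported functions.

```r
# R/zzz.R in your package (sketch).
# Assumes DESCRIPTION contains:  Imports: backports
.onLoad <- function(libname, pkgname) {
  # Imports into your package's namespace the backported functions
  # (e.g. startsWith() on R < 3.3.0) that the running R version is missing
  backports::import(pkgname)
}
```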

Backports is not a silver bullet, though, as some new features are impossible to reimplement in a package. Notably, this is the case for the native R pipe (|>), introduced in R 4.1.0. Roughly speaking, this is because it is not simply a new function, but new syntax handled by the R parser itself: older R versions cannot even parse code that uses it.
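You can observe this on R >= 4.1.0: the pipe has already been rewritten into an ordinary function call by the time an expression is parsed, and there is no `|>` object that a package could provide for older R versions.

```r
# The native pipe is rewritten by the parser: the quoted expression is
# already a plain call to f(), with no `|>` function involved
identical(quote(x |> f()), quote(f(x)))  # TRUE

# There is no `|>` binding that could be backported
exists("|>")  # FALSE
```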

How to test that you depend on the correct version?

It is easy to make a mistake when specifying a minimum R version, and to forget that you use a recent R feature. For this reason, you should always try to verify that your minimum R version claim is accurate.

The most complete approach is to run your tests, or at least verify that the package can be built without errors, on all the older R versions you claim to support. Locally, you could use rig, which allows you to install multiple R versions on your computer and switch between them with a single command. But a convenient way to do so is to rely on continuous integration platforms, where existing workflows are already set up to run on multiple R versions. For example, if you choose to replicate the tidyverse policy of supporting the five latest minor releases of R, your best bet is probably to use the check-full.yaml GitHub Actions workflow from r-lib/actions [4].

But this extensive testing may prove challenging in some cases. In particular, the actions provided by r-lib/actions use rcmdcheck, which itself depends on R 3.3 (via digest). This means that you'll have to write your own workflows if you wish to run R CMD check on older R versions. Some packages that place a high value on compatibility with older R versions, such as data.table, have taken this route and developed their own continuous integration scripts.

A more lightweight approach (although a little more prone to false negatives) is to use the backport_linter() function provided by the lintr package. It works by matching your code against a list of functions introduced in more recent R versions. Note that this approach might also produce false positives if you use functions with the same names as recent base R functions.
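As a sketch, assuming lintr >= 3.0.0 (where backport_linter() takes the target R version as its first argument and lint() accepts a `text` argument), you can lint a snippet against an older R version. trimws() was added in R 3.2.0, so it should be flagged when targeting R 3.0.0.

```r
# trimws() appeared in R 3.2.0, so targeting R 3.0.0 should produce one lint
lints <- lintr::lint(
  text = "y <- trimws(x)\n",
  linters = lintr::backport_linter("3.0.0")
)
print(lints)

# To check a whole package against its declared minimum R version,
# run from the package root (adjust the version to your DESCRIPTION):
# lintr::lint_package(linters = lintr::backport_linter("3.5.0"))
```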

Conclusion

As you've seen, there are quite a lot of strategies and subtleties in setting a minimum R dependency for your package: you could adopt the tidyverse approach of supporting the five latest minor R versions, or choose to keep compatibility with older R versions and use backports if necessary. In all cases, you should try to verify that your declared minimum R version is correct, either by using the dedicated linter from the lintr package or by actually running your tests on older R versions. Whatever you end up doing, and even if this topic may seem complex, we believe the tips presented here are specific cases of more general software development tips:

  • use automated tools to assist you in your work;
  • try to empathize with your users and minimize the friction necessary to install and use your tool;
  • look at what other developers in the community are doing.

  1. Note that there is no mechanism to make your package compatible only with older R versions, and not with the more recent ones. Packages are supposed to work with the latest R versions.

  2. In theory, it is not strictly required to use >=. You could use a strict inequality (>) but as we will see later, this is a very uncommon option so we recommend you use the de facto community standard and stick to >=.

  3. However, it is interesting to note that package_version("3.5-0") == package_version("3.5.0"). The use of - instead of . is purely stylistic.

  4. Instead of manually copying this file, you can run usethis::use_github_action("check-full") in your package folder.

To leave a comment for the author, please follow the link and comment on their blog: Posts on R-hub blog.
