Installing Package Dependencies without external http(s) requests
Consider you have a server that is running behind a firewall and, for security reasons, cannot make external http(s) requests. Further, you have R running on this server and you need to install a set of packages. The simple approach of
install.packages("PACKAGE_NAME", repos = "CRAN_MIRROR_URL")
is not an option since you will have no access to the CRAN repository.
Another option would be to download the source files (.tar.gz files) from CRAN or BioConductor, transfer those files to the server via FTP, and then install the packages via
install.packages("PATH_TO_TAR_GZ_FILE", repos = NULL)
This approach will work well, with one big exception: the dependencies of the package may not be on the server. How do you get the source files for all the dependencies of your package? What about the dependencies of the dependencies, and the dependencies of the dependencies of the dependencies? Simply, how do you install R packages on a machine that is not allowed to make external http(s) requests?
Here is how I approached this problem. On my local machine, which has internet access, I ran a script (shown and explained in detail below) that downloads all the dependencies, the dependencies of the dependencies, and so on, from both CRAN and BioConductor, and generates a makefile to install the packages in the correct order, i.e., in an order such that the dependencies are met.
When the script has finished, the source files and the makefile can be transferred to the server that lacks external http(s) request authority. Running the makefile will install the packages, and is an easy way to track and report install errors.
We need to define what constitutes a dependency. In a package’s DESCRIPTION file, the packages listed under the fields Depends, Imports, and LinkingTo are what we will consider dependencies. Suggests and Enhances are omitted as they are not needed for the package to work.
Build Dependencies
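To make the definition of a dependency concrete, here is a minimal sketch of pulling the Depends, Imports, and LinkingTo fields out of a DESCRIPTION file with base R's read.dcf(). The DESCRIPTION contents below (the package examplepkg and its fields) are made up for illustration and are not taken from the script discussed in this post.

```r
# Hypothetical DESCRIPTION contents, for illustration only.
desc_lines <- c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "Depends: R (>= 3.0.2), methods",
  "Imports: Rcpp (>= 0.11.1), magrittr",
  "LinkingTo: Rcpp",
  "Suggests: testthat"
)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files.
# Only the three fields we treat as dependencies are requested.
con <- textConnection(desc_lines)
desc <- read.dcf(con, fields = c("Depends", "Imports", "LinkingTo"))
close(con)

# Split each field on commas, strip version requirements such as
# "(>= 0.11.1)", and drop R itself, which is not a package.
fields <- desc[1, ]
fields <- fields[!is.na(fields)]
deps <- unlist(strsplit(fields, ","), use.names = FALSE)
deps <- trimws(sub("\\(.*\\)", "", deps))
deps <- unique(deps[deps != "R"])
deps
```

In this made-up case deps holds methods, Rcpp, and magrittr. The Suggests entry, testthat, is never read, which mirrors the choice to ignore Suggests and Enhances.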
An R script, build-dep-list.R, has been written and is expected to be evaluated from the command line via
Rscript --vanilla build-dep-list.R [pkg1] [pkg2] [...] [pkgn]
where pkg1 is the name of the first known package to download, pkg2 the second known package to download, …, and pkgn the nth package to download. The script will download all the dependencies for pkg1, ..., pkgn, and the dependencies of the dependencies, and so on. The script will also generate a makefile to help with the installation of the packages, aiming to order the installs so that the install of pkg1, ..., pkgn will not error.
The full script can be found on my github page. The script will be broken up into pieces here with additional detail and explanation.
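Before going through the pieces, it may help to see the goal on a tiny scale: an install order in which every package comes after its dependencies. The following is a minimal sketch using a made-up dependency map (pkgA through pkgD are hypothetical names, and install_order is a helper written only for this illustration). It applies the same append, reverse, and unique idea that the script uses, but without any repository lookups.

```r
# Made-up dependency map for illustration only: names are packages and
# values are their direct dependencies. None of these are real packages.
toy_deps <- list(
  pkgA = c("pkgB", "pkgC"),
  pkgB = "pkgD",
  pkgC = "pkgD",
  pkgD = character(0)
)

# Hypothetical helper: walk the map starting from the requested packages,
# inserting each package's direct dependencies immediately after it.
install_order <- function(requested, dep_map) {
  pkgs <- requested
  i <- 1L
  while (i <= length(pkgs)) {
    pkgs <- append(pkgs, dep_map[[pkgs[i]]], after = i)
    i <- i + 1L
  }
  # Reverse so dependencies come first, then keep one copy of each package.
  unique(rev(pkgs))
}

ord <- install_order("pkgA", toy_deps)

# Check that every package appears after all of its direct dependencies.
deps_first <- all(vapply(names(toy_deps), function(p) {
  all(match(toy_deps[[p]], ord) < match(p, ord))
}, logical(1)))
```

Here ord is pkgD, pkgC, pkgB, pkgA, so pkgD, which both pkgB and pkgC need, is installed first and the requested package pkgA last. Note that this sketch, like the loop it imitates, assumes the dependency map has no cycles.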
When I develop scripts that I expect to be evaluated in the terminal, I start the script with a check of interactive(). In an interactive session we set the variables to the values needed for testing and development; otherwise we use the command line arguments to define the values of the variables. This could also be edited so that the expected evaluation would be done in an interactive session. For this work we will have the character vector OUR_PACKAGES to store the names of the packages we want/need to install.
if (interactive()) {
  OUR_PACKAGES <- c("graph", "gRbase", "gRain", "jsonlite", "plotly", "SHELF", "rjson", "svglite", "magrittr")
} else {
  OUR_PACKAGES <- commandArgs(trailingOnly = TRUE)
}
We also need to define the repositories which we will query for the packages. We’ll use RStudio’s CRAN mirror and the repository for BioConductor.
# Repositories to look for packages
CRAN <- "https://cran.rstudio.com/"
BIOC <- "https://bioconductor.org/packages/release/bioc/"
Now, let’s look into the packages. Packages are classified into three priority classes: “base”, “recommended”, and NA. The “base” packages are part of every R installation, and the “recommended” packages are included in any standard installation of R. All other packages have Priority == NA.
ipkgs <- utils::installed.packages()
ipkgs[ipkgs[, "Priority"] %in% "base", "Package"]
##        base    compiler    datasets    graphics   grDevices        grid
##      "base"  "compiler"  "datasets"  "graphics" "grDevices"      "grid"
##     methods    parallel     splines       stats      stats4       tcltk
##   "methods"  "parallel"   "splines"     "stats"    "stats4"     "tcltk"
##       tools       utils
##     "tools"     "utils"
ipkgs[ipkgs[, "Priority"] %in% "recommended", "Package"]
##         boot        class      cluster    codetools      foreign
##       "boot"      "class"    "cluster"  "codetools"    "foreign"
##   KernSmooth      lattice         MASS       Matrix         mgcv
## "KernSmooth"    "lattice"       "MASS"     "Matrix"       "mgcv"
##         nnet        rpart      spatial     survival
##       "nnet"      "rpart"    "spatial"   "survival"
Some packages will have dependencies on the “base” and/or “recommended” packages. We will need to know these packages and omit them from the packages we will need to download and install.
base_pkgs <- unname(utils::installed.packages()[utils::installed.packages()[, "Priority"] %in% c("base", "recommended"), "Package"])
Next step, get a list of the available packages from CRAN and BioConductor. The return from available.packages is a matrix with all the information we will need about the packages.
available_pkgs <- available.packages(repos = c(CRAN, BIOC))
str(available_pkgs)
##  chr [1:13659, 1:17] "A3" "abbyyR" "abc" "abc.data" "ABC.RAP" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:13659] "A3" "abbyyR" "abc" "abc.data" ...
##   ..$ : chr [1:17] "Package" "Version" "Priority" "Depends" ...
available_pkgs[available_pkgs[, "Package"] %in% OUR_PACKAGES, c("Package", "Version", "Depends", "Imports", "LinkingTo", "Repository")]
##          Package    Version
## gRain    "gRain"    "1.3-0"
## gRbase   "gRbase"   "1.8-3"
## jsonlite "jsonlite" "1.5"
## magrittr "magrittr" "1.5"
## plotly   "plotly"   "4.7.1"
## rjson    "rjson"    "0.2.15"
## SHELF    "SHELF"    "1.3.0"
## svglite  "svglite"  "1.2.1"
## graph    "graph"    "1.56.0"
##          Depends
## gRain    "R (>= 3.0.2), methods, gRbase (>= 1.7-2)"
## gRbase   "R (>= 3.0.2), methods"
## jsonlite "methods"
## magrittr NA
## plotly   "R (>= 3.2.0), ggplot2 (>= 2.2.1)"
## rjson    "R (>= 3.1.0)"
## SHELF    "R (>= 3.3.1)"
## svglite  "R (>= 3.0.0)"
## graph    "R (>= 2.10), methods, BiocGenerics (>= 0.13.11)"
##          Imports
## gRain    "igraph, graph, magrittr, functional, Rcpp (>= 0.11.1)"
## gRbase   "graph, igraph, magrittr, Matrix, RBGL, Rcpp (>= 0.11.1)"
## jsonlite NA
## magrittr NA
## plotly   "tools, scales, httr, jsonlite, magrittr, digest, viridisLite,\nbase64enc, htmltools, htmlwidgets (>= 0.9), tidyr, hexbin,\nRColorBrewer, dplyr, tibble, lazyeval (>= 0.2.0), crosstalk,\npurrr, data.table"
## rjson    NA
## SHELF    "ggplot2, grid, shiny, stats, graphics, tidyr, MASS, ggExtra"
## svglite  "Rcpp, gdtools (>= 0.1.6)"
## graph    "stats, stats4, utils"
##          LinkingTo
## gRain    "Rcpp (>= 0.11.1), RcppArmadillo, RcppEigen, gRbase (>=\n1.8-0)"
## gRbase   "Rcpp (>= 0.11.1), RcppArmadillo, RcppEigen"
## jsonlite NA
## magrittr NA
## plotly   NA
## rjson    NA
## SHELF    NA
## svglite  "Rcpp, gdtools, BH"
## graph    NA
##          Repository
## gRain    "https://cran.rstudio.com/src/contrib"
## gRbase   "https://cran.rstudio.com/src/contrib"
## jsonlite "https://cran.rstudio.com/src/contrib"
## magrittr "https://cran.rstudio.com/src/contrib"
## plotly   "https://cran.rstudio.com/src/contrib"
## rjson    "https://cran.rstudio.com/src/contrib"
## SHELF    "https://cran.rstudio.com/src/contrib"
## svglite  "https://cran.rstudio.com/src/contrib"
## graph    "https://bioconductor.org/packages/release/bioc/src/contrib"
In this example we see that the packages listed in OUR_PACKAGES, except the graph package, can be downloaded from CRAN. graph and at least one of its dependencies, BiocGenerics, will need to be downloaded from BioConductor.
The next step in building the list of dependencies and a script for installing them is done in the following
while loop. We start with a character vector pkgs_to_download, which is initially equivalent to OUR_PACKAGES. We will iterate through this vector, appending the dependencies in order. We use the tools::package_dependencies function to generate a list of each package’s dependencies, the dependencies of the dependencies, and so on.
In the while loop we get a list of the dependencies for a package, stored in the deps object. We omit any of the base and recommended packages from the deps object and then append deps to the pkgs_to_download vector in the position immediately to the right of the current package being looked up. When the index i is incremented, the next package to be considered will be the first dependency. This process continues until all the dependencies have been explored. Lastly, we reverse the order of the elements of pkgs_to_download so that we have the packages listed in a useful install order, i.e., pkgs_to_download[1] should be installed before pkgs_to_download[2], etc. After reversing the order of the elements of pkgs_to_download we look only at the unique elements. By default, the first occurrence of an element will be kept and the repeated elements will be omitted. By reversing the order and then taking the unique values, the deepest level of dependency will be retained for a specific package.
pkgs_to_download <- OUR_PACKAGES
i <- 1L
while (i <= length(pkgs_to_download)) {
  deps <- unlist(tools::package_dependencies(packages = pkgs_to_download[i],
                                             which = c("Depends", "Imports", "LinkingTo"),
                                             db = available_pkgs,
                                             recursive = FALSE),
                 use.names = FALSE)
  deps <- deps[!(deps %in% base_pkgs)]
  pkgs_to_download <- append(pkgs_to_download, deps, i)
  i <- i + 1L
}
pkgs_to_download <- unique(rev(pkgs_to_download))
If you are having a difficult time envisioning what the above does, let’s look at an example for the
dplyr package. In this example we’ll print out the list of dependencies at each step through the while loop. Note that packages such as Rcpp will be assessed multiple times, but the final list will only have Rcpp listed once.
dplyr_dependencies <- "dplyr"
i <- 1L
while (i <= length(dplyr_dependencies)) {
  cat("\ni =", i, "\nLooking up dependencies for", dplyr_dependencies[i], "\n")
  deps <- unlist(tools::package_dependencies(packages = dplyr_dependencies[i],
                                             which = c("Depends", "Imports", "LinkingTo"),
                                             db = available_pkgs,
                                             recursive = FALSE),
                 use.names = FALSE)
  deps <- deps[!(deps %in% base_pkgs)]
  dplyr_dependencies <- append(dplyr_dependencies, deps, i)
  cat(dplyr_dependencies[i], "dependencies:", paste(deps, collapse = ", "),
      "\ndplyr_dependencies =", paste(dplyr_dependencies, collapse = ", "), "\n")
  i <- i + 1L
}
##
## i = 1
## Looking up dependencies for dplyr
## dplyr dependencies: assertthat, bindrcpp, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
## dplyr_dependencies = dplyr, assertthat, bindrcpp, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 2
## Looking up dependencies for assertthat
## assertthat dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 3
## Looking up dependencies for bindrcpp
## bindrcpp dependencies: Rcpp, bindr, plogr
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 4
## Looking up dependencies for Rcpp
## Rcpp dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 5
## Looking up dependencies for bindr
## bindr dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 6
## Looking up dependencies for plogr
## plogr dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 7
## Looking up dependencies for glue
## glue dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 8
## Looking up dependencies for magrittr
## magrittr dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 9
## Looking up dependencies for pkgconfig
## pkgconfig dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 10
## Looking up dependencies for rlang
## rlang dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 11
## Looking up dependencies for R6
## R6 dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 12
## Looking up dependencies for Rcpp
## Rcpp dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, BH, plogr
##
## i = 13
## Looking up dependencies for tibble
## tibble dependencies: cli, crayon, pillar, rlang
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, crayon, pillar, rlang, BH, plogr
##
## i = 14
## Looking up dependencies for cli
## cli dependencies: assertthat, crayon
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, rlang, BH, plogr
##
## i = 15
## Looking up dependencies for assertthat
## assertthat dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, rlang, BH, plogr
##
## i = 16
## Looking up dependencies for crayon
## crayon dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, rlang, BH, plogr
##
## i = 17
## Looking up dependencies for crayon
## crayon dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, rlang, BH, plogr
##
## i = 18
## Looking up dependencies for pillar
## pillar dependencies: cli, crayon, rlang, utf8
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 19
## Looking up dependencies for cli
## cli dependencies: assertthat, crayon
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 20
## Looking up dependencies for assertthat
## assertthat dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 21
## Looking up dependencies for crayon
## crayon dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 22
## Looking up dependencies for crayon
## crayon dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 23
## Looking up dependencies for rlang
## rlang dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 24
## Looking up dependencies for utf8
## utf8 dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 25
## Looking up dependencies for rlang
## rlang dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 26
## Looking up dependencies for BH
## BH dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
##
## i = 27
## Looking up dependencies for plogr
## plogr dependencies:
## dplyr_dependencies = dplyr, assertthat, bindrcpp, Rcpp, bindr, plogr, glue, magrittr, pkgconfig, rlang, R6, Rcpp, tibble, cli, assertthat, crayon, crayon, pillar, cli, assertthat, crayon, crayon, rlang, utf8, rlang, BH, plogr
dplyr_dependencies <- unique(rev(dplyr_dependencies))
dplyr_dependencies
##  [1] "plogr"      "BH"         "rlang"      "utf8"       "crayon"
##  [6] "assertthat" "cli"        "pillar"     "tibble"     "Rcpp"
## [11] "R6"         "pkgconfig"  "magrittr"   "glue"       "bindr"
## [16] "bindrcpp"   "dplyr"
Now that we have
pkgs_to_download, a character vector of package names that we need to download, we can use the download.packages function to do so. The object dwnld_pkgs is a two-column matrix with the name of each package and the file path to its source file.
# Download the needed packages into the pkg-source-files directory
unlink("pkg-source-files/*")
dir.create("pkg-source-files/", showWarnings = FALSE)
dwnld_pkgs <- download.packages(pkgs = pkgs_to_download, destdir = "pkg-source-files", repos = c(CRAN, BIOC), type = "source")
head(dwnld_pkgs)
The last step for the script to run on a machine with external http(s) request authority is to build a
makefile to install all the needed packages. I prefer the makefile over a bash script because of the default error handling that make provides: make stops as soon as a command fails, whereas a bash script by default carries on to the next command.
cat("all:\n", paste0("\tR CMD INSTALL ", dwnld_pkgs[, 2], "\n"), sep = "", file = "makefile")
For this example, the first several lines of the
makefile are:
all:
	R CMD INSTALL pkg-source-files/magrittr_1.5.tar.gz
	R CMD INSTALL pkg-source-files/BH_1.66.0-1.tar.gz
	R CMD INSTALL pkg-source-files/withr_2.1.1.tar.gz
	R CMD INSTALL pkg-source-files/Rcpp_0.12.15.tar.gz
	R CMD INSTALL pkg-source-files/gdtools_0.1.6.tar.gz
	R CMD INSTALL pkg-source-files/svglite_1.2.1.tar.gz
Note that magrittr is the last package in the OUR_PACKAGES object and has no dependencies, thus it is the first package installed. The svglite package is the second to last package in OUR_PACKAGES and it will be installed after its dependencies BH, withr, Rcpp, and gdtools are installed.
Installing on the Remote Machine
Now that the source files have been downloaded and the makefile generated, move the pkg-source-files directory and the makefile to the remote machine and run the makefile. If the makefile fails, there might be some system dependencies that need to be updated.
Download the script and/or contribute
The build-dependency-list.R file can be found on my github page.
To leave a comment for the author, please follow the link and comment on their blog: Peter E. DeWitt.