Installing the additional R packages in Oracle Big Data Lite VM 4.4.0

[This article was first published on R – Nodalpoint, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the just-released version 4.4.0 of Oracle Big Data Lite VM, as in the previous one (4.3.0.1), the provided script install_additional_packages.sh installs a rather large number of additional R packages: 28, not counting their dependencies (version 4.2.1 had only 10).

Unfortunately, the form of the commands that install these additional packages has also changed. Consider, for example, the package igraph. In previous VM versions, the command in the script was

Rscript --verbose -e 'install.packages("igraph",repos="http://cran.us.r-project.org",dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library")'

whereas the corresponding command now is

Rscript --verbose -e 'install.packages("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'

i.e. the packages are now referenced by specific file names (including versions), and the repos argument is now NULL instead of "http://cran.us.r-project.org".
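Pinned references like these encode both the package name and its version in the file name, because CRAN source tarballs follow the <package>_<version>.tar.gz naming convention. A small helper can therefore recover both parts, for example to compare the versions pinned in install_additional_packages.sh against the versions CRAN currently offers. What follows is only a minimal sketch. parse_tarball() is a hypothetical helper and is not part of the VM's scripts.

```r
# Hypothetical helper: split a CRAN source tarball URL (or file name) into
# package name and version, assuming the <package>_<version>.tar.gz convention.
parse_tarball <- function(url) {
  file <- basename(url)
  m <- regmatches(file, regexec("^([A-Za-z0-9.]+)_([0-9.-]+)\\.tar\\.gz$", file))[[1]]
  if (length(m) != 3) stop("not a source tarball name: ", file)
  list(package = m[2], version = m[3])
}

# The pinned igraph reference from install_additional_packages.sh
parse_tarball("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz")
```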

I suspect that the reason for this change is some kind of version control over the packages to be installed. There have been issues in the past, mainly because Oracle R Distribution (ORD) lagged behind the latest version of GNU R and was sometimes incompatible with the latest versions of some R packages (package arules was such an example). However, the fact that ORD is now at version 3.2.0 does not seem to have been taken into account here. In the VM, ORD still ships with an older version of arules (1.1-9), even though the latest arules version (1.3-1 at the time of writing) is supported by R 3.2.0. This, in turn, has implications for the dependent packages: in this case arulesViz, which depends on arules and is included in the additional packages to be installed.

Whatever the reason, the net result of this change is that most of the 28 packages simply fail to install, for one of two reasons. Some, like igraph, fail because of missing dependencies:

[oracle@bigdatalite scripts]$ Rscript --verbose -e 'install.packages("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'
running
  '/usr/lib64/R/bin/R --slave --no-restore -e install.packages("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'

trying URL 'http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz'
Content type 'application/x-gzip' length 3328353 bytes (3.2 MB)
==================================================
downloaded 3.2 MB

ERROR: dependencies ‘magrittr’, ‘NMF’, ‘irlba’ are not available for package ‘igraph’
* removing ‘/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library/igraph’
Warning message:
In install.packages("http://cran.r-project.org/src/contrib/igraph_1.0.1.tar.gz",  :
  installation of package ‘/tmp/Rtmp6BgT76/downloaded_packages/igraph_1.0.1.tar.gz’ had non-zero exit status

while others, like arulesViz, fail because of a wrong URL. Since 1.0-4 is no longer the current version, the URL should be https://cran.r-project.org/src/contrib/Archive/arulesViz/arulesViz_1.0-4.tar.gz:

[oracle@bigdatalite scripts]$ Rscript --verbose -e 'install.packages("http://cran.r-project.org/src/contrib/arulesViz_1.0-4.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'
running
  '/usr/lib64/R/bin/R --slave --no-restore -e install.packages("http://cran.r-project.org/src/contrib/arulesViz_1.0-4.tar.gz",repos=NULL,dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",type="source")'

trying URL 'http://cran.r-project.org/src/contrib/arulesViz_1.0-4.tar.gz'
Error in download.file(p, destfile, method, mode = "wb", ...) : 
  cannot open URL 'http://cran.r-project.org/src/contrib/arulesViz_1.0-4.tar.gz'
In addition: Warning message:
In download.file(p, destfile, method, mode = "wb", ...) :
  cannot open: HTTP status was '404 Not Found'
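Building the right URL for a pinned version is mechanical once we know whether that version is still the current one. The following is a minimal sketch that assumes the usual CRAN layout: current tarballs live in src/contrib/, and superseded ones are moved to src/contrib/Archive/<package>/. cran_tarball_url() is a hypothetical helper.

```r
# Hypothetical helper: CRAN source tarball URL for a given package version,
# assuming current versions live in src/contrib/ and superseded ones are
# moved to src/contrib/Archive/<package>/.
cran_tarball_url <- function(pkg, version, archived = FALSE,
                             cran = "https://cran.r-project.org") {
  file <- sprintf("%s_%s.tar.gz", pkg, version)
  if (archived) {
    paste(cran, "src/contrib/Archive", pkg, file, sep = "/")
  } else {
    paste(cran, "src/contrib", file, sep = "/")
  }
}

# arulesViz 1.0-4 is no longer current, so it has to come from the archive
cran_tarball_url("arulesViz", "1.0-4", archived = TRUE)
```

The archived URL could then go into the same install.packages(..., repos=NULL, type="source") call as in the original script. Note that this fixes only the 404 errors, not the missing dependencies.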

Why the missing dependencies? Simply because of the repos=NULL argument in these install.packages() commands, which deactivates the dependencies argument, as the documentation of install.packages() clearly states:

dependencies 	logical indicating whether to also install uninstalled packages which 
                these packages depend on/link to/import/suggest (and so on recursively). 
                Not used if repos = NULL.
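In other words, with repos = NULL nothing reads the Depends/Imports/LinkingTo fields of the package's DESCRIPTION file on our behalf. Those dependencies have to be identified and installed beforehand. Here is a minimal sketch of pulling the package names out of such a field. dep_names() is a hypothetical helper. The field value is made up for illustration (it merely reuses the names from the igraph error message above) and is not igraph's actual DESCRIPTION.

```r
# Hypothetical helper: package names listed in a DESCRIPTION dependency field
# (Depends, Imports, LinkingTo), with version constraints and "R" removed.
dep_names <- function(field) {
  if (is.na(field) || !nzchar(field)) return(character())
  parts <- trimws(strsplit(field, ",")[[1]])
  parts <- sub("[[:space:]]*\\(.*\\)$", "", parts)  # drop "(>= x.y)" constraints
  setdiff(parts[nzchar(parts)], "R")
}

# Illustrative field value only, not copied from a real DESCRIPTION file
imports <- "R (>= 3.0.2), methods,\n    magrittr, NMF (>= 0.20.5), irlba"
dep_names(imports)
```

The resulting names (minus base packages such as methods) would have to be installed from a CRAN mirror first, before the repos = NULL call for the pinned tarball can succeed.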

Here is a detailed table with the installation results for all 28 additional packages, along with the reason for each failure. Packages marked with an asterisk are already pre-installed, but they are included in the script nevertheless. Only 7 out of 28 packages are actually installed successfully, namely those where the requested version is the latest one and which have no dependencies:

#   Package          Installed successfully?   Reason (if not)        Latest CRAN version compatible with ORD 3.2?
1   igraph           NO                        Missing dependencies   Yes
2   arulesViz        NO                        Wrong URL (404)        Yes (requires update of arules)
3   tseries          NO                        Missing dependencies   Yes
4   fracdiff         YES                       -                      Yes
5   Rcpp             NO                        Wrong URL (404)        Yes
6   RcppArmadillo    NO                        Wrong URL (404)        Yes
7   nnet*            NO                        Wrong URL (404)        Yes
8   colorspace       YES                       -                      Yes
9   timeDate         YES                       -                      Yes
10  forecast         NO                        Missing dependencies   Yes
11  sandwich         NO                        Missing dependencies   Yes
12  gmm              NO                        Missing dependencies   Yes
13  kernlab          NO                        Wrong URL (404)        Yes
14  nlme*            NO                        Wrong URL (404)        Yes
15  minqa            NO                        Missing dependencies   Yes
16  nloptr           YES                       -                      Yes
17  RcppEigen        NO                        Wrong URL (404)        Yes
18  lme4             NO                        Wrong URL (404)        Yes
19  glmnet           NO                        Missing dependencies   Yes
20  RSNNS            NO                        Missing dependencies   Yes
21  neuralnet        YES                       -                      Yes
22  NeuralNetTools   NO                        Wrong URL (404)        Yes
23  assertthat       YES                       -                      Yes
24  R6               NO                        Wrong URL (404)        Yes
25  lazyeval         YES                       -                      Yes
26  BH               NO                        Wrong URL (404)        Yes
27  dplyr            NO                        Missing dependencies   Yes
28  tidyr            NO                        Wrong URL (404)        Yes
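For a quick breakdown of the table, we can transcribe the outcomes into a named vector and tally them. The short codes "ok", "deps" and "url" are our own shorthand for the three outcomes in the table above.

```r
# Outcome per package, transcribed from the table above:
# "ok" = installed, "deps" = missing dependencies, "url" = wrong URL (404)
outcome <- c(
  igraph = "deps", arulesViz = "url", tseries = "deps", fracdiff = "ok",
  Rcpp = "url", RcppArmadillo = "url", nnet = "url", colorspace = "ok",
  timeDate = "ok", forecast = "deps", sandwich = "deps", gmm = "deps",
  kernlab = "url", nlme = "url", minqa = "deps", nloptr = "ok",
  RcppEigen = "url", lme4 = "url", glmnet = "deps", RSNNS = "deps",
  neuralnet = "ok", NeuralNetTools = "url", assertthat = "ok", R6 = "url",
  lazyeval = "ok", BH = "url", dplyr = "deps", tidyr = "url")

table(outcome)                 # deps: 9, ok: 7, url: 12
names(outcome)[outcome == "ok"]  # the 7 packages that did install
```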

I have included the rightmost column to highlight the argument I made earlier. Now that ORD is at version 3.2, there is no need for such tight version control of the additional packages (assuming, of course, that this is indeed the reason for the particular form of install.packages() used here), and we can safely install the latest package versions available on CRAN. So here is a way to finally install the additional packages. We have included arules, in order to update it to its latest version, which arulesViz requires. We have also omitted Rcpp, since it will be installed anyway as a dependency of other packages. First, save the following R script in the ~/scripts directory as additional_packages.R:


pkgs =  c("arules",
          "igraph",
          "arulesViz",
          "tseries",
          "fracdiff",
          "RcppArmadillo",
          "nnet",
          "colorspace",
          "timeDate",
          "forecast",
          "sandwich",
          "gmm",
          "kernlab",
          "nlme",
          "minqa",
          "nloptr",
          "RcppEigen",
          "lme4",
          "glmnet",
          "RSNNS",
          "neuralnet",
          "NeuralNetTools",
          "assertthat",
          "R6",
          "lazyeval",
          "BH",
          "dplyr",
          "tidyr")

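# Install everything in a single call, so that shared dependencies are resolved
# only once; dependencies=TRUE is honored here because repos points to a CRAN mirror
install.packages(pkgs, dependencies=TRUE,
                 repos="http://cran.us.r-project.org",
                 lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library",
                 type="source")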

Then, in the same folder, save the following bash script as additional_packages.sh:

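#!/bin/bash
# Run from the script's own directory, so that additional_packages.R is found
cd "$(dirname "$0")" || exit 1

echo Configuring JAVA Environment for R
sudo R CMD javareconf

echo Installing additional packages
Rscript --verbose additional_packages.R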

and make it executable with chmod +x additional_packages.sh.

There is a certain advantage in installing all necessary packages in a single command, as in our code above, instead of issuing a separate Rscript command for each package: all dependencies are handled globally. For example, a package like Rcpp, which is a dependency of more than one package, will only be downloaded and installed once, instead of once for every package of which it is a dependency.
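To illustrate the point, the sketch below uses a small, deliberately simplified dependency graph. It is hand-written for illustration and is not the actual CRAN metadata. The sketch compares how often Rcpp shows up when each requested package's dependencies are resolved in isolation, versus once for the whole request. closure() is a hypothetical helper.

```r
# Simplified, hand-written dependency graph (package -> direct dependencies);
# for illustration only, NOT the actual CRAN metadata of these packages.
deps <- list(
  RcppArmadillo = "Rcpp",
  RcppEigen     = "Rcpp",
  lme4          = c("Rcpp", "RcppEigen", "minqa", "nloptr"),
  minqa         = "Rcpp",
  nloptr        = character(),
  Rcpp          = character()
)

# Hypothetical helper: a package plus everything it needs, recursively
# (breadth-first walk; each package is visited at most once).
closure <- function(pkg, graph) {
  seen <- character()
  todo <- pkg
  while (length(todo) > 0) {
    p <- todo[1]
    todo <- todo[-1]
    if (p %in% seen) next
    seen <- c(seen, p)
    todo <- c(todo, graph[[p]])
  }
  seen
}

requested <- c("RcppArmadillo", "RcppEigen", "lme4")

# One closure per package, each resolved in isolation
per_package <- unlist(lapply(requested, closure, graph = deps))
# One closure for the whole request, de-duplicated
single_call <- unique(per_package)

sum(per_package == "Rcpp")  # 3: Rcpp is in all three closures
sum(single_call == "Rcpp")  # 1: only once in the combined set
```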

Here is part of the output from running the bash script:

Warning: dependencies ‘graph’, ‘Rgraphviz’, ‘pbkrtest’ are not available
also installing the dependencies ‘memoise’, ‘xtable’, ‘gtools’, ‘gdata’, ‘SparseM’, ‘MatrixModels’, ‘mime’, ‘optextras’, ‘bitops’, ‘whisker’, ‘rstudioapi’, ‘git2r’, ‘withr’, ‘curl’, ‘openssl’, ‘digest’, ‘crayon’, ‘praise’, ‘pkgmaker’, ‘registry’, ‘rngtools’, ‘stringr’, ‘gridBase’, ‘RColorBrewer’, ‘doParallel’, ‘plyr’, ‘munsell’, ‘labeling’, ‘TSP’, ‘qap’, ‘gclus’, ‘gplots’, ‘fma’, ‘expsmooth’, ‘quantreg’, ‘Formula’, ‘latticeExtra’, ‘acepack’, ‘gtable’, ‘gridExtra’, ‘evaluate’, ‘formatR’, ‘highr’, ‘markdown’, ‘yaml’, ‘ucminf’, ‘BB’, ‘Rcgmin’, ‘Rvmmin’, ‘setRNG’, ‘dfoptim’, ‘svUnit’, ‘iterators’, ‘htmltools’, ‘caTools’, ‘chron’, ‘jsonlite’, ‘rex’, ‘devtools’, ‘httr’, ‘pmml’, ‘XML’, ‘testthat’, ‘magrittr’, ‘NMF’, ‘irlba’, ‘igraphdata’, ‘rgl’, ‘ape’, ‘scales’, ‘scatterplot3d’, ‘vcd’, ‘seriation’, ‘iplots’, ‘quadprog’, ‘zoo’, ‘its’, ‘longmemo’, ‘urca’, ‘Rcpp’, ‘RUnit’, ‘pkgKitten’, ‘mvtnorm’, ‘dichromat’, ‘date’, ‘fpp’, ‘car’, ‘lmtest’, ‘strucchange’, ‘AER’, ‘stabledist’, ‘timeSeries’, ‘Hmisc’, ‘inline’, ‘knitr’, ‘PKPDmodels’, ‘MEMSS’, ‘ggplot2’, ‘mlmRev’, ‘optimx’, ‘gamm4’, ‘HSAUR2’, ‘numDeriv’, ‘foreach’, ‘lars’, ‘reshape2’, ‘caret’, ‘microbenchmark’, ‘pryr’, ‘rmarkdown’, ‘RSQLite’, ‘RMySQL’, ‘RPostgreSQL’, ‘data.table’, ‘Lahman’, ‘nycflights13’, ‘stringi’, ‘covr’, ‘gapminder’
[...]
Warning messages:
1: In install.packages(pkgs, dependencies = TRUE, repos = "http://cran.us.r-project.org",  :
  installation of package ‘car’ had non-zero exit status
2: In install.packages(pkgs, dependencies = TRUE, repos = "http://cran.us.r-project.org",  :
  installation of package ‘AER’ had non-zero exit status
3: In install.packages(pkgs, dependencies = TRUE, repos = "http://cran.us.r-project.org",  :
  installation of package ‘caret’ had non-zero exit status
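Since install.packages() reports such failures only as warnings, it is worth checking after the run which of the requested packages actually ended up in the ORD library. Here is a minimal sketch. not_installed() is a hypothetical helper, and pkgs is the vector defined in additional_packages.R.

```r
# Hypothetical helper: which of the requested packages are absent from a library?
not_installed <- function(pkgs, lib) {
  present <- rownames(utils::installed.packages(lib.loc = lib))
  setdiff(pkgs, present)
}

# Usage on the VM (pkgs as defined in additional_packages.R):
# not_installed(pkgs, "/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library")

# Self-contained check against R's own base library (.Library)
not_installed(c("stats", "surely.not.a.package"), .Library)
```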

Of the three packages reported as “not available”, graph and Rgraphviz reside in Bioconductor, not in CRAN, so it is natural that the installation script cannot locate them. They are merely suggested by arulesViz, so their absence is not critical (interested readers can always install them from the Bioconductor repository, following the instructions posted there).

The third unavailable package, pbkrtest, is the only one (out of the roughly 150 packages we have just installed, dependencies included) that actually requires a more recent R version (3.2.3) than ours. It is also the root cause of the installation failures of car, AER, and caret, none of which is in our initial package list.
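We can check whether a package's stated R requirement is satisfied by comparing the R entry of its Depends field against the running R version. Here is a minimal sketch. r_requirement_met() is a hypothetical helper, and the Depends values below are illustrative, merely mirroring the pbkrtest case above.

```r
# Hypothetical helper: does the "R (op version)" entry of a DESCRIPTION
# Depends field hold for a given R version? TRUE if no R entry is present.
r_requirement_met <- function(depends, r_version = getRversion()) {
  parts <- trimws(strsplit(depends, ",")[[1]])
  r_part <- grep("^R[[:space:]]*\\(", parts, value = TRUE)
  if (length(r_part) == 0) return(TRUE)
  op <- sub("^R[[:space:]]*\\([[:space:]]*([<>=]+)[[:space:]]*[0-9].*$", "\\1", r_part[1])
  required <- sub("^.*[<>=][[:space:]]*([0-9.-]+)[[:space:]]*\\)$", "\\1", r_part[1])
  cmp <- utils::compareVersion(as.character(r_version), required)  # -1, 0 or 1
  switch(op,
         ">=" = cmp >= 0, ">" = cmp > 0, "==" = cmp == 0,
         "<=" = cmp <= 0, "<" = cmp < 0,
         stop("unknown operator: ", op))
}

# Illustrative Depends value; ORD in the VM is R 3.2.0
r_requirement_met("R (>= 3.2.3), lme4", "3.2.0")  # FALSE
```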

The post Installing the additional R packages in Oracle Big Data Lite VM 4.4.0 appeared first on Nodalpoint.

To leave a comment for the author, please follow the link and comment on their blog: R – Nodalpoint.
