The Oracle Big Data Lite (BDLite) VM is a convenient platform for testing, development, and training on related tools and technologies, such as Cloudera's Hadoop distribution (CDH), Oracle NoSQL Database, and Oracle SQL Developer and Data Modeler. Among other things, it includes a full distribution of Oracle R Enterprise (ORE) and Oracle R Connector for Hadoop (ORCH). The current version at the time of writing is 4.1, and it can be downloaded from Oracle's website.
Experimenting with ORE and ORCH is one of our major motivations for using the BDLite VM, since we have been using open source (GNU) R as our main working tool for some time now. Naturally, the first thing one might try after setting up the VM is to install a number of R packages needed for various data science jobs. One of these, and an important “infrastructure” R package, is rJava. It provides a low-level interface between R and Java, and several other R packages already use it as a dependency (see the CRAN page on rJava).
Installing rJava in the ORE distribution of the BDLite VM, which runs Oracle Enterprise Linux, proved to be anything but straightforward. This is despite two things. First, Oracle provides instructions (see Using rJava in Embedded R Execution), but they unfortunately fail. Second, one of the setup scripts accompanying the VM, install_additional_packages.sh, implicitly attempts to install rJava, as the careful user might notice while the script runs. That script is found in the ~/scripts directory and mentioned in the ‘Start Here’ document. Its attempt is unsuccessful as well. Let’s take a closer look at the situation. We assume that Java is already installed, as is the case with the Oracle BDLite VM.
Both Oracle’s instructions and the installation script begin with the line
sudo R CMD javareconf
This command aims to configure R with full Java support. Indeed, there are dozens of pages on Stack Overflow and similar fora offering workarounds for installing rJava on various platforms and operating systems. Several of the proposed solutions rely on some variation of this command. But, as we will see shortly, it is of no use here. The script tries to install rJava as a dependency of arulesViz, a package for visualizing association rules. Running that specific command, we get:
[~]$ Rscript --verbose -e 'install.packages("arulesViz",repos="http://cran.us.r-project.org",dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library")'
[…]
also installing the dependencies ‘rJava’, ‘iplots’
[…]
* installing *source* package ‘rJava’ ...
[…]
configure: error: One or more Java configuration variables are not set.
Make sure R is configured with full Java support (including JDK). Run
R CMD javareconf
as root to add Java support to R.
If you don't have root privileges, run
R CMD javareconf -e
to set all Java-related variables and then install rJava.
ERROR: configuration failed for package ‘rJava’
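Before trying workarounds, it may help to see which Java settings R has actually recorded. The sketch below is our own addition, not part of Oracle's instructions or the script. It asks R CMD config for each Java-related variable and lists the empty ones. The helper name check_r_java_config and the R_CMD override are hypothetical. The variable list is an assumption based on the Java entries documented by R CMD config --help; JAVAH, for instance, may legitimately be empty with JDKs that no longer ship javah.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report which Java-related variables R was
# configured with are empty. The variable list is an assumption based on the
# Java entries documented by `R CMD config --help`.
# Set R_CMD to use a specific R binary instead of the first "R" on PATH.

check_r_java_config() {
  local r_cmd="${R_CMD:-R}"
  local missing=0 var value
  for var in JAVA_HOME JAVA JAVAC JAVAH JAR JAVA_LIBS JAVA_CPPFLAGS; do
    # `R CMD config VAR` prints the value recorded for VAR (empty if unset);
    # if R itself cannot be run, the value is empty too and gets reported
    value=$("$r_cmd" CMD config "$var" 2>/dev/null)
    if [ -z "$value" ]; then
      echo "unset: $var"
      missing=$((missing + 1))
    fi
  done
  # Exit status 0 only if every variable has a value
  [ "$missing" -eq 0 ]
}
```

Run check_r_java_config as the user who will install rJava. It prints one "unset:" line per empty variable and returns a non-zero status if any are missing, so it can also guard an install step in a script.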
Recall that the R CMD javareconf command was issued at the beginning of the script with superuser privileges; repeating it from the command line does not resolve the issue. We then tried the second recommendation in the error message, i.e.
[scripts]$ R CMD javareconf -e
only to get a different error further down in the installation process:
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library/rJava/libs/rJava.so':
  libjvm.so: cannot open shared object file: No such file or directory
Error: loading failed
Execution halted
ERROR: loading failed
The fact that libjvm.so does exist in the appropriate directory does not help us get any further.
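The following sketch confirms that libjvm.so is present and checks whether its directory is visible to the dynamic loader at all. The function names are hypothetical, and the Java root defaults to JAVA_HOME, an assumption about the environment. The loader check inspects only LD_LIBRARY_PATH, not the ldconfig cache or an embedded RPATH, so a negative answer is a hint rather than proof.

```shell
#!/usr/bin/env bash
# Sketch: locate libjvm.so under a Java installation and check whether its
# directory is on LD_LIBRARY_PATH. The Java root is taken from the first
# argument or, failing that, from JAVA_HOME (an assumption about the setup).

find_libjvm() {
  local root="${1:-${JAVA_HOME:-}}"
  if [ -z "$root" ] || [ ! -d "$root" ]; then
    echo "find_libjvm: no valid Java root (argument or JAVA_HOME)" >&2
    return 1
  fi
  local lib
  # -L follows symbolic links, since Java installation roots are often symlinks
  lib=$(find -L "$root" -name libjvm.so -type f 2>/dev/null | head -n 1)
  if [ -z "$lib" ]; then
    echo "find_libjvm: libjvm.so not found under $root" >&2
    return 1
  fi
  printf '%s\n' "$lib"
}

# Returns 0 if the directory containing the given libjvm.so appears in
# LD_LIBRARY_PATH. It does NOT consult ldconfig's cache or an RPATH,
# which the dynamic loader also searches.
libjvm_dir_on_ld_path() {
  local dir
  dir=$(dirname "$1")
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Called without arguments, find_libjvm searches under JAVA_HOME and prints the first libjvm.so it finds. Passing that path to libjvm_dir_on_ld_path then tells whether its directory is on LD_LIBRARY_PATH in the current shell.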
What is happening?
It turns out that the solution is to run the R CMD javareconf command from inside R's home directory, with superuser privileges [UPDATE: it seems that adding the -E flag, i.e. sudo -E R CMD javareconf, also suffices]:
[~]$ echo $R_HOME
/usr/lib64/R
[~]$ cd $R_HOME
[R]$ su
[R]# R CMD javareconf
[…]
Updating Java configuration in /usr/lib64/R
Done.
[R]# Rscript --verbose -e 'install.packages("rJava",repos="http://cran.us.r-project.org",dependencies=TRUE,lib="/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library")'
[…]
* DONE (rJava)
Note that the Rscript command must also be run as root; otherwise it will fail again. Alternatively, you can start R (again as root) and call install.packages("rJava") directly, which is essentially what the Rscript command above does.
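The two steps above can be wrapped in a small shell function for repeatability. This is a sketch under our own assumptions. The name fix_rjava is hypothetical. The file is meant to be sourced, and the function called from a root shell (as after su in the transcript). The R home and the target library are passed as arguments rather than hard-coded.

```shell
#!/usr/bin/env bash
# Sketch of the fix described in the post, wrapped in a hypothetical function.
# Assumptions: this file is sourced and fix_rjava is called from a root shell
# (e.g. after `su`), and the R/Rscript found on PATH are the ORE-enabled ones.
#   $1  R's home directory (e.g. the value of R_HOME)
#   $2  library directory to install rJava into

# The body runs in a subshell ( ... ), so the `cd` does not change the
# caller's working directory and `exit` only ends the function.
fix_rjava() (
  r_home="${1:?usage: fix_rjava R_HOME LIB_DIR}"
  lib_dir="${2:?usage: fix_rjava R_HOME LIB_DIR}"
  repo="http://cran.us.r-project.org"

  # Step 1: reconfigure Java support from inside R's home directory
  cd "$r_home" || exit 1
  R CMD javareconf || exit 1

  # Step 2: install rJava into the target library, still as root
  Rscript --verbose -e \
    "install.packages('rJava', repos = '$repo', dependencies = TRUE, lib = '$lib_dir')"
)
```

For example, from the root shell you could run fix_rjava /usr/lib64/R "$ORACLE_HOME/R/library". This assumes ORACLE_HOME is set there and points to the database home whose R/library directory ORE uses. Running the body in a subshell keeps the directory change local to the function.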
Some further minor issues
Why does the arulesViz package try to install rJava as well when, according to its CRAN page, it does not depend on it? A little searching through the relevant CRAN pages reveals that arulesViz merely suggests iplots, which in turn depends on rJava. The dependencies=TRUE argument of the package installation function makes R follow all four kinds of dependency (“Depends”, “Imports”, “LinkingTo”, “Suggests”) for the package being installed. For the packages pulled in this way, only the first three are followed, which is how rJava, a hard dependency of iplots, comes along. This is not always a good idea, and we might want to limit the dependencies to the “hard” ones (“Depends”, “Imports”), i.e. those absolutely necessary for the respective packages to function. This also removes other inconsistencies, such as “soft” dependencies on packages that reside in the Bioconductor repository rather than on CRAN. In our case these are graph and Rgraphviz, which, unsurprisingly, also fail to install during the script execution:
Warning: dependencies ‘Rgraphviz’, ‘graph’ are not available
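Restricting installation to the hard dependencies only requires changing the dependencies argument. The following sketch passes dependencies = c('Depends', 'Imports') explicitly; the helper name is hypothetical and the CRAN mirror is the one used in the script. Simply omitting the argument is an alternative. R's default, dependencies = NA, means c('Depends', 'Imports', 'LinkingTo'), which also skips Suggests but keeps LinkingTo, something packages with compiled code may need at build time. With either choice, arulesViz should no longer pull in the suggested iplots (and with it rJava), graph, or Rgraphviz.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): install a package following only its "hard"
# dependencies (Depends, Imports) instead of dependencies = TRUE.
#   $1  package name   $2  target library   $3  CRAN mirror (optional)

install_hard_deps_only() {
  if [ $# -lt 2 ]; then
    echo "usage: install_hard_deps_only PKG LIB_DIR [REPO]" >&2
    return 2
  fi
  local pkg="$1" lib_dir="$2"
  local repo="${3:-http://cran.us.r-project.org}"
  # Suggests (and LinkingTo) are deliberately left out of the dependency types
  Rscript --verbose -e \
    "install.packages('$pkg', repos = '$repo', dependencies = c('Depends', 'Imports'), lib = '$lib_dir')"
}
```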
Another issue the careful user might have noticed is that the igraph package is installed twice during the script execution. Why is that? Again, we have to pay attention to the package dependencies: igraph is a (hard) dependency of arulesViz. Since the install.packages() function is imperative and does not check whether the packages about to be installed already exist, it installs igraph again as a dependency of arulesViz. This happens even though igraph has just been installed on its own in the previous line of the script!
The natural solution here would be to remove the standalone installation of igraph altogether and let it be installed as a hard dependency of arulesViz.
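One might prefer to keep an explicit igraph line in the script. In that case, an alternative to removing it is to install only the packages not already present in the target library, which also makes re-running the script harmless. This is our own sketch, not the fix suggested above. The helper name is hypothetical, and it checks only the given library, not every entry of .libPaths().

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): install only the packages that are not yet
# present in LIB_DIR. Only LIB_DIR is checked, not every entry of .libPaths().
#   usage: install_missing LIB_DIR PKG [PKG ...]

install_missing() {
  if [ $# -lt 2 ]; then
    echo "usage: install_missing LIB_DIR PKG [PKG ...]" >&2
    return 2
  fi
  local lib_dir="$1"
  shift
  local repo="http://cran.us.r-project.org"
  local pkgs="" p expr
  # Build the body of an R character vector, e.g. 'igraph', 'arulesViz'
  for p in "$@"; do
    pkgs="${pkgs:+$pkgs, }'$p'"
  done
  # R code: compute which requested packages are missing, install only those
  expr="lib <- '$lib_dir'; pkgs <- c($pkgs); "
  expr="${expr}todo <- setdiff(pkgs, rownames(installed.packages(lib.loc = lib))); "
  expr="${expr}if (length(todo) > 0) install.packages(todo, repos = '$repo', lib = lib) "
  expr="${expr}else message('nothing to install')"
  Rscript -e "$expr"
}
```

For instance, install_missing can be given the ORE library path followed by igraph and arulesViz. It then requests both packages in a single install.packages() call, and a second run after a successful first one finds nothing left to install.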
As in our two previous posts on Cloudera Manager and Oracle R Enterprise, we have briefly touched on some issues that arise when initializing the Oracle BDLite VM. We stress that rJava is an important package in the R ecosystem. Our solution therefore has merit of its own, independently of the attempted installation in the provided install_additional_packages.sh script. We strongly recommend that rJava come pre-installed with Oracle R Enterprise in future versions of the BDLite VM.