She issued install.packages() — you won’t believe what happened next!

[This article was first published on R on Biofunctor, and kindly contributed to R-bloggers].

Just install.packages(), he said. It’s easy, he said. (Unknown R tutor)

CRAN has spoiled us. Most of the time, it’s a breeze to add new packages to our environment. Just issue the command, and the package becomes available. If you’re an RStudio user, you can simply start typing the package name, select it from a list, and it’s done. This is because R and CRAN work together to handle your package’s R dependencies: the other packages that the package you actually want needs in order to function.

But then one beautiful day you get a message:

installation of package 'arrow' had non-zero exit status

Which is a polite way of saying “failed”. Why does this happen? Because some packages depend on non-R software. Java, Python, C libraries: there are many components that are not managed by R/CRAN. If you have a well-behaved package, it will tell you what other requirements you have to satisfy. If not, you may find information on various websites when you search for the error message.
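One place where a well-behaved package declares such requirements is the SystemRequirements field of its DESCRIPTION file. Here is a minimal sketch for reading it, assuming the package is already installed somewhere; the helper name system_requirements is made up for illustration:

```r
# Hypothetical helper: return the free-text SystemRequirements field of an
# installed package, or NA if the package declares none.
system_requirements <- function(pkg) {
  utils::packageDescription(pkg, fields = "SystemRequirements")
}

# A base package that declares no system requirements.
system_requirements("stats")
```

The field is free text written for humans, so it tells you what to look for rather than how to install it on your particular system.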

Thus we got to the root of the issue: We use many different systems to handle different kinds of software dependencies, and those systems are often isolated from one another.

One possible solution is the excellent r2u, which serves CRAN as Ubuntu binaries. So if you’re using Ubuntu and you don’t need anything other than CRAN and Bioconductor, you’re in luck. r2u will also install whatever system dependencies your package may have. But then, one day you may find yourself needing something more. Perhaps you need a package from GitHub, or you decide you’ve had enough of Ubuntu’s snaps and want to move to a different distro.

But does a complete solution exist? Well, it does, and it’s been around for more than 20 years. It’s called Nix, which is the Dutch word for “nothing”, because it helps you declare your development environment from zero. So you won’t depend on anything other than the Nix package manager and the package definitions.

This concept is very successful in solving the complex problem of software dependencies and conflicts through isolation of components. It works across languages, including C and C++ for system dependencies. The nixpkgs repository has more than 5000 contributors who created 85000+ packages, which makes it the largest and most up-to-date distribution at the moment. This includes the whole of CRAN and Bioconductor, and gives us the tools to include our own R packages from a git repo, if we choose to. We’re in luck.

And here comes the “but”: as is often the case in our world, complex problems don’t always have a simple solution. Nix does its best to create abstractions to make our work easier, but it takes quite some effort to learn the Nix language and ecosystem before one writes Nix comfortably. And I mean a lot, because there’s heaps to learn.

Using Nix is much easier though: installing a package, starting a dev shell or running a flake is as easy as issuing the right command. A bit like using git or apt-get. You don’t need to be an expert to do this. But it would help if there was a specialised resource for R users.

So I started a (still experimental) project, ReproducibR, which aims to allow someone with basic command line skills to easily create and use an R environment, which is stable, shareable and fully reproducible.

Oh, did I forget to mention that everything in Nix is reproducible? Because it is. So you could use it to share a project with a remote collaborator, submit to scientific journals, or run your code years from now exactly as if it was today.

To give it a run for its money (actually, it’s free), I decided to challenge it with some hard-to-install R packages. These are packages that depend on external libraries that can be difficult to configure properly. So, I packed them in a single environment. If you have Nix installed with flakes enabled, you can give it a try, too, by issuing:

nix flake init -t \
  "gitlab:Kupac/ReproducibR?rev=792b90ae65460b767fcc3b4af8d3dfdf81dce98f&dir=flake_env"
nix develop .#radian

It will take some time to install the packages the first time, but then they are all cached in the Nix store. The next time you start this shell, or any shell using the same flake.lock file, these packages will be just grabbed from the cache instantaneously.

To test these packages, I just had to write their names in a text file, called R_package_list.nix, one package per line. (If you’re an impatient type, do comment out the rstanarm line):

pkgs:
with pkgs.rPackages; [
  arrow
  sf
#   ROracle # non-free
  RJDBC
  rstanarm # Takes a long time to compile but it should work
  altair
]

You may notice that there’s no mention of any dependencies (R or system) in this file. They are handled by the nixpkgs repository and Nix, and we don’t have to think about this for now.

This file is imported by the neighbouring flake.nix file, which does the actual work of wrapping these packages in various front-ends, such as R, radian, RStudio (work in progress), etc. You don’t need to know what’s in there: just add your favourite package to the list, and restart the development shell with nix develop .#radian. The package should install and appear in your environment.
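To check from inside the R session that the listed packages really did appear, something like the following sketch could be used. The helper name check_packages is hypothetical and not part of ReproducibR; the package names are the ones from R_package_list.nix.

```r
# Hypothetical helper: report which packages can be loaded in this session.
# Returns a named logical vector, one element per package name.
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Package names taken from R_package_list.nix.
check_packages(c("arrow", "sf", "RJDBC", "rstanarm", "altair"))
```

A TRUE only means the namespace loads; it does not prove that the underlying system libraries behave correctly, which is why each package still gets its own test.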

So did the experiment work? Partially, yes, and here’s the breakdown!

{sf} is a package used for spatial analysis. It depends on three system packages, GDAL, GEOS and PROJ, which are apparently a bit difficult to compile and link. In this case, it seems to have worked like a charm.

[ins] r$> sf::sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.11.2"        "3.7.1"        "9.2.0"         "true"         "true"        "9.2.0" 

I was even able to plot something by following a vignette.

[Figure: {sf} test plot]

The {arrow} package is an R front-end to the Apache Arrow C++ library, which allows us to read and write tabular data blazingly fast. It seems to be working as well:

[ins] r$> air_table <- arrow_table(airquality)

[ins] r$> air_table
Table
153 rows x 6 columns
$Ozone <int32>
$Solar.R <int32>
$Wind <double>
$Temp <int32>
$Month <int32>
$Day <int32>

The {rstanarm} package is an R interface to the Stan C++ library for Bayesian estimation. Stan takes a long time to compile, and setting up the R package correctly can also be a challenge. Nix did not help with the long compile time, but the setup was seamless: adding that one line was enough to install everything. I tested the package with one of the examples from the vignettes.

[Figure: {rstanarm} test]

The {RJDBC} package is an R DBI interface using JDBC as a back-end. It needs rJava (which in turn needs Java) and apparently it’s also a handful when it comes to installing it. I can tell that it was installed successfully, but I don’t know how to test it, so I can’t comment on the functionality.
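Without a database at hand, one partial smoke test is to check that rJava can at least start a JVM. This is only a sketch of that idea, with a hypothetical helper name. It says nothing about whether a particular JDBC driver works; for that you would still need a driver jar to pass to RJDBC::JDBC() and a database to connect to.

```r
# Hypothetical helper: start a JVM through rJava and ask it for its version.
# Returns NA if rJava is not installed or the JVM cannot be started.
java_version_or_na <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) return(NA_character_)
  tryCatch({
    rJava::.jinit()
    # Call the static method System.getProperty("java.version").
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) NA_character_)
}

java_version_or_na()
```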

{ROracle} did fail to install at first. This package depends on a non-free (as in free speech) library from Oracle. I tried to figure out which libraries were missing, and got close, but couldn’t get it to work. So I submitted an issue, and it got fixed by @jbedo very quickly. Again, I’m not able to test it on an actual database, but here’s some output:

[ins] r$> ROracle::Oracle()
Driver name:             Oracle (OCI) 
Driver version:          1.3-1 
Client version:          21.10.0.0.0 
Connections processed:   0 
Open connections:        0 
Interruptible:           FALSE 
Unicode data as utf8:    TRUE 
Oracle type attributes:  FALSE 

This package is marked as broken in the r2u repo, so it’s a nice achievement by the Nix maintainers to enable the installation. It was not so straightforward to run, though: I had to use NIXPKGS_ALLOW_UNFREE=1 nix develop --impure ./flake_env#radian to get it to work. (The NIXPKGS_ALLOW_UNFREE env var needs to be passed on to the dev shell, which can only be done when run in --impure mode.) Hence, I disabled it in the repo, but feel free to uncomment the line if you want to try it.

I thought the {altair} package would be a smooth ride. I recall having installed it in the past using Nix. Ironically, it’s the only one that failed. Well, installation was OK, but this package, which creates beautiful interactive plots, depends on the altair Python library and {reticulate}. I found two problems with this setup, which prevented me from producing any plots.

Firstly, {reticulate} finds any Python binary on your system in mysterious ways. I had some old files left over in ~/.local/share/r-miniconda and it managed to find them somehow. So {reticulate} spoils Nix’s hard work to isolate builds, and it may run very differently on different systems because of this unpredictable behaviour. I see a new pull request coming.
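Until that is sorted out, one possible workaround is to pin the interpreter with the RETICULATE_PYTHON environment variable and then inspect what {reticulate} reports. The sketch below assumes that the first python3 on the PATH inside the dev shell is the one you want, which I have not verified for this flake.

```r
# Assumption: inside the Nix shell, the first python3 on PATH is the Nix one.
nix_python <- unname(Sys.which("python3"))

# RETICULATE_PYTHON has to be set before reticulate initialises Python.
if (nzchar(nix_python)) Sys.setenv(RETICULATE_PYTHON = nix_python)

# Show which interpreter reticulate would pick, if reticulate is installed.
if (requireNamespace("reticulate", quietly = TRUE)) {
  try(print(reticulate::py_discover_config()$python), silent = TRUE)
}
```

Pinning the interpreter only addresses discovery. It does not make the altair Python module available to that interpreter.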

Secondly, the {altair} package itself should also depend on the identically named Python package, which it does not. So I’m planning to submit another issue.

In conclusion, all the packages installed successfully, but {altair} seems to be broken for now, due to missing dependencies in nixpkgs. The future looks bright.
