Dockerise and deploy your own R Archive Repo

[This article was first published on The R Task Force, and kindly contributed to R-bloggers.]

There are several reasons why you might want to deploy your own R archive repo: you don’t want to rely on GitHub for your dev packages, you want a more “confidential” way to share them, or maybe (and that’s a good enough reason) you’re a nerd and you like the idea of hosting your own repo. So, here’s how to do it.

What’s a repo?

An R archive network / repo is a URL (uniform resource locator) from which you can download packages. For example, when you do:

install.packages("attempt")

There is an argument called “repos”, which defines the place on the internet where R goes to get the package. You usually don’t have to specify it, as its default value is getOption("repos"). For example, right now, on my laptop, I have:

getOption("repos")

## CRAN
## "https://cran.rstudio.com/"
## attr(,"RStudio")
## [1] TRUE

This indicates that when I try to install a package, R will go and look at the CRAN mirror hosted at RStudio. But I could specify any other endpoint:

install.packages(pkgs = "attempt", repos = "http://mirror.fcaglp.unlp.edu.ar/CRAN/", type = "source")

Here, I’m installing {attempt} from Argentina.
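If you use the same repository often, you can also register it once instead of passing repos on every call. A minimal sketch, where the "RAN" entry and its URL are placeholders and not a real server:

```r
# Keep the current setting so it can be restored afterwards
old_repos <- getOption("repos")

# Register a (hypothetical) personal repo, and keep a CRAN mirror as well
options(repos = c(
  RAN  = "http://my-ran.example.org",   # placeholder URL, replace with yours
  CRAN = "https://cran.rstudio.com/"
))

getOption("repos")

# While this option is set, install.packages("attempt") would look
# in both repositories without any repos argument

# Restore the previous configuration
options(repos = old_repos)
```

Putting the options() call in your .Rprofile would make the setting permanent for your sessions.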

What’s in a RAN?

About install.packages

So, how does this work? What does install.packages do when it is called?

We will not dive into the precise details, but let’s sum up:

  • install.packages goes to the url, and looks for “url/src/contrib”
  • in this folder, R looks for a file called PACKAGES
  • R parses this file, isolates the entries for the requested pkgs, and adds the elements needed for the download (version number and other things…)
  • R downloads and installs the package

It’s “that” simple: if your endpoint has a “src/contrib” folder, if this folder contains a correctly filled PACKAGES file, and if all the tar.gz are there too, you can install.packages(pkgs = "mypkg", repos = "myrepo", type = "source").
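The lookup described above can be reproduced by hand with base R. The sketch below uses a placeholder repo URL and a made-up index entry; it only builds the URLs that install.packages would request, and downloads nothing:

```r
repo <- "http://my-ran.example.org"   # placeholder, not a real server

# Step 1: where R looks for source packages
contrib <- contrib.url(repo, type = "source")
contrib
# "http://my-ran.example.org/src/contrib"

# Step 2: the index file R reads in that folder
packages_url <- paste0(contrib, "/PACKAGES")

# Step 3: given one entry of the index (made up here), the tarball to fetch
entry <- c(Package = "attempt", Version = "0.2.1")
tarball_url <- paste0(
  contrib, "/", entry[["Package"]], "_", entry[["Version"]], ".tar.gz"
)
tarball_url
```

This is why the tar.gz files have to follow the name_version.tar.gz pattern and sit next to PACKAGES.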

The PACKAGES file

In this file, you’ll need to have an entry for each package in your repo. Each one should be described as follows (the # annotations are explanations for this article, not part of the file):

Package: craneur # The name of your package
Version: 0.0.0.9000 # The version
Imports: attempt, desc, glue, R6, tools # The Imports
Suggests: testthat # The suggests
License: MIT + file LICENSE # The licence
MD5sum: e3ef1ff3d829c040c9bafb960fb8630b # The MD5sum
NeedsCompilation: no # Whether or not your package needs compilation
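PACKAGES uses the same "field: value" layout as a DESCRIPTION file, so base R can read it with read.dcf(). As a small sketch, the entry above (without the # annotations, which are only there for explanation) is written to a temporary file and parsed back:

```r
entry <- c(
  "Package: craneur",
  "Version: 0.0.0.9000",
  "Imports: attempt, desc, glue, R6, tools",
  "Suggests: testthat",
  "License: MIT + file LICENSE",
  "MD5sum: e3ef1ff3d829c040c9bafb960fb8630b",
  "NeedsCompilation: no"
)

packages_file <- file.path(tempdir(), "PACKAGES")
writeLines(entry, packages_file)

# read.dcf() returns a character matrix:
# one row per package, one column per field
index <- read.dcf(packages_file)
index[, c("Package", "Version")]

# The MD5sum field is the checksum of the tarball itself, for example:
# tools::md5sum("../craneur_0.0.0.9000.tar.gz")
```

With several packages in the file, entries are separated by a blank line and each one becomes a row of the matrix.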

With {craneur}

Doing this by hand can be cumbersome, so I’ve developed a little package called {craneur} to do it automatically. You can get it with:

remotes::install_github("ColinFay/craneur")

Here’s how to use it:

library(craneur)
colin <- Craneur$new("Colin")
colin$add_package("../craneur_0.0.0.9000.tar.gz")
colin$add_package("../jekyllthat_0.0.0.9000.tar.gz")
colin$add_package("../tidystringdist_0.1.2.tar.gz")
colin$add_package("../attempt_0.2.1.tar.gz")
colin$add_package("../rpinterest_0.4.0.tar.gz")
colin$add_package("../rgeoapi_1.2.0.tar.gz")
colin$add_package("../proustr_0.3.0.9000.tar.gz")
colin$add_package("../languagelayeR_1.2.3.tar.gz")
colin$add_package("../fryingpane_0.0.0.9000.tar.gz")
colin$add_package("../dockerfiler_0.1.1.tar.gz")
colin$add_package("../devaddins_0.0.0.9000.tar.gz")
colin

## package path
## 1 craneur ../craneur_0.0.0.9000.tar.gz
## 2 jekyllthat ../jekyllthat_0.0.0.9000.tar.gz
## 3 tidystringdist ../tidystringdist_0.1.2.tar.gz
## 4 attempt ../attempt_0.2.1.tar.gz
## 5 rpinterest ../rpinterest_0.4.0.tar.gz
## 6 rgeoapi ../rgeoapi_1.2.0.tar.gz
## 7 proustr ../proustr_0.3.0.9000.tar.gz
## 8 languagelayeR ../languagelayeR_1.2.3.tar.gz
## 9 fryingpane ../fryingpane_0.0.0.9000.tar.gz
## 10 dockerfiler ../dockerfiler_0.1.1.tar.gz
## 11 devaddins ../devaddins_0.0.0.9000.tar.gz

You can then save it with:

colin$write()

You now have a folder you can copy to your server. This server can be your own FTP, a university server, a git repo… anywhere you can point to with a url!

Note: other packages can do this too, notably {drat}, {cranlike} or {packrat}.
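Base R can also generate the index, through tools::write_PACKAGES(). The sketch below is not how {craneur} works internally; it builds a throwaway repo in a temporary folder around a fake package called toypkg (made up for this example), then queries it the way install.packages would:

```r
# A throwaway repo with the expected src/contrib layout
repo    <- file.path(tempdir(), "toy_ran")
contrib <- file.path(repo, "src", "contrib")
dir.create(contrib, recursive = TRUE, showWarnings = FALSE)

# A fake source package: only a DESCRIPTION, which is enough for indexing
src <- file.path(tempdir(), "toypkg")
dir.create(src, showWarnings = FALSE)
writeLines(
  c("Package: toypkg", "Version: 0.0.1", "Title: Toy Package",
    "Description: A placeholder package.", "License: MIT + file LICENSE"),
  file.path(src, "DESCRIPTION")
)

# Bundle it as toypkg_0.0.1.tar.gz inside src/contrib
old_wd <- setwd(tempdir())
utils::tar(file.path(contrib, "toypkg_0.0.1.tar.gz"), files = "toypkg",
           compression = "gzip", tar = "internal")
setwd(old_wd)

# Write PACKAGES (and its compressed variants) next to the tarballs
tools::write_PACKAGES(contrib, type = "source")
list.files(contrib)

# Read the index back through a file:// URL (Unix-style path assumed)
available.packages(
  contriburl = paste0("file://", contrib)
)[, c("Package", "Version")]
```

The toy tarball cannot be installed, as it contains nothing but a DESCRIPTION; with real tar.gz files built by R CMD build, the same two calls give you a working repo folder.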

Creating a server

With Digital Ocean

For the sake of this article, I’ll use a server deployed on Digital Ocean. If you want to try DO, here’s a $10 coupon (full disclosure: it’s an affiliate link, and I’ll get a $10 credit if you ever spend $25 there).

As this is not a DO deployment tutorial, I’ll skip this part and assume you managed to set up a server (roughly, it’s just “create a droplet with Ubuntu”, and access it through ssh using the password you receive by mail). You can still refer to the doc if you need more info about how to deploy a droplet.

So, I’ve connected to my DO server through ssh (with the password received via email), and installed Docker, following this tutorial.

I now have a Digital Ocean machine with Docker on it.

The Dockerfile

Let’s write the Dockerfile for our RAN. Basically, we’ll need:

  • a webserver — which will be launched with the {servr} package (let’s keep the project R-only)
  • the ran repo I created earlier

This simple Dockerfile would create a RAN:

library(dockerfiler)
dock <- Dockerfile$new()
dock$RUN("mkdir usr/ran/src/contrib/ -p")
dock$COPY("src/contrib", "usr/ran/src/contrib")
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$EXPOSE(8000)
# Serve usr/ran, so that the index is reachable at url/src/contrib
dock$CMD("Rscript -e 'servr::httd(\"usr/ran\", host = \"0.0.0.0\", port = 8000)'")
dock

FROM rocker/r-base
RUN mkdir usr/ran/src/contrib/ -p
COPY src/contrib usr/ran/src/contrib
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran", host = "0.0.0.0", port = 8000)'

But there is one thing missing: what if I want to regenerate the RAN every time I have a new package? Well, let’s write a different Dockerfile to do that.

An updatable Dockerfile

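  • First of all, I’ll copy all the package sources into a pkg folder
# Create the pkg folder if needed, then copy every source tarball into it
dir.create("pkg", showWarnings = FALSE)
pkg <- list.files("../", pattern = "\\.tar\\.gz$", full.names = TRUE)
file.copy(pkg, "pkg")
list.files("pkg")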

## [1] "attempt_0.2.1.tar.gz" "craneur_0.0.0.9000.tar.gz"
## [3] "devaddins_0.0.0.9000.tar.gz" "dockerfiler_0.1.1.tar.gz"
## [5] "fryingpane_0.0.0.9000.tar.gz" "jekyllthat_0.0.0.9000.tar.gz"
## [7] "languagelayeR_1.2.3.tar.gz" "prenoms_0.1.0.tar.gz"
## [9] "proustr_0.3.0.9000.tar.gz" "rgeoapi_1.2.0.tar.gz"
## [11] "rpinterest_0.4.0.tar.gz" "tidystringdist_0.1.2.tar.gz"
  • I’ll then create a craneur.R file (file.create("craneur.R")) that builds the repo with {craneur} from the content of a folder. It will contain the following code:
library(craneur)
colin <- Craneur$new("Colin")
lapply(list.files("usr/pkg", pattern = "tar.gz", full.names = TRUE), function(x) colin$add_package(x))
colin$write(path = "usr/ran")
  • As I want the user to be able to use http://url only, and as my RAN index is in src/contrib, I’ll create an html file that simply redirects there:
file.create("index.html")

containing a redirection to src/contrib.
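The markup of index.html is not reproduced above. One simple way to get this behaviour, assuming a plain HTML meta refresh is acceptable (the markup below is an illustration, and not necessarily the file used here), is to write it from R:

```r
# Minimal redirection page: sends the browser from / to src/contrib/
redirect <- c(
  "<!DOCTYPE html>",
  "<html>",
  "  <head>",
  "    <meta http-equiv=\"refresh\" content=\"0; url=src/contrib/\">",
  "  </head>",
  "  <body>",
  "    <a href=\"src/contrib/\">Go to the package index</a>",
  "  </body>",
  "</html>"
)
writeLines(redirect, "index.html")
```

This page only matters for people browsing the repo: install.packages never requests it, as it goes straight to src/contrib/PACKAGES.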

  • And here is the new Dockerfile:
dock <- Dockerfile$new()
# Install the packages
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"remotes\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'remotes::install_github(\"ColinFay/craneur\")'")
# Create the dir
dock$RUN("mkdir usr/ran -p")
dock$RUN("mkdir usr/pkg -p")
# Copy the script and the package sources
dock$COPY("craneur.R", "usr/pkg/craneur.R")
dock$COPY("pkg", "usr/pkg")
# Copy the index.html
dock$COPY("index.html", "usr/ran/index.html")
# Generate the repo with {craneur}
dock$RUN("Rscript usr/pkg/craneur.R")
# Open port
dock$EXPOSE(8000)
# Launch server
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/\", host = \"0.0.0.0\", port = 8000)'")
dock

FROM rocker/r-base
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("remotes", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'remotes::install_github("ColinFay/craneur")'
RUN mkdir usr/ran -p
RUN mkdir usr/pkg -p
COPY craneur.R usr/pkg/craneur.R
COPY pkg usr/pkg
COPY index.html usr/ran/index.html
RUN Rscript usr/pkg/craneur.R
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/", host = "0.0.0.0", port = 8000)'

dock$write()

So here, if I build it:

docker build -t ran .

And:

docker run -d -p 80:8000 ran

I can go to http://127.0.0.1/ in my browser, and I’ll get the index of all available packages.

I can now try:

install.packages("attempt", repos = "http://127.0.0.1/", type = "source")

And that works as expected!

To the server and beyond

Let’s copy everything to the ran folder on our server (replace user with your own ssh login):

scp torun.R user@206.189.28.254:/usr/ran/
scp craneur.R user@206.189.28.254:/usr/ran/
scp Dockerfile user@206.189.28.254:/usr/ran/
scp -r pkg/ user@206.189.28.254:/usr/ran/
scp index.html user@206.189.28.254:/usr/ran/

Let’s go to our virtual machine, build the image from the Dockerfile, and run it with the code we’ve just seen.

docker build -t ran /usr/ran/
docker run -d -p 80:8000 ran

And tadaaa: http://206.189.28.254.

So you can now install from your server:

install.packages("attempt", repos = "http://206.189.28.254", type = "source")

Update your server

So now, the good thing here is that I can update my package server whenever I add or remove a tar.gz: I just have to rebuild my Docker image.

Further work

Efficient update

Here, to be really efficient, I should split my Docker image in two: one with all the packages, and one with the {craneur} generation. That way, I wouldn’t have to rebuild my Docker image from scratch every time the package list changes.

DNS

http://206.189.28.254 is not that nice an address to share or remember, so we could buy a domain and point it to our server. But… that’s for another day.
