Dockerise and deploy your own R Archive Repo

June 7, 2018
By

(This article was first published on The R Task Force, and kindly contributed to R-bloggers)

There are severals reasons you would want to deploy your own R archive repo: you don’t want to rely on GitHub for your dev packages, you want to use a more “confidential” way, or maybe (and that’s good enough a reason), you’re a nerd and you like the idea of hosting your own repo. So, here’s how to.

What’s a repo?

An R archive network / repo is a URL (unique resource locator) where you can download packages from. For example, when you do :

install.packages("attempt")

There is an argument called “repos”, which is defining the spot on the internet where I want R to go and get the package. By default, you don’t have to specify that argument, as it is defined as : getOption("repos"). For example, right now, on my laptop, I have:

getOption("repos")
## CRAN
## "https://cran.rstudio.com/"
## attr(,"RStudio")
## [1] TRUE

Which indicates that when I try to install a package, R will go an look on the mirror of the CRAN hosted at RStudio. But I could specify any other endpoint:

install.packages(pkgs = "attempt", repos = "http://mirror.fcaglp.unlp.edu.ar/CRAN/", type = "source")

Here, I’m installing {attempt} from Argentina.

What’s in a RAN?

About install.packages

So, how does this work? What does install.packages do when it is called?

We will not dive in the precise details, but let’s sum up:

  • install.packages goes to the url, and looks for “url/src/contrib”
  • in this folder, R looks for a file called PACKAGES
  • R parses this file, isolate the pkgs elements, add the necessary elements for the download (version number and other things…)
  • R download and install the package

It’s “that” simple: if your endpoint has a “src/contrib” folder, if inside this folder there is a PACKAGES file well filled, and if all the tar.gz are there too, you can install.packages(pkgs = "mypkg", repos = "myrepo", type = "source").

The PACKAGES file

In this file, you’ll need to have an entry for each package in your repo. Each one should be described as:

Package: craneur # The name of your package
Version: 0.0.0.9000 # The version
Imports: attempt, desc, glue, R6, tools # The Imports
Suggests: testthat # The suggests
License: MIT + file LICENSE # The licence
MD5sum: e3ef1ff3d829c040c9bafb960fb8630b # The MD5sum
NeedsCompilation: no # Wether or not your package needs compilation

With {craneur}

Doing this by hand can be cumbersome, so I’ve developped this little package to do this automatically, called {craneur}, that you can get with:

remotes::install_github("ColinFay/craneur")

Here’s how to use it:

library(craneur)
colin <- Craneur$new("Colin")
colin$add_package("../craneur_0.0.0.9000.tar.gz")
colin$add_package("../jekyllthat_0.0.0.9000.tar.gz")
colin$add_package("../tidystringdist_0.1.2.tar.gz")
colin$add_package("../attempt_0.2.1.tar.gz")
colin$add_package("../rpinterest_0.4.0.tar.gz")
colin$add_package("../rgeoapi_1.2.0.tar.gz")
colin$add_package("../proustr_0.3.0.9000.tar.gz")
colin$add_package("../languagelayeR_1.2.3.tar.gz")
colin$add_package("../fryingpane_0.0.0.9000.tar.gz")
colin$add_package("../dockerfiler_0.1.1.tar.gz")
colin$add_package("../devaddins_0.0.0.9000.tar.gz")
colin
## package path
## 1 craneur ../craneur_0.0.0.9000.tar.gz
## 2 jekyllthat ../jekyllthat_0.0.0.9000.tar.gz
## 3 tidystringdist ../tidystringdist_0.1.2.tar.gz
## 4 attempt ../attempt_0.2.1.tar.gz
## 5 rpinterest ../rpinterest_0.4.0.tar.gz
## 6 rgeoapi ../rgeoapi_1.2.0.tar.gz
## 7 proustr ../proustr_0.3.0.9000.tar.gz
## 8 languagelayeR ../languagelayeR_1.2.3.tar.gz
## 9 fryingpane ../fryingpane_0.0.0.9000.tar.gz
## 10 dockerfiler ../dockerfiler_0.1.1.tar.gz
## 11 devaddins ../devaddins_0.0.0.9000.tar.gz

You can then save it with:

colin$write()

You now have a folder you can copy and paste on your server. This server can be your own ftp, a university server, a git repo… anywhere you can point to with a url!

Note: there are other packages that can do this, also. Notably {drat}{cranlike} or {packrat}.

Creating a server

With Digital Ocean

For the sake of this article, I’ll use a server deployed on Digital Ocean. If you want to try DO, here’s a 10$ coupon (full disclosure: it’s an affiliated link, and I’ll get a 10$ credit if ever you spend 25 there).

As this is not a DO deployment tuto, I’ll skip this part and assume you succeeded to install a server (roughly, it’s juste “create a droplet with ubuntu”, and access with ssh using the password you receive by mail). You can still refer to the doc if you need more info about how to deploy a droplet.

So, I’ve launched my DO server throught ssh (with the password received via email), and installed Docker, following this tutorial.

I now have a digital ocean machine with Docker on it.

The Dockerfile

Let’s write the Dockerfile for our RAN. Basically, we’ll need

  • a webserver — which will be launched with the {servr} package (let’s keep the project R-only)
  • the ran repo I created earlier

This simple Dockerfile would create a RAN:

library(dockerfiler)
dock <- Dockerfile$new()
dock$RUN("mkdir usr/ran/src/contrib/ -p")
dock$COPY("src/contrib", "usr/ran/src/contrib")
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$EXPOSE(8000)
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/src/contrib\", host = \"0.0.0.0\", port = 8000)'")
dock
FROM rocker/r-base
RUN mkdir usr/ran/src/contrib/ -p
COPY src/contrib usr/ran/src/contrib
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/src/contrib", host = "0.0.0.0", port = 8000)'

But there is a thing that’s missing: what if I want to regenerate a RAN everytime I have a new package? Well, let’s write a different Dockerfile to do that.

A updatable Dockerfile

  • First of all, I’ll copy all the packages sources in a pkg folder
pkg <- list.files("../", pattern = "tar.gz", full.names = TRUE)
file.copy(pkg, "pkg")
list.files("pkg")
## [1] "attempt_0.2.1.tar.gz" "craneur_0.0.0.9000.tar.gz"
## [3] "devaddins_0.0.0.9000.tar.gz" "dockerfiler_0.1.1.tar.gz"
## [5] "fryingpane_0.0.0.9000.tar.gz" "jekyllthat_0.0.0.9000.tar.gz"
## [7] "languagelayeR_1.2.3.tar.gz" "prenoms_0.1.0.tar.gz"
## [9] "proustr_0.3.0.9000.tar.gz" "rgeoapi_1.2.0.tar.gz"
## [11] "rpinterest_0.4.0.tar.gz" "tidystringdist_0.1.2.tar.gz"
  • I’ll then create a craneur.R (file.create("craneur.R")) to automatically launch and write with {craneur} from a folder. It will contain the following code:
library(craneur)
colin <- Craneur$new("Colin")
lapply(list.files("usr/pkg", pattern = "tar.gz", full.names = TRUE), function(x) colin$add_package(x))
colin$write(path = "usr/ran")
  • As I want the user to be able to do http://url only, and as my RAN index is in src/contrib, I’ll create an html that simply does the redirection:
file.create("index.html")

with in it: 

  • And here is the new Dockerfile:
dock <- Dockerfile$new()
# Install the packages
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"remotes\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'remotes::install_github(\"ColinFay/craneur\")'")
# Create the dir
dock$RUN("mkdir usr/ran -p")
dock$RUN("mkdir usr/pkg -p")
# Move some stuffs
dock$COPY("craneur.R", "usr/pkg/craneur.R")
dock$COPY("pkg", "usr/pkg")
# Copy the index.html
dock$COPY("index.html", "usr/ran/index.html")
# Create the folders
dock$RUN("Rscript usr/pkg/craneur.R")
# Open port
dock$EXPOSE(8000)
# Launch server
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/\", host = \"0.0.0.0\", port = 8000)'")
dock
FROM rocker/r-base
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("remotes", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'remotes::install_github("ColinFay/craneur")'
RUN mkdir usr/ran -p
RUN mkdir usr/pkg -p
COPY craneur.R usr/pkg/craneur.R
COPY pkg usr/pkg
COPY index.html usr/ran/index.html
RUN Rscript usr/pkg/craneur.R
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/", host = "0.0.0.0", port = 8000)'
dock$write()

So here, if I build it:

docker build -t ran .

And:

docker run -d -p 80:8000 ran

I can go to http://127.0.0.1/ on my browser, and I’ll get the index of all available packages.

I can now try:

install.packages("attempt", repos = "http://127.0.0.1/", type = "source")

And that works as expected ?

To the server and beyond

Let’s copy everything on our server in our ran folder:

scp torun.R [email protected]:/usr/ran/
scp craneur.R [email protected]:/usr/ran/
scp Dockerfile [email protected]28.254:/usr/ran/
scp -r pkg/ [email protected]:/usr/ran/
scp index.html [email protected]:/usr/ran/

Let’s go to our virtual machine, and run the Dockerfile with the code we’ve just seen.

docker run -d -p 80:8000 ran

And tadaaa : http://206.189.28.254.

So you can now install from your server:

install.packages("attempt", repos = "http://206.189.28.254", type = "source")

Update your server

So now, the good thing here is that I can update my package server if ever I remove or add a new tar.gz : I’ll just have to rebuild my Docker image.

Further work

Efficient update

Here, to be really efficient, I should split my Docker images in two: one with all the packages, and one with the {craneur} generation : that way, I wouldn’t have to recompile my docker image from scratch everytime I have a modification in the package list.

DNS

A http://206.189.28.254 is not that nice an adress to share or remember, so we could buy a domain and point it to our server. But… that’s for another day ?

The post Dockerise and deploy your own R Archive Repo appeared first on The R Task Force.

To leave a comment for the author, please follow the link and comment on their blog: The R Task Force.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)