
There are several reasons you might want to deploy your own R archive repo: you don’t want to rely on GitHub for your dev packages, you want a more “confidential” way to share them, or maybe (and that’s a good enough reason) you’re a nerd and you like the idea of hosting your own repo. So, here’s how to do it.

## What’s a repo?

An R archive network / repo is a URL (uniform resource locator) from which you can download packages. For example, when you run:

install.packages("attempt")


There is an argument called “repos”, which defines the spot on the internet where I want R to go and get the package. By default, you don’t have to specify that argument, as it is defined as getOption("repos"). For example, right now, on my laptop, I have:

getOption("repos")

## CRAN
## "https://cran.rstudio.com/"
## attr(,"RStudio")
## [1] TRUE


This indicates that when I try to install a package, R will go and look on the CRAN mirror hosted at RStudio. But I could specify any other endpoint:

install.packages(pkgs = "attempt", repos = "http://mirror.fcaglp.unlp.edu.ar/CRAN/", type = "source")


Here, I’m installing {attempt} from Argentina.

## What’s in a RAN?

### About install.packages

So, how does this work? What does install.packages do when it is called?

We won’t dive into the precise details, but let’s sum up:

• install.packages goes to the URL, and looks for “url/src/contrib”
• in this folder, R looks for a file called PACKAGES
• R parses this file, isolates the package entries, and adds the elements needed for the download (version number and other things…)

It’s “that” simple: if your endpoint has a “src/contrib” folder, if this folder contains a properly filled PACKAGES file, and if all the tar.gz files are there too, you can run install.packages(pkgs = "mypkg", repos = "myrepo", type = "source").
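The lookup described above can be reproduced without a server. The following is a minimal sketch, assuming a throwaway repo in a temporary folder on a Unix-like system and a hypothetical package named mypkg (neither comes from the setup in this post). It only reads the index and builds the tarball URL; it does not install anything.

```r
# Build a throwaway repo layout: <repo>/src/contrib/PACKAGES
# ("ran-demo" and "mypkg" are hypothetical names used for illustration)
repo <- file.path(tempdir(), "ran-demo")
contrib_dir <- file.path(repo, "src", "contrib")
dir.create(contrib_dir, recursive = TRUE, showWarnings = FALSE)

# A minimal PACKAGES index with a single entry
write.dcf(
  data.frame(
    Package = "mypkg",
    Version = "0.0.1",
    NeedsCompilation = "no",
    stringsAsFactors = FALSE
  ),
  file = file.path(contrib_dir, "PACKAGES")
)

# Step 1: R appends src/contrib to the repo URL
repo_url <- paste0("file://", normalizePath(repo, winslash = "/"))
contrib <- contrib.url(repo_url, type = "source")

# Steps 2 and 3: R reads and parses the PACKAGES file found there
index <- available.packages(contriburl = contrib, type = "source")

# The tarball is then looked up as <contrib>/<Package>_<Version>.tar.gz
tarball_url <- paste0(contrib, "/mypkg_", index["mypkg", "Version"], ".tar.gz")
tarball_url
```

Pointing `contriburl` at a real server works the same way, which makes `available.packages()` a handy check that a repo is readable before trying to install from it.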

### The PACKAGES file

In this file, you’ll need an entry for each package in your repo. Each one should be described as follows (the # comments are annotations for this post, not part of the file):

Package: craneur # The name of your package
Version: 0.0.0.9000 # The version
Imports: attempt, desc, glue, R6, tools # The Imports
Suggests: testthat # The suggests
MD5sum: e3ef1ff3d829c040c9bafb960fb8630b # The MD5sum
NeedsCompilation: no # Whether or not your package needs compilation
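To make the format concrete, here is a minimal sketch of writing one such entry by hand from R. The file mypkg_0.0.1.tar.gz is a placeholder created on the spot (it is not a real package build), and the field values are assumptions chosen for illustration.

```r
# Stand-in for a real source tarball (hypothetical name and content)
tarball <- file.path(tempdir(), "mypkg_0.0.1.tar.gz")
writeBin(charToRaw("placeholder"), tarball)

# One PACKAGES entry: a few DESCRIPTION fields plus the file's MD5 sum
entry <- data.frame(
  Package = "mypkg",
  Version = "0.0.1",
  MD5sum = unname(tools::md5sum(tarball)),
  NeedsCompilation = "no",
  stringsAsFactors = FALSE
)

# PACKAGES is a plain DCF file, so write.dcf() / read.dcf() are enough
packages_file <- file.path(tempdir(), "PACKAGES")
write.dcf(entry, file = packages_file)
parsed <- read.dcf(packages_file)
parsed
```

For a folder of real tarballs, base R’s `tools::write_PACKAGES()` generates the whole file in one call.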


### With {craneur}

Doing this by hand can be cumbersome, so I’ve developed a little package called {craneur} to do it automatically. You can get it with:

remotes::install_github("ColinFay/craneur")


Here’s how to use it:

library(craneur)
colin <- Craneur$new("Colin")
colin$add_package("../craneur_0.0.0.9000.tar.gz")
colin$add_package("../jekyllthat_0.0.0.9000.tar.gz")
colin$add_package("../tidystringdist_0.1.2.tar.gz")
colin$add_package("../attempt_0.2.1.tar.gz")
colin$add_package("../rpinterest_0.4.0.tar.gz")
colin$add_package("../rgeoapi_1.2.0.tar.gz")
colin$add_package("../proustr_0.3.0.9000.tar.gz")
colin$add_package("../languagelayeR_1.2.3.tar.gz")
colin$add_package("../fryingpane_0.0.0.9000.tar.gz")
colin$add_package("../dockerfiler_0.1.1.tar.gz")
colin$add_package("../devaddins_0.0.0.9000.tar.gz")
colin

## package path
## 1 craneur ../craneur_0.0.0.9000.tar.gz
## 2 jekyllthat ../jekyllthat_0.0.0.9000.tar.gz
## 3 tidystringdist ../tidystringdist_0.1.2.tar.gz
## 4 attempt ../attempt_0.2.1.tar.gz
## 5 rpinterest ../rpinterest_0.4.0.tar.gz
## 6 rgeoapi ../rgeoapi_1.2.0.tar.gz
## 7 proustr ../proustr_0.3.0.9000.tar.gz
## 8 languagelayeR ../languagelayeR_1.2.3.tar.gz
## 9 fryingpane ../fryingpane_0.0.0.9000.tar.gz
## 10 dockerfiler ../dockerfiler_0.1.1.tar.gz


You can then save it with:

colin$write()

You now have a folder you can copy and paste onto your server. This server can be your own FTP, a university server, a git repo… anywhere you can point to with a URL!

Note: other packages can do this too, notably {drat}, {cranlike} or {packrat}.

## Creating a server

### With Digital Ocean

For the sake of this article, I’ll use a server deployed on Digital Ocean. If you want to try DO, here’s a $10 coupon (full disclosure: it’s an affiliate link, and I’ll get a $10 credit if you ever spend $25 there).

As this is not a DO deployment tutorial, I’ll skip this part and assume you have managed to set up a server (roughly, it’s just “create a droplet with Ubuntu”, then access it over ssh using the password you receive by email). You can still refer to the doc if you need more info about how to deploy a droplet.

So, I’ve logged into my DO server through ssh (with the password received via email), and installed Docker, following this tutorial. I now have a Digital Ocean machine with Docker on it.

### The Dockerfile

Let’s write the Dockerfile for our RAN. Basically, we’ll need:

• a webserver, which will be launched with the {servr} package (let’s keep the project R-only)
• the RAN repo I created earlier

This simple Dockerfile would create a RAN:

library(dockerfiler)
dock <- Dockerfile$new()
dock$RUN("mkdir usr/ran/src/contrib/ -p")
dock$COPY("src/contrib", "usr/ran/src/contrib")
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$EXPOSE(8000)
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/src/contrib\", host = \"0.0.0.0\", port = 8000)'")
dock

FROM rocker/r-base
RUN mkdir usr/ran/src/contrib/ -p
COPY src/contrib usr/ran/src/contrib
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/src/contrib", host = "0.0.0.0", port = 8000)'

But there is one thing missing: what if I want to regenerate the RAN every time I have a new package? Well, let’s write a different Dockerfile to do that.

### An updatable Dockerfile

• First of all, I’ll copy all the package sources into a pkg folder:

pkg <- list.files("../", pattern = "tar.gz", full.names = TRUE)
file.copy(pkg, "pkg")
list.files("pkg")

## [1] "attempt_0.2.1.tar.gz" "craneur_0.0.0.9000.tar.gz"
## [3] "devaddins_0.0.0.9000.tar.gz" "dockerfiler_0.1.1.tar.gz"
## [5] "fryingpane_0.0.0.9000.tar.gz" "jekyllthat_0.0.0.9000.tar.gz"
## [7] "languagelayeR_1.2.3.tar.gz" "prenoms_0.1.0.tar.gz"
## [9] "proustr_0.3.0.9000.tar.gz" "rgeoapi_1.2.0.tar.gz"
## [11] "rpinterest_0.4.0.tar.gz" "tidystringdist_0.1.2.tar.gz"

• I’ll then create a craneur.R (file.create("craneur.R")) to automatically launch and write with {craneur} from a folder. It will contain the following code:

library(craneur)
colin <- Craneur$new("Colin")
lapply(list.files("usr/pkg", pattern = "tar.gz", full.names = TRUE), function(x) colin$add_package(x))
colin$write(path = "usr/ran")

• As I want the user to be able to use http://url only, and as my RAN index is in src/contrib, I’ll create an HTML file that simply does the redirection:
file.create("index.html")


containing a redirect to src/contrib/.
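The content of that file is not reproduced here, so what follows is only a minimal sketch of what it could look like, assuming a plain meta-refresh redirect to src/contrib/ is enough. The markup is an assumption, not the file actually used for this RAN.

```r
# Assumed content: an HTML page that immediately redirects to src/contrib/
# (hypothetical markup, written from R to keep the project R-only)
html <- c(
  "<!DOCTYPE html>",
  "<html>",
  "  <head>",
  "    <meta http-equiv=\"refresh\" content=\"0; url=src/contrib/\">",
  "  </head>",
  "  <body>",
  "    <a href=\"src/contrib/\">src/contrib</a>",
  "  </body>",
  "</html>"
)

# Write it next to the Dockerfile, where COPY will pick it up
writeLines(html, "index.html")
```

The redirect only matters for people visiting the URL in a browser: install.packages() goes straight to src/contrib and never reads index.html.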

• And here is the new Dockerfile:
dock <- Dockerfile$new()
# Install the packages
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"remotes\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'remotes::install_github(\"ColinFay/craneur\")'")
# Create the dir
dock$RUN("mkdir usr/ran -p")
dock$RUN("mkdir usr/pkg -p")
# Move some stuffs
dock$COPY("craneur.R", "usr/pkg/craneur.R")
dock$COPY("pkg", "usr/pkg")
# Copy the index.html
dock$COPY("index.html", "usr/ran/index.html")
# Create the folders
dock$RUN("Rscript usr/pkg/craneur.R")
# Open port
dock$EXPOSE(8000)
# Launch server
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/\", host = \"0.0.0.0\", port = 8000)'")
dock

FROM rocker/r-base
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("remotes", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'remotes::install_github("ColinFay/craneur")'
RUN mkdir usr/ran -p
RUN mkdir usr/pkg -p
COPY craneur.R usr/pkg/craneur.R
COPY pkg usr/pkg
COPY index.html usr/ran/index.html
RUN Rscript usr/pkg/craneur.R
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/", host = "0.0.0.0", port = 8000)'

dock$write()


So here, if I build it:

docker build -t ran .


And:

docker run -d -p 80:8000 ran


I can go to http://127.0.0.1/ in my browser, and I’ll get the index of all available packages.

I can now try:

install.packages("attempt", repos = "http://127.0.0.1/", type = "source")


And that works as expected.

### To the server and beyond

Let’s copy everything to the ran folder on our server (replace <user> with your own ssh user):

scp torun.R <user>@206.189.28.254:/usr/ran/
scp craneur.R <user>@206.189.28.254:/usr/ran/
scp Dockerfile <user>@206.189.28.254:/usr/ran/
scp -r pkg/ <user>@206.189.28.254:/usr/ran/
scp index.html <user>@206.189.28.254:/usr/ran/


Let’s go to our virtual machine, build the image from the Dockerfile as before, and run it with the command we’ve just seen:

docker run -d -p 80:8000 ran


And tadaaa : http://206.189.28.254.

So you can now install from your server:

install.packages("attempt", repos = "http://206.189.28.254", type = "source")
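One thing to keep in mind: packages in the RAN can import packages that only live on CRAN (the {craneur} entry shown earlier imports desc, glue and R6, for example). A minimal sketch of one way to handle this, assuming you want both sources available for the session, is to register the RAN next to a CRAN mirror. The name "RAN" is an arbitrary label chosen for illustration.

```r
# Register the custom repo alongside a CRAN mirror for this session
# ("RAN" is just a label; the URLs are the ones used in this post)
repos <- c(
  RAN  = "http://206.189.28.254",
  CRAN = "https://cran.rstudio.com/"
)
options(repos = repos)

# install.packages() now searches both repos, so dependencies
# that are missing from the RAN can be resolved from CRAN:
# install.packages("craneur", type = "source")
getOption("repos")
```

Putting the same options() call in your .Rprofile would make the setting permanent.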


So now, the good thing here is that I can update my package server whenever I add or remove a tar.gz: I’ll just have to rebuild my Docker image.

## Further work

### Efficient update

Here, to be really efficient, I should split my Docker image in two: one with all the packages, and one with the {craneur} generation. That way, I wouldn’t have to rebuild my Docker image from scratch every time the package list changes.

### DNS

http://206.189.28.254 is not that nice an address to share or remember, so we could buy a domain and point it to our server. But… that’s for another day.

The post Dockerise and deploy your own R Archive Repo appeared first on The R Task Force.