pocker: A docker container to integrate R and Python in CI/CD frameworks

[This article was first published on R on Lino GALIANA, and kindly contributed to R-bloggers.]

Genesis

I started using continuous integration with GitLab a few weeks ago and, until a few days ago, was really happy with the rocker image (basically Docker + R).

I then became ambitious and started writing a Markdown document comparing the speed of R and Python on simple operations. It worked fine on my laptop, where Anaconda is installed. However, because Anaconda is not available in the rocker image, compiling the document in CI naturally failed. I therefore started a project to create a Docker image that would do the job, i.e. that would integrate Python and R together. The container I propose is not well suited to Python-only repositories; its goal is to ease the passage between Python and R.

Since I am a beginner in the Docker ecosystem, it has not been an easy path. While I still thought the solution would be trivial to implement, I planned to keep the repository private. However, I now think the result can help other people, so I decided to make it public. To make the project as reproducible as possible, I ended up with the following, rather complex, workflow:

  • GitHub, connected to Docker Hub, hosts the Dockerfile from which the base image is built
  • GitLab, with continuous integration that uses the /gitlabCI/.simple_configuration.yml example file as a reproducible workflow
  • Docker Hub, which automatically builds the Docker image from the GitHub repository

Complex workflow, simple image

This is not the most natural workflow. If you look at the project history, you will see that I did not adopt it at first. I adopted it after merging branches from two separate projects that were pursuing the same goal. This complex setup has one advantage for reproducibility: each time updates are pushed, both the code used to build the pocker image and the example of its use in continuous integration are updated.

I should warn readers who are used to creating Docker images that I may not have created the most parsimonious image needed to run R and Python together. I would welcome pull requests to improve the pocker repository.

Some explanations

The Dockerfile is used to build the image. The main steps are the following:

  • Start from the rocker/verse container, which avoids re-installing the tidyverse each time a CI/CD job is run
  • Install Python 3 and Anaconda
  • Add the conda binary directory to the PATH
  • Install the reticulate package
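
As an illustration, the four steps above could look roughly like the sketch below. This is not the Dockerfile from the pocker repository. It makes several assumptions: Miniconda stands in for the full Anaconda distribution to keep the example short, the install prefix /opt/conda is an arbitrary choice, and the CRAN mirror URL is just one possible option.

```dockerfile
# Sketch only, NOT the actual pocker Dockerfile.

# Step 1: start from rocker/verse so the tidyverse is already installed.
FROM rocker/verse

# Tools needed to download and unpack the installer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends wget bzip2 ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Step 2: install Python 3 through conda.
# Assumption: Miniconda replaces Anaconda here; /opt/conda is an arbitrary prefix.
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh \
    && bash /tmp/miniconda.sh -b -p /opt/conda \
    && rm /tmp/miniconda.sh

# Step 3: put the conda binary directory on the PATH.
ENV PATH="/opt/conda/bin:${PATH}"

# Step 4: install the reticulate package (the mirror URL is an assumption).
RUN Rscript -e 'install.packages("reticulate", repos = "https://cloud.r-project.org")'
```

Putting the conda directory first on the PATH means that `python` resolves to the conda interpreter, which is also the one reticulate is likely to pick up by default.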

In the gitlabCI directory, you will find scripts useful for continuous integration with the Docker project:

  • complete_configuration.yml: the GitLab CI/CD configuration file I was using before building my own Docker image. It starts from rocker/verse and follows the same steps as the Dockerfile presented above
  • simple_configuration.yml: the GitLab CI/CD configuration I use now that the pocker container is built
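
To give an idea of what a configuration of the second kind might contain, here is a minimal sketch of a GitLab CI/CD file that runs a job inside a prebuilt image. It is not the content of simple_configuration.yml: the image name and the job name are hypothetical placeholders, and the two commands are only one possible smoke test.

```yaml
# Sketch only, NOT the repository's simple_configuration.yml.

# Hypothetical image name: replace with the real Docker Hub name of the pocker image.
image: your-dockerhub-user/pocker:latest

# Hypothetical job name.
check_r_python:
  script:
    # Check that a Python interpreter is on the PATH.
    - python --version
    # Ask reticulate which Python installation it detects from R.
    - "Rscript -e 'reticulate::py_config()'"
```

Because Python, conda and reticulate are already baked into the image, the job does not need the installation steps that a configuration starting from rocker/verse has to repeat on every run.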

The other scripts, build.R and scripts/*, provide tests for the configuration obtained from GitLab CI/CD.

