R 3.3.0 is another motivation for Docker

[This article was first published on r-addict.com/, and kindly contributed to R-bloggers].

Have you ever run into R package versioning issues, where one application required different versions of its dependencies than another? Have you ever got stuck on a project because the wrong software versions were pre-installed on the machine where you had to run your code? Or maybe you had a hard time installing R software on a new machine because you couldn’t recall all the installation steps, as in: what did I do 2 years ago that made RCurl work on my local machine, when I can’t install it now on my virtual machine with Windows? Or maybe installing your R project on a new machine was easy for you, but the admin couldn’t manage the process because he’s not a regular R user? If you have ever found it problematic to move your R applications to other machines, then this Docker guide post is for you!

Docker Motivation

Recently the R world welcomed the new R 3.3.0 release (which was a great occasion to release many R projects, like RTCGA). It is great that the R open source project continues to grow, but switching to new software versions should be done carefully, especially when working in a team of R developers.

It is almost a daily routine that each person developing an R pet project has different versions of libraries and of R itself, which can be checked with devtools::session_info(). This leads to situations in which an application works on one machine but not on another (or works only partially), and it might mistakenly seem that having everyone on the newest versions of R and its libraries is the correct way to cooperate (NOT!).

In most cases it is not complicated to configure one machine to support one R application, but what if you move your R project to a machine with a configured Shiny Server that has to handle dozens of R applications, developed by people who might not even know each other? Our team was stuck for a week after we tried to publish our shiny application (that I wrote about here), because the server was configured for the applications already running on it, but not for one developed under the new R 3.3.0.

The server (example: http://mi2.mini.pw.edu.pl:8080/RTCGA/) hosts applications created by students and graduates of the Faculty of Mathematics and Information Science, Warsaw University of Technology, and runs R 3.2.4 and some version of the shiny package. We found it impossible to install the newest version of the shiny package under a user-specified library path (so as not to affect other applications), as it now requires R 3.3.0, so the application broke completely. I could install R from scratch in a user-specified path and configure all packages and related software, but I thought: what if in a month we move our application to another, better server? Would I have to go through the installation and configuration once more? Can’t I do it once and have a black box application that works always and anywhere, without affecting other processes?

Yes, I can. I used Docker.

Docker short intro

A few sentences from the Docker site might help explain what Docker is:

Docker allows you to package an application with all of its dependencies into a standardized unit for software development.

Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.

When your app is in Docker containers, you don’t have to worry about setting up and maintaining different environments or different tooling for each language. Focus on creating new features, fixing issues and shipping software.

In a few words, Docker helps create black box applications that are pre-configured and standalone, and that work regardless of the software installed outside them.

rocker

Dirk Eddelbuettel has prepared a repository on hub.docker.com called rocker, which consists of Docker images with r-base, RStudio or Shiny Server (Introducing Rocker: Docker for R). Every other Docker image with an R application can be built on top of another, more basic image, so you can develop your applications starting from the r-base Docker image.

After you install Docker, you can use shared open source images with pre-configured software like Hadoop, Spark or Shiny Server. Below is an example of how to easily run Shiny Server on port 3838:

sudo docker run --rm -p 3838:3838 rocker/shiny

which will by default look for the image on hub.docker.com, where rocker stands for the repository name and shiny stands for the image name.

You can check the running Docker containers with

sudo docker ps

and check which Docker images you have already downloaded or built locally with

sudo docker images

Sometimes you would like to let a running Docker container communicate with a folder on the host machine. This can be done with the -v flag, like this:

docker run --rm -p 3838:3838 \
    -v /srv/shinyapps/:/srv/shiny-server/ \
    -v /srv/shinylog/:/var/log/ \
    rocker/shiny

where the first part of /srv/shinyapps/:/srv/shiny-server/ is the path on the host machine and the part after : is the path inside the container onto which the host path is mounted.
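The host:container mapping described above can be wrapped in a small helper so that the host folders become parameters. This is only a sketch under a few assumptions: the function name run_shiny and the DOCKER variable are made up for illustration, and DOCKER defaults to a dry run that prints the command instead of starting a container.

```shell
# DOCKER defaults to a dry run that only prints the command.
# Set DOCKER=docker (or DOCKER="sudo docker") to start the container for real.
DOCKER="${DOCKER:-echo docker}"

# Hypothetical helper: $1 = host folder with Shiny apps, $2 = host folder for logs.
run_shiny() {
  # Left of each colon is the host path, right of it the path inside the container.
  $DOCKER run --rm -p 3838:3838 \
    -v "${1}:/srv/shiny-server/" \
    -v "${2}:/var/log/" \
    rocker/shiny
}

# Same folders as in the command above.
run_shiny /srv/shinyapps/ /srv/shinylog/
```

Keeping the host paths as arguments makes it easy to point the same image at a different apps folder on another machine, while the paths inside the container stay fixed.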

The docker run command with the two -v mappings is how we run our pet R project, the shiny dashboard application http://mi2.mini.pw.edu.pl:3838/CzasDojazdu/App/, which helps find rooms available to rent in Warsaw, restricted by travel time from a location specified by the user.

Creating your first Docker

Each Docker image needs a configuration file named Dockerfile, in which you specify the base Docker image you would like to build yours from. Here is a link to the shiny-extra Docker configuration file, which is also included below:

FROM rocker/shiny:latest 

MAINTAINER Marcin Kosiński "[email protected]"

# install additional packages
RUN R -e "install.packages(c('shinydashboard', 'leaflet', 'dplyr', 'ggmap', 'stringi', 'RSQLite', 'DT'), repos='https://cran.rstudio.com/')"

CMD ["/usr/bin/shiny-server.sh"]

where FROM names the image your Docker image is built from, RUN gives a command to run while the image is being built, and CMD names the script to run when a container is started. In this case it is a script that runs Shiny Server, with the additional R packages c('shinydashboard', 'leaflet', 'dplyr', 'ggmap', 'stringi', 'RSQLite', 'DT') installed.
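To try the Dockerfile above, it first has to be built into an image and then started as a container. Here is a minimal sketch, assuming the Dockerfile sits in the current directory. The local tag shiny-extra, the helper names and the DOCKER variable are illustrative choices rather than part of the original setup, and DOCKER again defaults to printing the commands instead of executing them.

```shell
# DOCKER defaults to a dry run that only prints each command.
# Set DOCKER=docker (or DOCKER="sudo docker") to execute for real.
DOCKER="${DOCKER:-echo docker}"

# Hypothetical local tag for the image built from the Dockerfile.
IMAGE_TAG="shiny-extra"

build_image() {
  # Build the image from the Dockerfile in the current directory (the FROM and RUN steps).
  $DOCKER build -t "$IMAGE_TAG" .
}

run_image() {
  # Start a container from that image (the CMD step) and publish port 3838.
  $DOCKER run --rm -p 3838:3838 "$IMAGE_TAG"
}

build_image && run_image
```

The build only has to be repeated when the Dockerfile changes; moving to another server then comes down to building or pulling the image there and running it.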

I hope this post has convinced you to consider using Docker images based on rocker in your applications. They are standalone and easy to move to other machines, and many images with advanced pre-configured software already exist.
