R and Docker

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

If you regularly have to deal with specific versions of R, different package combinations, or getting R set up to work with other databases or applications, then, well, it can be a pain. You could dedicate a special machine to each configuration you need, I guess, but that's expensive and impractical. You could set up virtual machines in the cloud, which works well for one-off situations but gets tedious when you have to reconfigure a new VM each time. Or you could use Docker containers, which were expressly designed to make it quick and easy to configure and launch an isolated, self-contained collection of software and services.

If you're new to the concept of Docker containers, here's a Docker tutorial for data scientists. But the concepts are pretty simple. On Docker Hub you can search for “images” (basically, bundles of software with pre-configured settings) contributed by the community and by vendors. You refer to an image by name, for example rocker/r-base. You can then create a “container” (a running instance of that image) on your own machine with the Docker application, or in the cloud using the tools offered by your provider of choice.
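As a minimal sketch of that workflow with the docker command-line tool, the shell function below pulls an image by name and then starts a throwaway container running R. The function name start_r is hypothetical. The default image, rocker/r-base, is the example named above.

```shell
# Hypothetical helper: fetch an image from Docker Hub, then start an
# interactive R session in a new container created from that image.
start_r() {
  image="${1:-rocker/r-base}"   # default to the example image from the text

  # Download (or update) the image locally.
  docker pull "$image" || return 1

  # Create and start a container: --rm deletes it when R exits, and -it
  # attaches an interactive terminal. Passing R explicitly overrides images
  # whose default command is something else (e.g. RStudio Server).
  docker run --rm -it "$image" R
}
```

Calling `start_r` with no argument drops you at an R prompt inside rocker/r-base. Quitting R stops the container, and `--rm` then removes it.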

For R users, a wide array of pre-configured Docker images for R has been available since 2014, thanks to the Rocker project. You can browse the rocker repository at Docker Hub to see everything available; it includes:

  • Simple images containing just the latest official R release or the latest daily R build.
  • Images containing both R and RStudio Server.
  • Images with the tidyverse suite of packages pre-installed.
  • Version-stable images, pinned to specific versions of R (and RStudio) and to a snapshot of the R package ecosystem at a specific point in time. If you retrieve one of these images by its version tag, your Docker image will always include the same software, even months or years down the line. These are perfect for production instances, where reproducibility is paramount.
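To illustrate pinning, here is a sketch that starts RStudio Server locally from a version-tagged image and publishes its port 8787. It makes three assumptions. First, that the tag rocker/tidyverse:VERSION exists on Docker Hub (check the repository for the tags actually published). Second, that the image reads the login password for the default rstudio user from a PASSWORD environment variable, as the Rocker RStudio images document. Third, the function name run_rstudio is hypothetical.

```shell
# Hypothetical helper: run RStudio Server from a version-pinned Rocker image.
# Because the tag names a fixed R version and package snapshot, running this
# again later should give you the same software.
run_rstudio() {
  version="$1"     # R version tag, e.g. 3.4.1 (assumed to exist on Docker Hub)
  password="$2"    # login password, passed via the (assumed) PASSWORD variable

  if [ -z "$version" ] || [ -z "$password" ]; then
    echo "usage: run_rstudio VERSION PASSWORD" >&2
    return 2
  fi

  # -d runs the container in the background; -p maps host port 8787 to the
  # RStudio Server port inside the container.
  docker run -d \
    --name "rstudio-$version" \
    -p 8787:8787 \
    -e PASSWORD="$password" \
    "rocker/tidyverse:$version"
}
```

For example, `run_rstudio 3.4.1 'choose-a-password'` followed by a visit to http://localhost:8787/ should bring up the RStudio login page.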

I find the images containing RStudio Server super convenient whenever I need to try something out in a specific version of R. All I need to do is provide the image name to Azure Container Instances and make sure port 8787 is open:

[Screenshot: Azure Container Instances. Creating a container for R 3.4.1 with the tidyverse packages and RStudio Server.]

[Screenshot: Azure Container Instances. Be sure to open port 8787 here for browser access.]

That's it for the configuration, and after the instance is ready (about 2 minutes) I can use a web browser to visit http://40.121.205.121:8787/ to find a completely fresh R instance and the RStudio IDE. (The actual IP address will be provided for you by Container Instances, and can be found in the Overview section for your instance in the Azure Portal.)
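If you prefer scripting to clicking through the portal, the same setup can be sketched with the Azure CLI. The sketch assumes you have already run `az login` and created a resource group. The group, container name, image and password are placeholders, and the function name is hypothetical. It also assumes the Rocker image reads its login password from a PASSWORD environment variable. Here the `--ports 8787` and `--ip-address Public` options take the place of opening port 8787 in the portal.

```shell
# Hypothetical helper: create an Azure Container Instance running RStudio
# Server and print the URL to open in a browser.
# Assumes `az login` has been run and the resource group already exists.
create_rstudio_aci() {
  group="$1"; name="$2"; image="$3"; password="$4"

  # --ports 8787 with --ip-address Public exposes RStudio Server on a public
  # IP, the CLI counterpart of opening port 8787 in the portal.
  # The JSON description that az prints is discarded.
  az container create \
    --resource-group "$group" \
    --name "$name" \
    --image "$image" \
    --ports 8787 \
    --ip-address Public \
    --secure-environment-variables PASSWORD="$password" \
    > /dev/null || return 1

  # Look up the public IP address that Container Instances assigned.
  ip=$(az container show --resource-group "$group" --name "$name" \
         --query ipAddress.ip --output tsv) || return 1

  echo "http://$ip:8787/"
}
```

A call such as `create_rstudio_aci my-resource-group rstudio-341 rocker/tidyverse:3.4.1 'choose-a-password'` prints the address to paste into your browser.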

You can of course use other cloud providers as well: Andrew Heiss provides this guide for setting up a Rocker image on DigitalOcean, along with some handy tips for writing your own Dockerfiles to build custom images of your own design. For more on the Rocker project, follow the link below.
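A custom image typically starts FROM one of the Rocker images and adds whatever else you need. The sketch below writes a minimal Dockerfile and builds it. The base tag rocker/tidyverse:3.4.1, the extra package (data.table) and the default image name my-r-image are illustrative assumptions. The Dockerfile also relies on the `install2.r` script from littler, which the Rocker images ship with.

```shell
# Hypothetical helper: write a Dockerfile that extends a Rocker image with an
# extra R package, then build it into a local image.
build_custom_r_image() {
  dir="${1:-.}"            # build context directory
  tag="${2:-my-r-image}"   # name for the resulting image (illustrative)

  mkdir -p "$dir" || return 1

  # The quoted 'EOF' stops the shell from expanding anything in the Dockerfile.
  cat > "$dir/Dockerfile" <<'EOF'
# Start from a version-pinned Rocker image (tag assumed to exist).
FROM rocker/tidyverse:3.4.1

# install2.r (from littler) installs CRAN packages; --error turns
# install.packages() warnings, such as a failed install, into a build error.
RUN install2.r --error data.table
EOF

  docker build -t "$tag" "$dir"
}
```

After `build_custom_r_image r-image my-r-image`, the new image can be started like any other, e.g. with `docker run --rm -it my-r-image R`. Pinning the base tag keeps the custom image as reproducible as the Rocker image it extends.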

The Rocker Project: Docker Containers for the R Environment
