Data Science Workbench for Ubuntu 14.04

[This article was first published on R Tricks – Data Science Riot!, and kindly contributed to R-bloggers].

I found myself installing the same things over and over again on my VMs, so I decided to pack all my good DSR workbench action into one giant shell script that I could run and walk away from.

Below is my markdown file; you can grab the shell scripts at my GitHub page. The script takes about 30 minutes to finish. It’s tested on Ubuntu Server, but I whipped up a desktop version too, since some folks like GUIs on their servers (although I’m not sure why).

Begin markdown

############

This is a shell script that spins up several popular data science-y server environments on one box. This script is tested and verified on Ubuntu 14.04.

This environment is built for a fresh install of Ubuntu 14.04. It will update/upgrade all base packages and install all needed dependencies. Software packages installed include:

Installation

Download or copy the shell script and run it on your Ubuntu box as your regular (non-root) user:

$ ./data_science_workbench.sh

WARNING: Don’t execute it as root; it needs your home directory for a couple of things.
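If you adapt the script, the root warning can also be enforced in code. Here is a minimal sketch. The function names `is_non_root` and `require_non_root` are hypothetical and are not taken from the author's script.

```shell
# Hypothetical guard (not from the author's script): succeed only when the
# given UID, defaulting to the current user's, is not root's UID 0.
is_non_root() {
  local uid="${1:-$(id -u)}"
  [ "$uid" -ne 0 ]
}

# Abort with a clear message if the installer is started as root,
# since it needs the invoking user's home directory.
require_non_root() {
  if ! is_non_root; then
    echo "Don't run this as root; it needs your home directory." >&2
    exit 1
  fi
}
```

You would call `require_non_root` near the top of the script. Also, a freshly downloaded script usually lacks the execute bit, so set it before `./data_science_workbench.sh` will run: `chmod +x data_science_workbench.sh`.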

Server vs. Desktop

The two scripts are virtually identical. The “Desktop” version installs a few more GUI goodies that aren’t necessary for a headless setup. Additions include:

  • RStudio IDE
  • PgAdmin3: Postgres GUI
  • Anaconda launcher: Install and launch Anaconda development tools.

Note: the desktop version is quite a bit larger. If you install it on a VM, make sure your virtual drive is at least 20 gigs!
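A quick pre-flight check for that 20 gig requirement might look like the sketch below. It assumes "gigs" means GiB and that everything lands on the root filesystem. `has_free_gb` is a hypothetical helper name.

```shell
# Hypothetical pre-flight check: succeed if at least MIN_GB GiB are free at PATH (default /).
has_free_gb() {
  local min_gb="$1" path="${2:-/}" avail_kb
  # -P gives one line per filesystem; column 4 is available space in 1K blocks (-k).
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

if has_free_gb 20 /; then
  echo "OK: at least 20 GiB free on /"
else
  echo "Warning: less than 20 GiB free on /; consider a bigger virtual drive" >&2
fi
```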

Post-installation

Server users: you’ll be dumped into a Tmux shell, since the Anaconda environment needs a new shell session to take effect.

Desktop users: you’ll have to close the terminal window and open a new one.

Desktop users: the Anaconda launcher can be invoked by typing launcher in the terminal.

All users: Jupyterhub isn’t running by default. It can be invoked by typing jupyterhub in the terminal.
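You can check whether the current shell has already picked up the Anaconda PATH change with something like the following. `anaconda_on_path` is a hypothetical helper, and `/opt/anaconda3/bin` is the install location this script uses.

```shell
# Hypothetical helper: succeed if /opt/anaconda3/bin is an entry in the given
# PATH string (defaults to the current shell's PATH).
anaconda_on_path() {
  local path_value="${1-$PATH}"
  case ":$path_value:" in
    *":/opt/anaconda3/bin:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if anaconda_on_path; then
  echo "Anaconda is on PATH in this shell"
else
  echo "Not yet: open a new terminal or run 'source ~/.bashrc'"
fi
```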

Configuration

After the script finishes, you’ll have a few things running on their default ports.

  • RStudio: On localhost:8787 (Username and password set in the script)
  • Shiny-Server: On localhost:3838 (No username or password)
  • Jupyterhub: On localhost:8000, but not started by default; fire it up with the command jupyterhub. (Username and password are the same as those of the Ubuntu user that ran the script.)
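To confirm the services are actually up on their default ports, you can probe them from the box itself. This sketch relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command. `port_open` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if something accepts TCP connections on HOST:PORT.
# Uses bash's /dev/tcp redirection, wrapped in coreutils `timeout` so a
# filtered port can't hang the check.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Default ports from the post; Jupyterhub only answers after you start it.
for svc in "RStudio:8787" "Shiny-Server:3838" "Jupyterhub:8000"; do
  name="${svc%%:*}"
  port="${svc##*:}"
  if port_open localhost "$port"; then
    echo "$name is listening on localhost:$port"
  else
    echo "$name is not reachable on localhost:$port"
  fi
done
```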

Changes in Default Behavior

A few default locations and files have been altered to allow universal access by all users on the system.

  • R: The Renviron file has been altered to create a unified package library that is readable by all users and Shiny-Server. The file is located at /usr/lib/R/etc/Renviron.
  • Anaconda: Normally stored in a user’s home directory, Anaconda is installed in /opt/anaconda3.
  • Anaconda: A line has been added to the bottom of your user’s .bashrc file (/home/YOUR_USERNAME/.bashrc) to make Anaconda your default for Python and Pip. The added line is export PATH="/opt/anaconda3/bin:$PATH". This line must be added manually for each new user on the system. To add it, log in as that user and run: echo 'export PATH="/opt/anaconda3/bin:$PATH"' >> ~/.bashrc
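Because the PATH line has to be added by hand for each new user, an idempotent version keeps you from stacking duplicate lines if it gets run twice. Here's a sketch. `add_anaconda_path` is a hypothetical helper, not part of the author's script.

```shell
# The exact line the post says to add; single quotes keep $PATH literal.
ANACONDA_LINE='export PATH="/opt/anaconda3/bin:$PATH"'

# Hypothetical helper: append ANACONDA_LINE to the given .bashrc only if that
# exact line isn't already there, so running it twice adds nothing.
add_anaconda_path() {
  local bashrc="$1"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF "$ANACONDA_LINE" "$bashrc" 2>/dev/null; then
    printf '%s\n' "$ANACONDA_LINE" >> "$bashrc"
  fi
}
```

Run it as the user in question (for example, `add_anaconda_path ~/.bashrc`) so the file keeps the right owner.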

SSL Authentication?

By default Jupyterhub uses basic authentication, although SSL is available. To set up SSL, see the Jupyterhub documentation.
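For local testing, a self-signed certificate is enough to try Jupyterhub's SSL options. This is a sketch that assumes `openssl` is installed. The file names, the `make_self_signed_cert` helper, and `CN=localhost` are placeholders. Browsers will warn about self-signed certificates, so use the Jupyterhub documentation for anything real.

```shell
# Sketch: generate a throwaway self-signed certificate/key pair in DIR for
# trying Jupyterhub over HTTPS. File names and CN=localhost are placeholders.
make_self_signed_cert() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -subj "/CN=localhost" \
    -keyout "$dir/jupyterhub.key" -out "$dir/jupyterhub.crt" 2>/dev/null || return 1
  # Keep the private key readable by its owner only.
  chmod 600 "$dir/jupyterhub.key"
}

# Then point Jupyterhub at the pair, for example:
#   jupyterhub --ssl-key "$HOME/certs/jupyterhub.key" --ssl-cert "$HOME/certs/jupyterhub.crt"
```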

Shiny and RStudio Server are different stories. Shiny Server’s open-source edition ships with no authentication whatsoever, and RStudio Server’s comes with only basic auth. Both offer “Enterprise” editions, which add SSL.

The simple solution would be to use an Apache2 reverse proxy to add SSL to both, but I have a feeling that may violate the terms of service. I’m too lazy to read the entire TOS, so I’ll just recommend you don’t do it. They are both licensed under AGPL v3 if anyone is interested.

Other Data Science Boxes

Data Science Box is based on Ubuntu 12.04 and uses IPython Notebook instead of Jupyter.

Data Science Toolbox is available as either a VirtualBox image or on AWS. Again, it uses older versions of IPython Notebook, but it looks as if it’s under active development.


 
