by Andrie de Vries
One of the most exciting features of the forthcoming release of SQL Server 2016 (generally available on June 1, and in preview now) is the ability to run R scripts directly in database using SQL Server R Services. This means your R code will run directly on the server, as opposed to first extracting the data to your local R session. This has several advantages, for example the data never moves from the server — this offers improved data security and also prevents the latency of downloading data.
This is great news for R users, of course. However, it means the data base administrator (DBA) has to learn a new set of tools to manage an R environment on the SQL Server instance.
One administration task is to ensure that SQL Server instance has all the correct versions of R packages installed.
Installing packages on a SQL Server machine
If the SQL Server machine has a connection to the Internet, then this is a trivial task: run install.packages() from an R session. However, many production instances will be on a machine behind a fire wall, with no Internet connectivity at all. In this case, install.packages() will not find the packages on your favourite CRAN mirror.
As an experienced R user, you will know that you can install R packages from a local package zip file. This is correct, and also easy to do. The problem is how to ensure that you have resolved all of the dependencies of the package you need, and to install from local zip files in the correct sequence.
Fortunately there is an easy way to install many packages in a simple operation. The solution is to use the miniCRAN package to create a local repository of the packages you need, then to install from this repository.
You can find an introduction to miniCRAN in the blog posts I wrote in the past, for example:
- Introducing miniCRAN: an R package to create a private CRAN repository
- Dependencies of popular R packages
- Using miniCRAN in Azure ML
In summary, miniCRAN enables organisations to maintain a private mirror of CRAN with only a subset of packages that are relevant to them. miniCRAN makes this possible by determining the dependency tree for a given set of packages, then downloading all of the package dependencies.
Recipe for installing packages
To be clear, installing your package on a SQL Server instance is a three step process:
- On a machine with Internet access, create a miniCRAN repository. A miniCRAN repository is a folder structure that contains all the required packages, as well as their dependencies.
- Copy the miniCRAN respository to the SQL Server instance. To do this, you will have to use standard system administration tools, for example use FTP or SSH to copy the files to the target machine. The result is that the miniCRAN repository resides on the SQL Server instance.
- On the target SQL Server machine, use install.packages(), making sure you point to the local miniCRAN repository.
This is all comparatively simple, and the trickiest bit is to ensure you specify all the correct R version settings to a script.
The code below contains the complete sequence for managing this process for SQL Server 2016 as the target machine.