Handling required and missing R packages in Microsoft R Services
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have seen several time, that execute R code using procedure sp_execute_external_script was not valid due to missing library or library dependencies.
Problem is – in general – not solved out of the box. But can be solved using and maintaining a list of installed libraries used by Microsoft R services or by simply create a “pre-code” R code to do a check for you.
In both cases, user will end up with additional code, but it might be a good check if you are installing library in production and you run such check prior to running any relevant R code.
Let’s start with simple R code:
USE WideWorldImporters;
GO
EXECUTE sp_execute_external_script
@language = N'R'
,@script=N'library(Hmisc)
df <- data.frame(rcorr(as.matrix(sp_RStats_query),
type="pearson")$P)
OutputDataSet<-df'
,@input_data_1 = N'SELECT
SupplierID
,UnitPackageID
,OuterPackageID
FROM [Warehouse].[StockItems]'
,@input_data_1_name = N'sp_RStats_query'
WITH RESULT SETS ((SupplierID NVARCHAR(200)
,UnitPackageID NVARCHAR(200)
,OuterPackageID NVARCHAR(200)));
This code will in my case return error message, that R Service is missing a specific library in order to execute R code – in my case library Hmisc.

So the first step to solve this issue is to declare variable for R script and parametrize @script parameter for procedure sp_execute_external_script.
DECLARE @OutScript NVARCHAR(MAX)
SET @OutScript =N'library(Hmisc)
df <- data.frame(rcorr(as.matrix(sp_RStats_query),
type="pearson")$P)
OutputDataSet<-df'
EXECUTE sp_execute_external_script
@language = N'R'
,@script= @OutScript
,@input_data_1 = N'SELECT
SupplierID
,UnitPackageID
,OuterPackageID
FROM [Warehouse].[StockItems]'
,@input_data_1_name = N'sp_RStats_query'
WITH RESULT SETS ((
SupplierID NVARCHAR(200)
,UnitPackageID NVARCHAR(200)
,OuterPackageID NVARCHAR(200)
));
Now we need to do a string search for following patterns:
- library (c(package1,package2))
- library (package1), library(package2)
- library(package1)
All these reserved R words denote the need for particular library/libraries installation.
DECLARE @OutScript NVARCHAR(MAX)
SET @OutScript =N'library(Hmisc)
library(test123)
df <- data.frame(rcorr(as.matrix(sp_RStats_query),
type="pearson")$P)
OutputDataSet<-df'
/*
***************************************************************
START: CODE FOR CHECKING AND INSTALLING MISSING R LIBRARIES
***************************************************************
*/
DECLARE @Tally TABLE (num TINYINT,R_Code NVARCHAR(MAX))
INSERT INTO @Tally VALUES (1,@OutScript)
DECLARE @libstatement NVARCHAR(MAX)
DECLARE @cmdstatement NVARCHAR(MAX)
;WITH CTE_R(num,R_Code, libname)
AS
(
SELECT
1 AS num,
RIGHT(R_Code, LEN(R_Code) - CHARINDEX(')', R_Code, 0)) AS R_Code,
substring(R_Code, CHARINDEX('library(', R_Code, 0) + 0, CHARINDEX(')',
R_Code, 0) - CHARINDEX('library(', R_Code, 0) + 1) AS libname
FROM @Tally
WHERE
CHARINDEX('(', R_Code, 0) > 0
AND CHARINDEX('library(',R_Code,0) > 0
UNION ALL
SELECT
1 AS num,
RIGHT(R_Code, LEN(R_Code) - CHARINDEX(')', R_Code, 0)) AS R_Code,
substring(R_Code, CHARINDEX('library(', R_Code, 0) + 0, CHARINDEX(')',
R_Code, 0) - CHARINDEX('library(', R_Code, 0) + 1) AS libname
FROM CTE_R
WHERE
CHARINDEX('(', R_Code, 0) > 0
AND CHARINDEX('library(',R_Code,0) > 0
)
, fin AS
(
SELECT TOP 1 stuff((SELECT ' install.packages(''''' +
REPLACE(REPLACE(REPLACE(c1.libname,'library',''),')',''),'(','') + '''''
, dependencies = T)'
FROM CTE_R AS c1
WHERE
c1.num = c2.num
FOR XML PATH (''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'') AS lib_stat
FROM CTE_R AS c2
)
SELECT
@libstatement = lib_stat
FROM fin
SET @cmdstatement = 'EXEC xp_cmdshell ''"C:\Program Files\Microsoft SQL Server\
MSSQL13.MSSQLSERVER\R_SERVICES\bin\R.EXE" cmd -e ' + @libstatement + ''''
EXEC SP_EXECUTESQL @cmdstatement
/*
***************************************************************
END: CODE FOR CHECKING AND INSTALLING MISSING R LIBRARIES
***************************************************************
*/
EXECUTE sp_execute_external_script
@language = N'R'
,@script= @OutScript
,@input_data_1 = N'SELECT
SupplierID
,UnitPackageID
,OuterPackageID
FROM [Warehouse].[StockItems]'
,@input_data_1_name = N'sp_RStats_query'
WITH RESULT SETS ((
SupplierID NVARCHAR(200)
,UnitPackageID NVARCHAR(200)
,OuterPackageID NVARCHAR(200)
));
Result in this case will be successful with correct R results and sp_execute_external_script will not return error for missing libraries.
I added a “fake” library called test123 for testing purposes if all the libraries will be installed successfully.
At the end the script generated xp_cmdshell command (in one line):
EXEC xp_cmdshell '"C:\Program Files\Microsoft SQL Server\
MSSQL13.MSSQLSERVER\R_SERVICES\bin\R.EXE" cmd
-e install.packages(''Hmisc'') install.packages(''test123'')'
You might also experience the following error in the output of xp_cmdshell command:

In this case, go to the following location and enable write permission for this folder and subfolders.

For the end, you will always find more elegant and easy way to write R code that will inadvertently check the installation (as well as version and dependencies for library) with following R code:
if(!is.element("Hmisc", installed.packages()))
{install.packages("Hmisc", dependencies = T)
}else{library("Hmisc")}
So the original code can simply be changed to:
USE WideWorldImporters;
GO
EXECUTE sp_execute_external_script
@language = N'R'
,@script=N'if(!is.element("Hmisc", installed.packages()))
{install.packages("Hmisc", dependencies = T)
}else{library("Hmisc")}
df <- data.frame(rcorr(as.matrix(sp_RStats_query),
type="pearson")$P)
OutputDataSet<-df'
,@input_data_1 = N'SELECT
SupplierID
,UnitPackageID
,OuterPackageID
FROM [Warehouse].[StockItems]'
,@input_data_1_name = N'sp_RStats_query'
WITH RESULT SETS ((SupplierID NVARCHAR(200)
,UnitPackageID NVARCHAR(200)
,OuterPackageID NVARCHAR(200)));
But unfortunately, one can not always count on the consistence of developer or data scientists or author of the code, that they will always add a simple check for library installation.
Code is available at GitHub.
Happy R-TSQLing!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.