Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

With RTM version of SQL Server 2016, sp_execute_external_script stored procedure has undergone couple of changes prior to it’s final outlook. The parametrization of this external procedure somehow resembles a typical extended stored procedure.

Indeed, sp_execute_external_script is an extended stored procedure written using CLR (whereas stored procedures are natively written in T-SQL) and their main purpose it that they run external commands that a normal T-SQL stored procedure could not handle.

Those who are (have been) working with any kind of external stored procedure or stored procedure using

AS { EXTERNAL NAME assembly_name.class_name.method_name }

you will be familiar with the sp_execute_external_script notation.

  EXECUTE sys.sp_execute_external_script
@language =
,@script =
,@input_data_1 =
,@input_data_1_name =
,@output_data_1_name =
,@parallel =
,@params =
,@parameter1 =

Parameters @params and @parameter1 are interesting, but what might be a bit puzzling are numbers at the end of the names of @input_data_1, @input_data_1_name,… They have  no technical meaning (as far as I have found out) since they don’t enumerate anything and if you by common sense create @input_data_2 parameter, you will get an error in return. In a way this error would have been expected, since joining two SQL Statements into one R dataset would just be nonsense. It is more likely that numbers just denote data columns or data parameters can be enumerated within string value of a particular input parameter and that you need at least one of the items if you are using this parameter.

So parameters with enumerator number in the names, these parameters can hold more values and both parameters @params and @parameter1 are paired:

@params is  list of input parameter declarations  and

@parameter1 is list of values for the input parameters

just like for @input_data_1 and @input_data_1_name parameters.

Simple example would be getting Chi-Square value and statistical significance in one run out of R:

USE WideWorldImporters;
GO

DECLARE @F_Value VARCHAR(1000)
DECLARE @Signif VARCHAR(1000)

EXECUTE sys.sp_execute_external_script
@language = N'R'
,@script = N'mytable <- table(WWI_OrdersPerCustomer$CustomerID, WWI_OrdersPerCustomer$Nof_Orders)
data.frame(margin.table(mytable, 2))
Ch <- unlist(chisq.test(mytable))
F_Val <- as.character(Ch[1])
Sig <- as.character(Ch[3])'
,@input_data_1 = N'select TOP 10 CustomerID, count(*) as Nof_Orders
from [Sales].[Orders] GROUP BY CustomerID'
,@input_data_1_name = N'WWI_OrdersPerCustomer'
,@params = N' @F_Val VARCHAR(1000) OUTPUT, @Sig VARCHAR(1000) OUTPUT'
,@F_Val = @F_Value OUTPUT
,@Sig = @Signif OUTPUT

SELECT
@F_Value AS CHI_Value
,@Signif AS CHI_Square_SIGNIFICANCE;
GO

With @param and @parameter1 I was able to get two separate values from a list of a statistical test (against some sample data) in one run. Of course, the result of unlist function can be added to data.frame and easier parsed but what if I wanted to have data displayed as a frequencies and also test of statistical significance, I can simply do:

USE WideWorldImporters;
GO

DECLARE @F_Value VARCHAR(1000)
DECLARE @Signif VARCHAR(1000)

EXECUTE sys.sp_execute_external_script
@language = N'R'
,@script = N'mytable <- table(WWI_OrdersPerCustomer$CustomerID, WWI_OrdersPerCustomer$Nof_Orders)
data.frame(margin.table(mytable, 2))
Ch <- unlist(chisq.test(mytable))
F_Val <- as.character(Ch[1])
Sig <- as.character(Ch[3])
OutputDataSet<-data.frame(margin.table(mytable, 2))'
,@input_data_1 = N'select TOP 10 CustomerID, count(*) as Nof_Orders
from [Sales].[Orders] GROUP BY CustomerID'
,@input_data_1_name = N'WWI_OrdersPerCustomer'
,@params = N' @F_Val VARCHAR(1000) OUTPUT, @Sig VARCHAR(1000) OUTPUT'
,@F_Val = @F_Value OUTPUT
,@Sig = @Signif OUTPUT
WITH RESULT SETS(
(Cust_data INT
,Freq INT)
)

SELECT @F_Value AS CHI_Value
,@Signif AS CHI_Square_SIGNIFICANCE

As you can see, there is result set clauses added and R script has 3 outputs defined; 1 for the data.frame output and 2 variables through parameters for statistical significance; as shown on print-screen:

Such export of the results is always very useful. In Reporting Services, in Power BI or simply in SSMS when running the resulsts.

Code available at Github.

Happy R-SQLing!