Saving input and output with sp_execute_external_script using temporal table and file table (part #2)

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my previous blog post, Saving input and output with sp_execute_external_script, I was exploring the possibilities how to capture the R code that external procedure sends it internally to the Launchpad.exe program.  And a blog comment by reader Bob gave me additional push to write the second part to same topic. Thank you Bob for sharing this with us.

Bob was explained that how they are doing, and you can read all about it here. To recap, R code (as well as any additional packages) is stored on local file system, it gets copied, zipped, transferred and inserted to SQL table. Calling sp_execute_external_script everything gets unzipped and executed.

I will not comment on the solution Bob provided, since I don’t know how their infrastructure, roles, security is set up. At this point, I am grateful for his comment. But what I will comment, is that there is no straightforward way or any out-of-the-box solution. Furthermore, if your R code requires any additional packages, storing the packages with your R code is not that bad idea, regardless of traffic or disk overhead. And versioning the R code is something that is for sure needed.

To continue from previous post, getting or capturing R code, once it gets to Launchpad, is tricky. So storing R code it in a database table or on file system seems a better idea.

Starting with original sample R code:

EXEC sys.sp_execute_external_script
     @language = N'R'
    ,@script = N'
       d <- InputDataSet 
       c <- data.frame(Num_V1 = c(1,2,3))
       c
       OutputDataSet <- c'
    ,@input_data_1 = N'SELECT 1 AS Nmbrs_From_R'

WITH RESULT SETS ((Numbers_From_R INT));

We can create SQL table for R code to be persistent and always available.

CREATE TABLE R_code 
(id INT
,R NVARCHAR(MAX))
INSERT INTO R_code
SELECT 1, '
       d <- InputDataSet 
       c <- data.frame(Num_V1 = c(1,2,3))
       c
       OutputDataSet <- c'
-- (1 row(s) affected)

DECLARE @r_code NVARCHAR(MAX)
SELECT @r_code = R FROM R_code WHERE id = 1

EXEC sys.sp_execute_external_script
     @language = N'R'
    ,@script = @r_code
    ,@input_data_1 = N'SELECT 1 AS Nmbrs_From_R'
WITH RESULT SETS ((Numbers_From_R INT));

This will yield same results. To have R code and T-SQL code side by side, I would suggest to store T-SQL in table as well.

DROP TABLE IF EXISTS R_code
CREATE TABLE R_code 
(id INT
,R NVARCHAR(MAX)
,SQLC NVARCHAR(MAX))

INSERT INTO R_code
SELECT 1, '
       d <- InputDataSet 
       c <- data.frame(Num_V1 = c(1,2,3))
       c
       OutputDataSet <- c','SELECT 1 AS Nmbrs_From_R'
-- (1 row(s) affected)

DECLARE @r_code NVARCHAR(MAX)
DECLARE @sql_code NVARCHAR(MAX)
SELECT @r_code = R FROM R_code WHERE id = 1
SELECT @sql_code = SQLC FROM R_code WHERE id = 1

EXEC sys.sp_execute_external_script
     @language = N'R'
    ,@script = @r_code
    ,@input_data_1 = @sql_code
WITH RESULT SETS ((Numbers_From_R INT));

Now, the best thing to do, is to add some logging to the R_code table and some versioning. Easiest way to achieve this by using Temporal Table.

TEMPORAL TABLE

Rewrite the original table:

CREATE TABLE R_code 
(
 id INT IDENTITY(1,1)
,CombinationID INT NOT NULL CONSTRAINT PK_ComboID PRIMARY KEY
,R NVARCHAR(MAX)
,SQLC NVARCHAR(MAX)
,Valid_From DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL
,Valid_To DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL
,PERIOD FOR SYSTEM_TIME (Valid_From, Valid_To)
)
WITH (SYSTEM_VERSIONING = ON);

Please note, that table will be represented slightly differently (see the clock in the icon).

temp_table

Besides actual table (or Temporal Table), system automatically creates history table where all the changes are being kept.

Once this is done, I can store T-SQL and R-code.

INSERT INTO R_code (CombinationID, R, SQLC)
SELECT 1,'
       d <- InputDataSet 
       c <- data.frame(Num_V1 = c(1,2,3))
       c
       OutputDataSet <- c','SELECT 1 AS Nmbrs_From_R'
-- (1 row(s) affected)

Now I can run the query same way as before:

DECLARE @r_code NVARCHAR(MAX)
DECLARE @sql_code NVARCHAR(MAX)
SELECT @r_code = R FROM R_code WHERE CombinationID = 1
SELECT @sql_code = SQLC FROM R_code WHERE CombinationID = 1

EXEC sys.sp_execute_external_script
     @language = N'R'
    ,@script = @r_code
    ,@input_data_1 = @sql_code
WITH RESULT SETS ((Numbers_From_R INT));

Suppose that there are changes either to R code or SQL Code committed, and I will simulate this change with an UPDATE query:

-- INSERT ANOTHER CombinationID = 1, with changed R Code
UPDATE R_code
SET R = '
       d <- InputDataSet 
       c <- data.frame(Num_V1 = c(1,2,3,4))
       c
       d
       OutputDataSet <- c'
,SQLC = 'SELECT 1 AS Nmbrs_From_R'
WHERE
    CombinationID = 1
-- (1 row(s) affected)

The best part is that system maintains the versioning automatically and I – as an end user – don’t need to worry about changing the original T-SQL code that executes R Script. So once again I can execute the same query:

DECLARE @r_code NVARCHAR(MAX)
DECLARE @sql_code NVARCHAR(MAX)
SELECT @r_code = R FROM R_code WHERE CombinationID = 1
SELECT @sql_code = SQLC FROM R_code WHERE CombinationID = 1

EXEC sys.sp_execute_external_script
     @language = N'R'
    ,@script = @r_code
    ,@input_data_1 = @sql_code
WITH RESULT SETS ((Numbers_From_R INT));

And now, I will get the results from updated R-script or T-SQL code.

result_temporal

FILE TABLE

Another way (among many) is to use File Table. In one of my previous blog posts, I covered how to create and configure File Table. In this scenario, we will consider following. Create a file, that will contain R code and store it with *.R extension. And upload it to the directory, where File Table is considering this file-stream data to be landed or stored.

Quickly check the configuration:

--- Check configurations
SELECT 
  DB_NAME(database_id) AS DbName
 ,non_transacted_access
 ,non_transacted_access_desc
 ,directory_name  
 ,*
FROM  sys.database_filestream_options
WHERE 
    DB_NAME(database_id) = db_name() --'FileTableRChart'

By checking where my R file is residing:

SELECT 
     FT.Name AS [File Name]
    ,IIF(FT.is_directory=1,'Directory','Files') AS [File Category]
    ,FT.file_type AS [File Type]
    ,(FT.cached_file_size)/1024.0 AS [File Size (KB)]
    ,FT.creation_time AS [File Created Time]
    ,FT.file_stream.GetFileNamespacePath(1,0) AS [File Path]
    ,ISNULL(PT.file_stream.GetFileNamespacePath(1,0),'Root Directory') AS [Parent Path]
FROM 
    [dbo].[ChartsR] AS FT
LEFT JOIN [dbo].[ChartsR] AS PT
ON FT.path_locator.GetAncestor(1) = PT.path_locator
WHERE
    FT.File_type = 'R'

And you can see that I have created R file with the name R_combination1.R

2017-04-17 19_46_14-Document1 - Word

So we can access this using OPENROWSET.

SELECT * FROM 
OPENROWSET(BULK N'\\****\RCharts\DocumentTable\R_Combination1.R',
   SINGLE_CLOB) AS R_Code

You can also do this using the master.dbo.xp_cmdshell, whatever suits you better.

Once you have code read from R file, you can simply continue to execute external procedure.

-- Physical Location of FileTable
DECLARE @r_code NVARCHAR(MAX)
SELECT @r_code = BulkColumn FROM 
OPENROWSET(BULK N'C:\DataTK\00\R_Combination1.R', SINGLE_CLOB) AS R_Code

EXEC sys.sp_execute_external_script
     @language = N'R'
    ,@script = @r_code
    ,@input_data_1 = N'SELECT 1 AS Nmbrs_From_R'
WITH RESULT SETS ((Numbers_From_R INT));

And result is the same. In this case you need to consider the usage of BCP, BulkCopy or XP_CMDSHELL. Again, based on your eco-system, what suits you best.

With file table versioning can be a bit of work-around. In addition, you should implement some naming convention to store files correctly or add additional logic to storing and keeping the changes.2017-04-17 20_31_20-Store_input_output_sp_execute_external_script.sql - SICN-KASTRUN.FileTableRChart

So in my case, I am keeping versioning on FileName level, which can be done using T-SQL or renaming the file on file system and later creating hierarchies with files.

 

As always, Code is available at GitHub.

 

Happy coding!


To leave a comment for the author, please follow the link and comment on their blog: R – TomazTsql.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)