My last blog post of 2016, on SQL Server 2016….. Some years ago I heard predictions from ‘experts‘ that within a few years Hadoop / Spark systems would take over traditional RDBMSs like SQL Server. I don’t think that has happened (yet). Moreover, what some people don’t realize is that at least half of the world still depends on good old SQL Server. If tomorrow all the Transact-SQL stored procedures somehow magically failed to run, I think our society as we know it would collapse…..
OK, I might be exaggerating a little bit. The point is, there are still a lot of companies and use cases out there running SQL Server without the need for something else. And now, with the integrated R services in SQL Server 2016, something else might not be necessary at all.
Deploying Predictive models created in R
From a business standpoint, spending time creating a good predictive model is only useful if you can deploy that model in a system where the business can make use of the predictions in their ‘day-to-day operations’. Otherwise creating a predictive model is just an academic exercise / experiment….
Many predictive models are created in R on a ‘stand-alone’ laptop / server. There are different ways to deploy such models. Among others:
- Re-build the scoring logic ‘by hand’ in the operational system. I did this in the past; it can be a little bit cumbersome and it’s not what you really want to do. If you do not have many data-prep steps and your model is a logistic regression or a single tree, this is doable.
- Make use of PMML scoring. The idea is to create a model (in R), transform it to PMML and import the PMML into the operational system where you need the predictions. Unfortunately, not all models are supported and not all systems support importing (the latest version of) PMML.
- Create APIs (automatically) with technology like, for example, Azure ML, DeployR, sense.io or openCPU, so that the application that needs the prediction can call the API.
SQL Server 2016 R services
If your company is running SQL Server (2016) there is another nice alternative for deploying R models: the SQL Server R services. At my work at RTL Nederland [Oh btw, we are looking for data engineers and data scientists :-)] we are using this technology to deploy the predictive churn and response models created in R. The process is not difficult; the few steps that are needed are demonstrated below.
Create any model in R
I am using an extreme gradient boosting algorithm to fit a classification model on the Titanic data set. Instead of calling xgboost directly I am using the mlr package to train the model. The mlr package provides a unified interface to machine learning in R; it takes care of some of the frequently used steps in creating a predictive model, regardless of the underlying machine learning algorithm. So your code can become very compact and uniform.
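The training step can be sketched as follows. This is a minimal sketch, not my exact model: the `titanic` package (which provides `titanic_train`) and the chosen features and hyperparameters are assumptions for illustration.

```r
library(mlr)
library(titanic)  # assumption: provides the titanic_train data frame

# Keep a handful of features; xgboost needs numeric inputs
dat <- titanic_train[, c("Survived", "Pclass", "Sex", "Age", "Fare")]
dat$Survived <- factor(dat$Survived)
dat$Sex <- as.numeric(factor(dat$Sex))
dat <- na.omit(dat)

# The mlr workflow: define a task, define a learner, then train
task    <- makeClassifTask(data = dat, target = "Survived")
learner <- makeLearner("classif.xgboost", predict.type = "prob", nrounds = 50)
model   <- train(learner, task)

# Quick sanity check: AUC on the training data
pred <- predict(model, task)
performance(pred, measures = auc)
```

Note how little of this code is xgboost-specific: swapping in, say, `"classif.randomForest"` as the learner would leave the rest of the workflow unchanged.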
Push the (xgboost) predictive model to SQL Server
Once you are satisfied with the predictive model (on your R laptop), you need to bring that model over to SQL Server so that you can use it there. This consists of the following steps:
SQL code in SQL Server: write a stored procedure in SQL Server that can accept a predictive R model and some metadata, and saves these into a table in SQL Server.
This stored procedure can then be called from your R session.
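A sketch of what such a stored procedure could look like. The table and procedure names (dbo.R_Models, dbo.usp_SaveRModel) and the columns are assumptions; the key idea is that the serialized model is stored as VARBINARY(MAX).

```sql
-- Model store: one row per saved model (names/columns are assumptions)
CREATE TABLE dbo.R_Models (
    ModelName NVARCHAR(100)  NOT NULL,
    CreatedOn DATETIME       NOT NULL DEFAULT (GETDATE()),
    Model     VARBINARY(MAX) NOT NULL
);
GO

-- Accepts a serialized R model plus metadata and saves it
CREATE PROCEDURE dbo.usp_SaveRModel
    @ModelName NVARCHAR(100),
    @Model     VARBINARY(MAX)
AS
BEGIN
    INSERT INTO dbo.R_Models (ModelName, Model)
    VALUES (@ModelName, @Model);
END
GO
```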
Bring the model from R to SQL: to make this a little bit easier you can write a small helper function.
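A hedged sketch of such a helper. It serializes the model object to hex and calls the stored procedure over ODBC; the RODBC connection string and the procedure name (dbo.usp_SaveRModel) are assumptions you would adapt to your own setup.

```r
library(RODBC)

# Serialize an R model and push it to SQL Server via a stored procedure
save_model_to_sql <- function(model, model_name, connection_string) {
  # serialize() returns a raw vector; pasting it yields a hex string
  # that SQL Server accepts as a 0x... VARBINARY literal
  model_hex <- paste(serialize(model, connection = NULL), collapse = "")

  conn <- odbcDriverConnect(connection_string)
  on.exit(odbcClose(conn))

  query <- sprintf(
    "EXEC dbo.usp_SaveRModel @ModelName = '%s', @Model = 0x%s",
    model_name, model_hex
  )
  sqlQuery(conn, query)
}

# Example call (connection string is an assumption):
# save_model_to_sql(model, "xgboost_titanic",
#   "Driver=SQL Server;Server=myserver;Database=mydb;Trusted_Connection=yes")
```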
So what is the result? In SQL Server I now have a table (dbo.R_Models) with predictive models. My xgboost model to predict survival on the Titanic is added as an extra row. Such a table becomes a sort of model store in SQL Server.
Apply the predictive model in SQL Server.
Now that we have a model we can use it to calculate model scores on data in SQL Server. With the new R services in SQL Server 2016 there is a system stored procedure called sp_execute_external_script. In this procedure you can call R to calculate model scores.
The scores (and the inputs) are stored in a table.
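The scoring step can be sketched like this. The table names (dbo.R_Models, dbo.TitanicPassengers) and the model name are assumptions; the input data must of course be prepared the same way as the training data.

```sql
-- Fetch the latest serialized model from the model store
DECLARE @model VARBINARY(MAX) =
    (SELECT TOP 1 Model FROM dbo.R_Models
     WHERE ModelName = 'xgboost_titanic'
     ORDER BY CreatedOn DESC);

-- Score rows from a table by calling R inside SQL Server
EXEC sp_execute_external_script
    @language = N'R',
    @script   = N'
        model <- unserialize(model_bin)
        # InputDataSet holds the rows from @input_data_1;
        # features must match those used at training time
        preds <- predict(model, newdata = InputDataSet)
        OutputDataSet <- cbind(InputDataSet, preds$data)
    ',
    @input_data_1 = N'SELECT Pclass, Sex, Age, Fare FROM dbo.TitanicPassengers',
    @params    = N'@model_bin VARBINARY(MAX)',
    @model_bin = @model
WITH RESULT SETS UNDEFINED;
```

An INSERT … EXEC around this call would persist the scores (and the inputs) in a table.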
The code is very generic; it works not only for xgboost models but for any model. The scoring can (and should) be done inside a stored procedure, so that scoring can run at regular intervals or be triggered by certain events.
Deploying predictive models (that are created in R) in SQL Server has become easy with the new SQL Server R services. It does not require new technology or specialized data engineers. If your company is already making use of SQL Server, then the integrated R services are definitely something to look at if you want to deploy predictive models!
Some more examples with code can be found on the Microsoft GitHub pages.