by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft)
As an in-memory application, R is sometimes thought to be constrained in performance or scalability for enterprise-grade applications. But by deploying R in a high-performance cloud environment, and by leveraging the scale of parallel architectures and dedicated big-data technologies, you can build applications using R that provide the necessary computational efficiency, scale, and cost-effectiveness.
We identify four application areas, along with the packages, tools, and Azure services you can use in each to deploy R in enterprise applications. Together they cover the tasks required to prototype, build, and operationalize an enterprise-level data science and AI solution. Each area has R packages and tools designed to speed up the development of analytics.
Below is a brief introduction to each.
Cloud resource management and operation
Cloud computing instances and services can be controlled from within an R session, which makes it possible to manage and operationalize R-based analytical pipelines programmatically. The R packages and tools in this category offer a simplified way to interact with the Azure cloud platform and to operate Azure resources (e.g., blob storage, Data Science Virtual Machine, Azure Batch Service) for various tasks.
- AzureSMR – R package for managing a selection of Azure resources. Targeted at data scientists who need to control Azure resources directly from R functions. APIs include Storage Blobs, HDInsight (Nodes, Hive, Spark), Resource Manager, and Virtual Machines.
- AzureDSVM – R package that offers a convenient harness for the Azure Data Science Virtual Machine (DSVM), remote execution of scalable and elastic data science work, and monitoring of on-demand resource consumption.
- doAzureParallel – R package that allows users to submit parallel workloads in Azure.
- rAzureBatch – run R code in parallel across a cluster in Azure Batch.
- AzureML – an R interface to AzureML experiments, datasets, and web services.
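Packages such as doAzureParallel plug into R's foreach framework, so the loop itself is ordinary foreach code and only the registered backend changes. The sketch below illustrates that pattern with a local doParallel backend standing in for a cloud one. The Monte Carlo workload, the worker count, and the variable names are assumptions made for illustration, and the Azure-specific setup (credentials and cluster configuration) is deliberately left out.

```r
library(foreach)
library(doParallel)

# Local stand-in for a cloud backend: two worker processes on this machine.
# With a cloud backend, only this registration step would differ.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Illustrative workload: eight independent Monte Carlo estimates of pi,
# one task per seed, combined into a numeric vector.
estimates <- foreach(seed = 1:8, .combine = c) %dopar% {
  set.seed(seed)
  n <- 1e5
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

parallel::stopCluster(cl)

# Average the independent estimates.
print(mean(estimates))
```

Because each iteration is independent, the same loop body can be spread across more workers without modification, which is what makes this style of workload a good fit for elastic cloud resources.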
Remote interaction and access to cloud resources
Data scientists can log in to and out of R sessions in the cloud for experimentation and exploratory study. The R packages and tools in this category help data scientists and developers remotely access or interact with Azure cloud instances and services for convenient development.
- mrsdeploy – an R package that provides functions for establishing a remote session in a console application and for publishing and managing a web service backed by the R code block or script you provide.
- R Tools for Visual Studio – IDE with R support.
- RStudio Server – IDE for remote R sessions, accessed through a web browser.
- JupyterHub – Jupyter notebook with multi-user access.
- IRkernel – R kernel for Jupyter notebook.
Scalable and advanced analytics
Scalable analytics and advanced machine learning (including deep learning) models can be built in R on cloud services, accelerated by application-specific hardware such as GPUs. The R packages and tools in this category let you perform large-scale R-based analytics on Azure with modern frameworks such as Spark, Hadoop, Microsoft Cognitive Toolkit, TensorFlow, and Keras. Many of these tools come pre-installed and configured for direct use on the Azure Data Science Virtual Machine.
- dplyrXdf – a dplyr backend for the XDF data format used in Microsoft ML Server.
- sparklyr – R interface for Apache Spark.
- SparkR – an R package that provides a light-weight frontend to use Apache Spark from R.
- CNTK-R – R bindings to the Cognitive Toolkit (CNTK) deep learning library.
- tensorflow – R interface to TensorFlow.
- mxnet – R interface to MXNet, bringing flexible and efficient GPU computing and state-of-the-art deep learning to R.
- keras – R interface to Keras.
- darch – Create deep architectures in R.
- deepnet – implements several deep learning architectures and neural network algorithms, including BP, RBM, DBN, and deep autoencoders.
- gpuR – R interface to use GPUs.
- RevoScaleR – a collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale, included with Microsoft ML Server.
- MicrosoftML – a package that provides state-of-the-art fast, scalable machine learning algorithms and transforms for R.
- h2o – R interface to H2O.
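As one illustration of what these interfaces look like in practice, here is a minimal sketch using sparklyr with dplyr verbs. It assumes a local Spark installation is available (for example, one set up with sparklyr::spark_install()) and uses the built-in mtcars data set as a placeholder. On a real cluster the master setting and the data source would be different.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; a cluster would use a different master.
sc <- spark_connect(master = "local")

# Copy a small built-in data frame into Spark as a placeholder data set.
cars_tbl <- copy_to(sc, mtcars, name = "mtcars", overwrite = TRUE)

# The dplyr pipeline is translated to Spark SQL and executed in Spark;
# collect() brings the small aggregated result back into R.
avg_mpg <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()

print(avg_mpg)

spark_disconnect(sc)
```

The point of the design is that the heavy computation stays in Spark, and only the summarized result is pulled into the R session's memory.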
Application and service deployment
R-based applications can be easily deployed as services for end-users or developers. The R packages and tools in this category are used to deploy R-based analytics or applications as services or interfaces that end-users and developers can conveniently consume.
- mrsdeploy – an R package included with Microsoft ML Server that provides functions for deploying easily consumable services from within an R session.
- AzureML – an R package for interacting with Azure Machine Learning Studio to publish R functions as API services.
- Azure Container Instances – service to allow running containerized R analytics in Azure.
- Azure Container Service – service that simplifies deployment, management, and operation of orchestrated containers of R analytics in Azure.
- Shiny server – Develop and publish Shiny based web applications online.
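For a sense of what gets deployed in the Shiny case, the following is a minimal sketch of a Shiny application. The histogram, the input name, and the slider range are illustrative assumptions rather than anything taken from the solutions mentioned in this post.

```r
library(shiny)

# User interface: one slider and one plot.
ui <- fluidPage(
  titlePanel("Histogram of simulated data"),
  sliderInput("n", "Number of observations", min = 10, max = 1000, value = 200),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = NULL, xlab = "Value")
  })
}

# Build the app object; nothing is launched until it is run or printed.
app <- shinyApp(ui = ui, server = server)

# To try it locally: runApp(app)
```

For hosting, the same definition would typically be saved as an app.R file in a directory that the Shiny server is configured to serve.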
For more information
Companies around the world are using R to build enterprise-grade applications on Azure. For in-depth examples (with code and architecture), you can also find a selection of R-based solutions for real-world use cases. A more detailed list of packages and tools for deploying R in Azure is provided at the link below, and will be updated as new tools become available.
GitHub (yueguoguo): R in Azure