How to run R in the cloud (for teaching)
Last week, we launched the early-stage beta version of our interactive online learning platform for R: DataMind.org. Developing this educational platform required a new IT infrastructure able to run R in the cloud. In this post, we share our approach and insights on the design of such an application, and we hope it can inspire the development of other web-based interfaces to R. For the record, the architecture we developed for DataMind.org is intended for relatively complex operations. If you want to run R in the cloud for simpler use cases, we recommend having a look at the powerful shiny package or the OpenCPU project (or, for integration in business processes, the R Service Bus). These approaches are most useful for those interested in building web apps on top of R for specific use cases. If you want to do data analysis yourself while running R in the cloud with, for example, RStudio Server, have a look at this great post from Tal Galili.
The DataMind platform consists of two parts: a front-end application, which runs in the browser on our users' PC, laptop or tablet; and a collection of back-end applications that handle all the interaction between users and the platform (see Figure below).
The front-end uses Google's AngularJS JavaScript framework, which helps us emulate the look and feel of a native application in the browser. This way the end user always works in a familiar and smooth interface, which is a great didactic asset for an education platform. The front-end is delivered through Amazon CloudFront, so it loads quickly wherever the user is.
The back-end consists of two parts that are both hosted in the cloud: a cluster of R servers and the DataMind web application. The former is hosted on Amazon EC2, a very popular pay-as-you-go virtualization platform that allows easy and cheap scaling of the R computation capacity. The DataMind web application itself is written in Ruby on Rails, an agile web framework built on the Ruby programming language, and is hosted on Heroku, a flexible and well-known cloud application provider. The Ruby on Rails application handles user accounts, the management of exercise data, the creation of new exercises, and so on. In our experience, Ruby on Rails is one of the most productive ways to build web applications. To manage the connection between the DataMind web application and the R servers, we use the Rserve package on the R servers. This package makes an R installation remotely accessible. Finally, we ensure security with the help of the RAppArmor package by Jeroen Ooms.
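As a rough illustration of how a web application might hand R code to a cluster of Rserve-backed servers, here is a minimal Python sketch. The post does not describe how DataMind distributes requests, so the round-robin policy, the `RServer` and `RServerPool` names, the host names and the `fake_evaluate` stand-in are all assumptions made for this example. A real deployment would replace `fake_evaluate` with a call through an actual Rserve client.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Iterable, Tuple

# Hypothetical signature: send R code to (host, port) and return the output as text.
# It is injected so that the sketch runs without any R server being available.
Evaluator = Callable[[str, int, str], str]


@dataclass(frozen=True)
class RServer:
    """Address of one R server in the cluster (hypothetical type)."""
    host: str
    port: int = 6311  # Rserve's default TCP port


class RServerPool:
    """Hands each request to the next server in turn (assumed round-robin policy)."""

    def __init__(self, servers: Iterable[RServer], evaluate: Evaluator) -> None:
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one R server is required")
        self._next_server = cycle(self._servers)
        self._evaluate = evaluate

    def run(self, r_code: str) -> Tuple[RServer, str]:
        """Pick the next server and evaluate the given R code on it."""
        server = next(self._next_server)
        return server, self._evaluate(server.host, server.port, r_code)


def fake_evaluate(host: str, port: int, r_code: str) -> str:
    """Stand-in for a real Rserve client call; it only describes the request."""
    return f"[{host}:{port}] would evaluate: {r_code}"


if __name__ == "__main__":
    # The host names below are placeholders, not real DataMind servers.
    pool = RServerPool(
        [RServer("r-1.example.internal"), RServer("r-2.example.internal")],
        fake_evaluate,
    )
    for snippet in ("mean(c(1, 2, 3))", "summary(cars)", "1 + 1"):
        _, output = pool.run(snippet)
        print(output)
```

Keeping the evaluator as a plain callable means the dispatch logic can be exercised without a running R installation, and the list of servers can grow or shrink as EC2 capacity is scaled.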
As mentioned at the beginning, the platform is still in heavy development and we are experimenting with new and different options on a daily basis. If you have any feedback or suggestions on how we can improve this educational platform, please do not hesitate to share them with us: [email protected].