A scalable data science platform with Microsoft R Server and Spark
If you want to train a statistical model on very large amounts of data, you'll need three things: a storage platform capable of holding all of the training data, a computational platform capable of efficiently performing the heavy-duty mathematical computations required, and a statistical computing language with algorithms that can take advantage of the storage and computation power. Microsoft R Server, running on HDInsight with Apache Spark, provides all three.
As Mario Inchiosa and Roni Burd demonstrate in this recorded webinar, Microsoft R Server can now run within HDInsight Hadoop nodes running on Microsoft Azure. Better yet, the big-data-capable algorithms of ScaleR (pdf) take advantage of the in-memory architecture of Spark, dramatically reducing the time needed to train models on large data. And if your data grows or you just need more power, you can dynamically add nodes to the HDInsight cluster using the Azure portal.
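To give a rough idea of what this looks like from the R side, here is a minimal sketch (not taken from the webinar) of pointing ScaleR at Spark and fitting a model. It assumes you are on the edge node of an HDInsight cluster with Microsoft R Server installed, so that the RevoScaleR package is available. The data path and the column names ArrDelay and CRSDepTime are hypothetical placeholders.

```r
# Sketch only: requires Microsoft R Server (RevoScaleR) on an HDInsight
# cluster edge node. The path and column names below are hypothetical.
library(RevoScaleR)

# Send subsequent ScaleR computations to the cluster's Spark engine
# rather than running them in the local R session.
rxSetComputeContext(RxSpark(consoleOutput = TRUE))

# Describe delimited text data held in the cluster's Hadoop file system.
hdfs <- RxHadoopFileSystem()
flights <- RxTextData(file = "/example/data/flights", fileSystem = hdfs)

# Fit a linear model with a ScaleR algorithm; the work is distributed
# across the Spark executors.
model <- rxLinMod(ArrDelay ~ CRSDepTime, data = flights)
summary(model)
```

The same rxLinMod call runs unchanged in a local compute context, so one way to work is to develop against a sample locally and switch the compute context once you move to the full data.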
Many of the details are in the slides embedded above, but to see a demonstration of Microsoft R Server running on Spark with HDInsight, click on the link below for access to the recorded webinar.
Microsoft Azure On-Demand Webinar: Building A Scalable Data Science Platform with R and Hadoop