Using miniCRAN in Azure ML

October 13, 2015

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Michele Usuelli
Microsoft Data Scientist

Azure Machine Learning Studio is a drag-and-drop tool for deploying data-driven solutions. It contains pre-built items including data preparation tools and Machine Learning algorithms. In addition, it lets you include custom R and Python scripts.

To build powerful R tools, you might want to use packages from the CRAN repository. Azure ML comes with only a limited set of packages, so you might need to add others. CRAN hosts 7000+ packages, of which you will need just a few. For this purpose, you can use the miniCRAN package, which creates a local repository containing a selection of packages and their dependencies.

You can get a free Azure ML subscription by following this link:

https://azure.microsoft.com/en-us/trial/get-started-machine-learning

After subscribing to Azure ML, the first step is to create a local miniCRAN repository. You can find instructions at this link:

http://blog.revolutionanalytics.com/2014/10/introducing-minicran.html

Azure ML is based on Windows, so in the makeRepo function you need to include the argument type = "win.binary". This demo uses the ggplot2 package, so it should be in the list of packages.
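A minimal sketch of this step, assuming the miniCRAN package is installed and a CRAN mirror is reachable. The helper name build_win_repo, the mirror URL, and the default R version "3.1" are illustrative assumptions rather than part of the original instructions; the R version should match the one that Azure ML runs.

```r
# Sketch: build a Windows-binary miniCRAN repository.
# build_win_repo is a hypothetical helper; the mirror URL and the
# R version default are assumptions to adapt to your setup.
build_win_repo <- function(pkgs,
                           path = "repoCRANwin",
                           cran = c(CRAN = "https://cloud.r-project.org"),
                           r_version = "3.1") {
  if (!requireNamespace("miniCRAN", quietly = TRUE)) {
    stop("The miniCRAN package is required: install.packages('miniCRAN')")
  }
  # Resolve the requested packages plus their dependencies
  pkg_list <- miniCRAN::pkgDep(pkgs, repos = cran, type = "win.binary",
                               suggests = FALSE, Rversion = r_version)
  # Make sure the target folder exists before downloading into it
  dir.create(path, showWarnings = FALSE, recursive = TRUE)
  # Download the Windows binaries and write the PACKAGES index
  miniCRAN::makeRepo(pkg_list, path = path, repos = cran,
                     type = "win.binary", Rversion = r_version)
  invisible(pkg_list)
}

# Example call (downloads from the network, so it is left commented out):
# build_win_repo("ggplot2")
```

Setting suggests = FALSE keeps the repository small by skipping suggested packages; whether that is appropriate depends on which optional features you need.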

After you create your own repository (called repoCRANwin, for instance), the package binary files are stored in the folder repoCRANwin\bin\windows\contrib\3.1.
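To confirm that the binaries landed where expected, the folder can be inspected from R. This is a small sketch; list_repo_binaries is a hypothetical helper name, and the defaults simply mirror the repository name and R version used in this article.

```r
# Sketch: list the package binaries stored in a miniCRAN Windows repository.
# list_repo_binaries is a hypothetical helper name.
list_repo_binaries <- function(repo_root = "repoCRANwin", r_version = "3.1") {
  contrib_dir <- file.path(repo_root, "bin", "windows", "contrib", r_version)
  if (!dir.exists(contrib_dir)) {
    stop("Folder not found: ", contrib_dir)
  }
  # Windows binary packages are distributed as .zip files
  list.files(contrib_dir, pattern = "\\.zip$")
}
```

Calling list_repo_binaries() from the folder that contains repoCRANwin should return one .zip file name per downloaded package.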

Now zip the main folder repoCRANwin and upload it to Azure ML. To do so, select the following from the Azure ML menu:

New -> Dataset -> From local file

After clicking New (at the bottom left), you should see this:

[Image: Clip_image001]

Now you need to create a new Azure ML experiment, open the Saved Datasets -> My Dataset tab, and drag and drop repoCRANwin.zip into the experiment.

[Image: Clip_image002]

Then, add a custom R script from R Language Modules -> Execute R script.

To connect repoCRANwin.zip to the R script, drag its output to the right-hand input of Execute R script.

[Image: Clip_image004]

By opening Execute R script, you can edit its R code. Your goals are:

  • Setting up the miniCRAN repository
  • Extracting the list of available packages
  • Testing a package, e.g. ggplot2

This is the R script to include:

# setting up the repository
uri_repo <- "file:///C:/src/repoCRANwin/"
options(repos = uri_repo)
# extracting the list of available packages
table_packages <- data.frame(package = rownames(available.packages()))
# installing the ggplot2 package
install.packages("ggplot2")
library("ggplot2")
# building a sample ggplot2 chart
p <- qplot(iris$Species)
print(p)
# outputting the list of packages
maml.mapOutputPort("table_packages")

Execute R script has two outputs:

  • the list of packages (on the bottom left-hand side)
  • a sample ggplot2 chart (on the bottom right-hand side)

If you click on the left-hand side output and select "Visualize", you'll see this:

  [Image: Clip_image005]

The "package" column contains the packages that can be installed and loaded.
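The same listing can be used to check, before calling install.packages, that every package a script needs is actually in the uploaded repository. The sketch below assumes a data frame with a "package" column like table_packages in the script above; missing_from_repo is a hypothetical helper, and the example table holds made-up contents purely for demonstration.

```r
# Sketch: report which required packages are absent from the repository listing.
# missing_from_repo is a hypothetical helper; table_packages is expected to
# have a "package" column, as in the script above.
missing_from_repo <- function(required, table_packages) {
  setdiff(required, as.character(table_packages$package))
}

# Illustrative listing (made-up contents, for demonstration only)
example_table <- data.frame(package = c("ggplot2", "plyr", "scales"))
missing_from_repo(c("ggplot2", "reshape2"), example_table)
```

An empty result means all required packages are available; any names returned would need to be added to the miniCRAN repository and re-uploaded.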

If you click on the right-hand side output, you'll see a sample ggplot2 chart. If this works, ggplot2 has been installed and loaded properly, so most of the other packages in the repository should work as well.

Loading miniCRAN into an Azure ML R script gives you access to any package that you included. If you know which packages you will use, you can create a local miniCRAN archive and upload it. Then you only need to connect the archive to the relevant R scripts and add a few lines of R code to each script to configure it. A next step could be to define a miniCRAN repository for each topic: for instance, one for data preparation, one for Machine Learning, and another for data visualization.
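The few configuration lines could be wrapped in small helpers so that each script only names the repository it needs. This is a sketch under assumptions: use_local_repo and install_if_missing are hypothetical names, and the default base folder "C:/src" is taken from the script shown earlier in this article.

```r
# Sketch: point R at an uploaded miniCRAN folder and install only what is missing.
# use_local_repo and install_if_missing are hypothetical helpers;
# "C:/src" is the folder used in the script shown earlier.
use_local_repo <- function(repo_name, base_dir = "C:/src") {
  # Build a file:// URI such as file:///C:/src/repoCRANwin/
  uri_repo <- paste0("file:///", file.path(base_dir, repo_name), "/")
  options(repos = uri_repo)
  invisible(uri_repo)
}

install_if_missing <- function(pkgs) {
  # Skip packages that are already installed in the current library
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Example use inside an Execute R script module (left commented out):
# use_local_repo("repoCRANwin")
# install_if_missing("ggplot2")
```

With one repository per topic, only the repo_name argument would change from script to script.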
