(This article was first published on **Revolutions**, and kindly contributed to R-bloggers)

Some things are easy to convert from a long-running sequential process to a system where each part runs at the same time, thus reducing the required time overall. We often call these "embarrassingly parallel" problems, but given how easy it is to reduce the time it takes to execute them by converting them into a parallel process, "pleasingly parallel" may well be a more appropriate name.

Using the foreach package (available on CRAN) is one simple way of speeding up pleasingly parallel problems using R. A `foreach` loop is much like a regular `for` loop in R, and by default will run each iteration in sequence (again, just like a `for` loop). But by registering a parallel "backend" for foreach, you can run many (or maybe even all) iterations at the same time, using multiple processors on the same machine, or even multiple machines in the cloud.
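As a rough illustration of registering a backend, here is a minimal sketch that assumes the doParallel package (one of several backends on CRAN, and not one named in this post) and a two-worker local cluster chosen arbitrarily:

```r
library(foreach)
library(doParallel)  # assumed backend; other foreach backends work similarly

# Start two local worker processes and register them as the foreach backend.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# %dopar% hands iterations to the registered backend;
# .combine = c collects the per-iteration results into one vector.
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

# Release the worker processes when done.
parallel::stopCluster(cl)

print(squares)
```

Replacing `%dopar%` with `%do%` runs the same loop sequentially, which can be handy for debugging before going parallel.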

For many applications, though, you need to provide a different chunk of data to each iteration to process. (For example, you may need to fit a statistical model within each country — each iteration will then only need the subset for one country.) You could just pass the entire data set into each iteration and subset it there, but that's inefficient and may even be impractical when dealing with very large datasets sitting in a remote repository. A better idea would be to leave the data where it is, and run R *within* the data repository, in parallel.
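To make the "one chunk per iteration" pattern concrete, here is a small sketch using the built-in `iris` data as a stand-in for a per-country data set (species play the role of countries). The doParallel backend and the choice of model are assumptions for illustration only. Note that this version still splits the data in the local R session and ships each chunk to a worker, which is exactly the data movement that becomes costly for large remote data.

```r
library(foreach)
library(doParallel)  # assumed backend, used here only for illustration

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Split the data into one data frame per group (species stand in for countries).
chunks <- split(iris, iris$Species)

# Each iteration receives only its own chunk and fits a model to it.
fits <- foreach(chunk = chunks) %dopar% {
  coef(lm(Sepal.Length ~ Sepal.Width, data = chunk))
}
names(fits) <- names(chunks)

parallel::stopCluster(cl)

str(fits)
```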

Microsoft R 9.1 introduces a new function, `rxExecBy`, for exactly this purpose. When your data is sitting in SQL Server or Spark, you can specify a set of keys to partition the data by, and an R function (any R function, built-in or user-defined) to apply to the partitions. The data doesn't actually move: R runs directly on the data platform. You can also run it on local data in various formats.
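A call on local data might look roughly like the sketch below. It is based on our reading of the RevoScaleR documentation rather than on code from this post: the argument names (`inData`, `keys`, `func`), the `keys`/`data` parameters of the user function, and the use of a plain data frame as input in the local compute context should all be checked against the `rxExecBy` help page. The helper `count_rows` is hypothetical. The code is guarded so it does nothing when RevoScaleR is not installed.

```r
# Sketch only: RevoScaleR ships with Microsoft R Client / R Server, not CRAN.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)

  # Hypothetical user-defined function applied to each partition.
  # Assumption: rxExecBy passes the partition's key values and its data.
  count_rows <- function(keys, data) {
    df <- rxImport(data)  # materialise the partition as a data frame
    nrow(df)
  }

  # Partition the built-in iris data by Species in the default (local)
  # compute context and apply count_rows to each partition.
  results <- rxExecBy(inData = iris, keys = c("Species"), func = count_rows)
  str(results)
}
```

On SQL Server or Spark, the same call would presumably take a data source object for that platform as `inData`, with the matching compute context set beforehand.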

The `rxExecBy` function is included in Microsoft R Client (available free) and Microsoft R Server. For some examples of using `rxExecBy`, take a look at the Microsoft R Blog post linked below.

Microsoft R Blog: Running Pleasingly Parallel workloads using rxExecBy on Spark, SQL, Local and Localpar compute contexts
