Blog Archives

Tutorial on High-Performance Computing in R

February 3, 2015

I wanted to call your attention to what promises to be an outstanding tutorial on High-Performance Computing (HPC) in R, presented in Web streaming format. My Rth package coauthor Drew Schmidt, who is also one of the authors of the pbdR package, will be one of the presenters.  It should be very interesting and useful.

Read more »

GPU Tutorial, with R Interfacing

January 24, 2015

You’ve heard that graphics processing units — GPUs — can bring big increases in computational speed.  While GPUs cannot speed up work in every application, the fact is that in many cases they can indeed provide very rapid computation.  In this tutorial, we’ll see how this is done, both in passive ways (you write only … Continue reading...

Read more »

OpenMP Tutorial, with R Interface

January 17, 2015

Almost any PC today is multicore.  Dual-core is standard, quad-core is easily attainable for the home, and larger systems, say 16-core, are easily within reach of even smaller research projects. In addition, large multicore systems can be “rented” on Amazon EC2 and so on. The most popular way to program on multicore machines is to … Continue reading...
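At the R level, the easiest way to see this kind of multicore speedup is via the parallel package that ships with R. A minimal sketch (the toy task, block sizes, and mc.cores = 2 are illustrative assumptions; OpenMP itself operates at the C/C++ level, which is what the tutorial covers):

```r
library(parallel)

# toy task: sum of squares of a large vector, split into blocks
x <- 1:1e6
blocks <- split(x, rep(1:4, length.out = length(x)))

# one forked worker per block, two at a time; mclapply() stands in here
# for OpenMP-style threading (note: forking is unavailable on Windows)
partials <- mclapply(blocks, function(b) sum(as.numeric(b)^2), mc.cores = 2)

# combine the per-block partial sums
total <- Reduce(`+`, partials)
```

The overall pattern — split the work, compute partial results in parallel, combine — is the same one an OpenMP `parallel for` loop with a reduction clause expresses at the C level.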

Read more »

Debugging Parallel Code with dbs()

January 4, 2015

I mentioned yesterday that my partools package is now on CRAN.  A number of people have expressed interest in the Snowdoop section, but in this post I want to call attention to the dbs() debugging tool in the package, useful for debugging code written for the portion of R’s parallel library that came from the … Continue reading...

Read more »

Snowdoop/partools Package Now on CRAN

January 3, 2015

I’ve now placed the partools package, including Snowdoop, on CRAN.  No major new functions since my last posting, but the existing functions have been made more versatile and convenient, and the documentation is now more detailed, with more examples and so on.  I do have more functions planned. It is all platform independent, except for … Continue reading...

Read more »

Snowdoop/partools Update

December 27, 2014

I’ve put together an updated version of my partools package, including Snowdoop, an alternative to MapReduce algorithms.  You can download it here, version 1.0.1. To review:  The idea of Snowdoop is to create your own file chunking, rather than having something like Hadoop do it for you, and then using ordinary R coding to perform … Continue reading...
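The Snowdoop idea described above — do your own file chunking, then process the chunks with ordinary R code — can be sketched in a few lines. The file names and the two-chunk setup here are illustrative assumptions, not taken from the package:

```r
# toy data set, written out as one full file
df <- data.frame(id = 1:10, val = (1:10) * 1.5)
write.csv(df, "full.csv", row.names = FALSE)

# do our own chunking: one file per (eventual) worker,
# rather than having something like Hadoop chunk it for us
write.csv(df[1:5, ],  "chunk1.csv", row.names = FALSE)
write.csv(df[6:10, ], "chunk2.csv", row.names = FALSE)

# each worker reads only its own chunk and computes a partial result;
# lapply() stands in here for parallel::parLapply() over a real cluster
partials <- lapply(c("chunk1.csv", "chunk2.csv"),
                   function(f) sum(read.csv(f)$val))

# combine the partial results, MapReduce-style, in plain R
total <- Reduce(`+`, partials)
```

On an actual cluster each chunk would live on (or be read by) a different worker node, but the R code each worker runs is just ordinary read.csv and vector arithmetic — no new computational paradigm required.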

Read more »

More Snowdoop Coming

December 16, 2014

In spite of the banter between Yihui and me, I’m glad to hear that he may be interested in Snowdoop, as are some others.  I’m quite busy this week (finishing writing my Parallel Computation for Data Science book, and still have a lot of Fall Quarter grading to do :-) ), but you’ll definitely be … Continue reading...

Read more »

New Package: partools

December 15, 2014

I mentioned last week that I would be putting together a package, based in part on my posts on Snowdoop.  I’ve now done so, in a package named partools, the name alluding to the fact that its tools are intended for use with the cluster-based part of R’s parallel package.  The main ingredients are: Various code … Continue reading...

Read more »

Snowdoop, Part II

December 7, 2014

In my last post, I questioned whether the fancy Big Data processing tools such as Hadoop and Spark are really necessary for us R users.  My argument was that (a) these tools tend to be difficult to install and configure, especially for non-geeks; (b) the tools require learning new computation paradigms and function calls; and … Continue reading...

Read more »

How About a “Snowdoop” Package?

November 26, 2014

Along with all the hoopla on Big Data in recent years came a lot of hype on Hadoop.  This eventually spread to the R world, with sophisticated packages being developed such as rmr to run on top of Hadoop. Hadoop made it convenient to process data in very large distributed databases, and also convenient to create … Continue reading...

Read more »