Blog Archives

Partools 1.1.4

August 21, 2015

Partools 1.1.4 is now on GitHub. The main change this time is an enhancement of the debugging facilities (which work not only for partools but also for the cluster-based portion of R’s parallel package in general). As some of you know, I place huge importance on debugging, so much so that I wrote a book on it …

partools: a Sensible R Package for Large Data Sets

August 5, 2015

As I mentioned recently, the new, greatly extended version of my partools package is now on CRAN. (The current version on CRAN is 1.1.3, whereas at the time of my previous announcement it was only 1.1.1. Note that Unix is NOT required.) It is my contention that for most R users who work with large …

CACM Highlights R

July 23, 2015

The Association for Computing Machinery is the main professional organization for computer science, largely for academia but still with a broad membership. ACM publishes a number of journals, most of them for research, but its flagship publication is a magazine, the Communications of the ACM. The current issue of the CACM includes an article, “Bringing …

Heteroscedasticity in Regression — It Matters!

June 7, 2015

R’s main linear and nonlinear regression functions, lm() and nls(), report standard errors for parameter estimates under the assumption of homoscedasticity, a fancy word for a situation that rarely occurs in practice. The assumption is that the (conditional) variance of the response variable is the same at any set of values of the predictor variables. …

Macros in R

June 5, 2015

In programming, sometimes it’s useful to write a macro rather than a function. (Don’t worry if you’ve never heard the term before.) In this post, I’ll give an example of the use of macros in R, using the gtools package on CRAN. I wanted to write some utility code to help me reuse my earlier R commands during …

Discovered Two Great Web Sites Today

June 3, 2015

Today is my lucky day. I learned of two very interesting Web pages, both of them quite informative and the first of them rather provocative (yay!). I have some comments on both, in some cases consisting of mild disagreement, which I may post later, but in any event, I highly recommend both. Here they are: …

Update on Snowdoop, a MapReduce Alternative

May 29, 2015

In blog posts a few months ago, I proposed an alternative to MapReduce systems such as Hadoop, which I called “Snowdoop.” I pointed out that systems like Hadoop and Spark are very difficult to install and configure, are either too primitive (Hadoop) or too abstract (Spark) to program, and above all, are SLOW. Spark is of …

My New Book and Other Matters

May 22, 2015

I haven’t posted for a while, so here are some news items: My new book, Parallel Computation for Data Science, will be out in June or July. I believe it will be useful to anyone doing computationally intensive work. After a few months of being busy with the book and other things, I have returned to …

Tutorial on High-Performance Computing in R

February 3, 2015

I wanted to call your attention to what promises to be an outstanding tutorial on High-Performance Computing (HPC) in R, presented in Web streaming format. My Rth package coauthor Drew Schmidt, who is also one of the authors of the pbdR package, will be one of the presenters. It should be very interesting and useful.

GPU Tutorial, with R Interfacing

January 24, 2015

You’ve heard that graphics processing units (GPUs) can bring big increases in computational speed. While GPUs cannot speed up work in every application, the fact is that in many cases they can indeed provide very rapid computation. In this tutorial, we’ll see how this is done, both in passive ways (you write only …