In the world of data preparation, a common task is to identify duplicate records in a file or data set. A few years ago, I did most development work in Java, and I shudder to think of the amount of code required to accomplish this sort of task. ...

This video gives me flashbacks to my days as a statistics consultant to a medical school. Some days we just weren't speaking the same language... (With thanks to reader MK for the tip: "It's funny because it is true...".)

From guest blogger Joseph Rickert. Last night I went to hear Ken Krugler of Bixolabs talk about Hadoop at the monthly meeting of the Software Developers Forum. Maybe because Ken is an unusually lucid speaker, or maybe because I just reached some sort of cumulative tipping point through the prep work of all those patient people who have tried...

Today, Neil posted an article titled "Connecting to a MongoDB database from R using Java". In the current post, I'll show how to use the C API for MongoDB to fetch some MongoDB data from R. The code will be somewhat similar to my previous post "A stateful C function for R: parsing Fasta sequences". OK, first, let's...

Brian announced it on r-help and r-sig-finance, and I have since updated the R/Finance website and Call for Papers page. And as David Smith already outblogged me about it, without further ado, our Call for Papers for next spring's R/Finance conference: ...

The aqp package can be downloaded from R-Forge.

Introduction: Because R is, in part, a functional programming language, the ‘base’ package contains several higher-order functions. By higher-order functions, I mean functions that take another function as an argument and then do something with that function. If you want to know more about the usefulness of writing higher-order functions in general,

Over the past few days, I have received several emails asking for clarification of the effective sample size derivation in “Introducing Monte Carlo Methods with R” (Section 4.4, pp. 98-100). Formula (4.3) gives the Monte Carlo estimate of the variance of a self-normalised importance sampling estimator (note the change from the original version in Introducing Monte

Here’s a nice snippet from a 2009 article by Kass that I read yesterday: According to my understanding, laid out above, statistical pragmatism has two main features: it is eclectic and it emphasizes the assumptions that connect statistical models with observed data. The pragmatic view acknowledges that both sides of the frequentist-Bayesian debate made important

