Both lattice and ggplot2 seem really interesting and worthy of learning. But I only have time to learn one of them, and the choice is not an easy one. Here is an awesome reference; this blog is generally very interesting; and here is something...

Given the growing interest in parallel processing through GPUs or multiple processors, there is a clear need for a proper use of (uniform) random number generators in this environment. We were discussing the issue yesterday with Jean-Michel Marin and briefly looked at a few solutions: given p parallel streams/threads/processors, starting each generator with a random

Last Friday, Romain and I were guests of the R intergrouplet (what an adorable name!) at Google's headquarters in Mountain View. This arose out of discussions following useR! 2010 where we met Google's Murray Stokely. There appears to be ever increasi...

Following up on the successful "R in a Nutshell", O'Reilly has just published a new book on R, The R Cookbook, by Paul Teetor. Here's the description: Perform data analysis with R quickly and efficiently using the task-oriented recipes in this cookbook. The R language and environment include everything necessary to perform statistical work right out of the box,...

In an article looking at once-niche programming languages that are now being deployed in businesses, R is named as one of 7 programming languages on the rise: R is another Swiss Army Knife of numerical and statistical routines for hacking through the big data sets -- collections big enough that it might be better called a Swiss Army Machete....

The Dallas R User Group had a meeting over the weekend. One of the discussions was about the memory limitations of R. This is a common subject among the R community and R User Groups. There have been a lot of strides recently in allowing R ...

Last week I participated in bit.ly’s fourth hackabit hack-a-thon, which is a wonderful opportunity for NYC area hackers to get together, eat pizza, drink energy drinks, and stay up late hacking with some of the best data geeks around. I was lucky enough to saddle up next to Hilary Mason, bit.ly’s lead scientist, recently named

After our recent discussion of semigraphic displays, Jay Ulfelder sent along a semigraphic table from his recent book. He notes, "When countries are the units of analysis, it's nice that you can use three-letter codes, so all the proper names....

Handling large datasets in R, especially CSV data, was briefly discussed before at Excellent free CSV splitter and Handling Large CSV Files in R. My file at that time was around 2GB with 30 million rows and 8 columns. Recently I started to collect and analyze US corporate bonds tick data from year...

The R project, born in New Zealand in 1993, has been nominated as the best open-source project in the New Zealand Open-Source Awards 2010. R's co-creator Ross Ihaka talks about the project in this article by the New Zealand Herald: Ross Ihaka from the University of Auckland started developing R 20 years ago, but it took off about a...

The online training provider Statistics.com has three great courses based on R coming up in the next few months: Nov. 5 - Dec. 3: "Graphics in R," with Paul Murrell; Nov. 20 - Dec. 18: "Support Vector Machines in R," with Dr. Lutz Hamel; Dec. 17 - Jan. 22: "Geostatistics in R," with Prof. David Unwin. The courses take...

I’ve had a few questions on this topic lately. Here is an email received today: I use Eviews to estimate time series, but I have been checking out R recently, and your Forecast package. I cannot understand why 2 similar equations in Eviews and R are giving different estimated output. Your insights will be invaluable

R is the lingua franca of Statistics: R code and R packages are the means by which statisticians communicate ideas and methods for statistical analysis. The reasons why are discussed in this article, but it also raises the question: what's wrong with the spoken or written word? How Statistics and Probability relate to the English language is the subject...

Interactive Brokers via Matlab was mentioned in the old post Matlab trading code. IBrokers: R API to Interactive Brokers Trader Workstation is the R package I am aware of for an algo trading API. Should you also be interested, you can watch the following sh...

Previously, I calculated a bunch of ad-hoc power curves from GISTEMP data. Power is essentially a reframing of the p-value, to see the significance of the trend lines in the global temps. However, power calculations are inherently very noisy, hence my ad-hoc way of aggregating the data. Another method is to bootstrap through the responses

This past Tuesday I had the opportunity to present a short talk (a bit long) related to text mining at the Los Angeles R Users’ Group. Since I do most of my text mining in Python, I took this opportunity to discuss RPy2, an interface to R from Python. My slides are below: Accessing R from Python...

In a previous post we considered writing a simple function to calculate the volume of a cylinder by specifying the height and radius of the cylinder. That function did not check the validity of its arguments, which we will consider in this post. R has various functions that we can use to
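
As a rough illustration of the idea in the excerpt above, here is a minimal sketch of a cylinder-volume function that validates its arguments before computing anything. The function name `cylinder_volume`, the argument names, and the choice of `stopifnot()` are assumptions made for this example; the original post's code is not shown here and may well use different checks.

```r
# Hypothetical sketch: names and checks are illustrative assumptions,
# not the original post's code.
cylinder_volume <- function(height, radius) {
  # stopifnot() raises an error at the first condition that is not TRUE
  stopifnot(
    is.numeric(height), length(height) == 1, is.finite(height), height >= 0,
    is.numeric(radius), length(radius) == 1, is.finite(radius), radius >= 0
  )
  # Volume of a cylinder: pi * r^2 * h
  pi * radius^2 * height
}

# Valid input returns the volume (2 * pi for height 2, radius 1)
cylinder_volume(height = 2, radius = 1)

# Invalid input raises an error; try() keeps the script running
result <- try(cylinder_volume(height = 2, radius = -1), silent = TRUE)
inherits(result, "try-error")
```

Checking type, length, and range up front means a bad call fails immediately with an error rather than silently returning `NA` or a negative volume.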

In the recent posts Visualizing Smoking Risk and Shades of grey I wrote about the use of “Risk Characterization Theatres” (RCTs) to communicate probabilities. I found the idea in the book The Illusion of Certainty, by Eric Rifkin and Edward Bouwer. Here is how they explain the RCTs: Most of us are familiar with the crowd in a