R 2.12.2 is available

February 25, 2011

As previously announced, R 2.12.2 is available for download today. Browsing through the various mirrors (using the Download R tool on inside-R.org), it looks like the Windows version is already available on many mirrors; the Mac and Linux versions will follow soon (and of course, sources are available now). The complete list of changes is in the announcement on...

Tutorial on Distributions in R

February 25, 2011

Here's a video tutorial I put together to go over how to generate a random sample from one of the commonly known parametric distributions in R. Along the way, I also discuss how some of the properties of estimators are reflected in the computations I pe...
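As a rough illustration of the idea (not taken from the video), the sketch below draws samples from three common parametric distributions with base R's `rnorm`, `rexp`, and `rbinom`, then compares each sample mean with the known population mean. The parameter values and the sample size are arbitrary assumptions chosen for the example.

```r
# Minimal sketch: random samples from common parametric distributions.
# All parameter values below are illustrative assumptions.
set.seed(42)  # make the draws reproducible
n <- 1000

normal_sample <- rnorm(n, mean = 5, sd = 2)        # Normal(5, 2^2)
exp_sample    <- rexp(n, rate = 0.5)               # Exponential, mean 1/rate
binom_sample  <- rbinom(n, size = 10, prob = 0.3)  # Binomial, mean size*prob

sample_means <- c(normal      = mean(normal_sample),
                  exponential = mean(exp_sample),
                  binomial    = mean(binom_sample))
true_means   <- c(normal = 5, exponential = 1 / 0.5, binomial = 10 * 0.3)

# The sample mean is an unbiased estimator of the population mean,
# so the differences should be small for a sample of this size.
print(round(sample_means - true_means, 3))
```

Rerunning with a larger `n` should shrink the differences, which is one way to see the consistency of the sample mean in practice.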

Mapping the 2011 Chicago Mayoral Democratic Primary

February 25, 2011

Mapping the Chicago Democratic Mayoral 2011 primary with Ruby, R, and ggplot2

Setting up a parallel computing cluster for R with OpenSSH and doSNOW

February 25, 2011

Responding to yesterday's post which included an aside on using parallel processing for by-group computations in R, reader Christian Gunning mused about the possibility of using doSNOW on his network, with OpenSSH to manage the authentication: I sit on a fast campus network and have at least 10 remote cores available that I could farm out for big jobs....

Example 8.27: using regular expressions to read data with variable number of words in a field

February 25, 2011

A more or less anonymous reader commented on our last post, where we were reading data from a file with a varying number of fields. The format of the file was:

1 Las Vegas, NV --- 53.3 --- --- 1
2 Sacramento, CA --- 42.3 --- --- 2

The complication in the...
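One possible way to handle a field with a variable number of words is to anchor the regular expression on the parts of the line that are fixed. The sketch below is an assumption-laden illustration, not the post's own solution: it treats the excerpt's data as two lines, assumes the city name always ends at a comma followed by a two-letter state code, and names the output columns hypothetically.

```r
# Two sample lines, as read from the excerpt (the line split is an assumption).
lines <- c("1 Las Vegas, NV --- 53.3 --- --- 1",
           "2 Sacramento, CA --- 42.3 --- --- 2")

# id, city (any number of words), state, four space-free fields, trailing id.
pattern <- "^([0-9]+) (.+), ([A-Z]{2}) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([0-9]+)$"

# regexec() returns the full match plus one entry per capture group.
matches <- regmatches(lines, regexec(pattern, lines))
fields  <- do.call(rbind, lapply(matches, function(m) m[-1]))  # drop full match

# Column names here are hypothetical labels chosen for the example.
parsed <- data.frame(id    = as.integer(fields[, 1]),
                     city  = fields[, 2],
                     state = fields[, 3],
                     value = as.numeric(fields[, 5]),
                     stringsAsFactors = FALSE)
print(parsed)
```

Because `(.+)` is greedy but must be followed by a comma, a space, and two capital letters, it captures "Las Vegas" and "Sacramento" alike without needing to know the word count in advance.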

snow and ssh — secure inter-machine parallelism with R

February 24, 2011

I just threw a post up on Revolutions, which got a lot longer than I planned. And got me thinking. And reading (see refs in previous post). And trying. Turns out that it was way easier than I thought! The problem: From the blog post: "OpenSSH is now available on all platforms. A sensible solution...

MT4 -> Multi-R sessions for tick-analysis

February 24, 2011

The shared memory between multiple R sessions mentioned in my previous post got me thinking … quite some potential indeed. As a result, I investigated further using (calling) multiple R sessions from the same MT4 script. Specifically, I wanted to have a clearer understanding of the time required to perform lightning-fast and dead-slow processing,

How to read and write Stata data (.dta) files into R

February 24, 2011

Here's an R tutorial where I explain how to read Stata data files into R (even if you don't own the program Stata). I also offer some other basic tips. Of note, you can also write Stata .dta files from R (if your coauthors or journals insist on having ...
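A minimal round-trip sketch, assuming the `foreign` package (one of R's recommended packages) is installed; whether the tutorial itself uses `foreign` is not stated in the excerpt. The data frame and the temporary file path are made up for the example.

```r
# Sketch: write a data frame to a Stata .dta file and read it back.
# Assumes the 'foreign' package is available; Stata itself is not needed.
library(foreign)

df <- data.frame(id = 1:3, score = c(2.5, 3.7, 4.1))  # illustrative data

dta_file <- tempfile(fileext = ".dta")  # hypothetical output location
write.dta(df, dta_file)                 # write in Stata format

df_back <- read.dta(dta_file)           # read the file back into R
print(df_back)
```

Note that `read.dta` only supports older Stata file versions, so files saved by recent Stata releases may need a different reader.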

when Nuns or Hells Angels get in a plane

February 24, 2011

Today, at lunch, Matthieu told us a nice story (or call it a paradox if you like) about the probability of finding your seat empty when you get in a plane. A plane full of nuns: assume that you are in the line to get in the airplane, you are the ...

Packages for By-Group Processing in R

February 24, 2011

Analyst and BI expert Steve Miller takes a look at the facilities in R for doing "by-group" processing of data. The task consisted of: ... read several text files, merge the results, reshape the intermediate data, calculate some new variables, take care of missing values, attend to meta data, execute a few predictive models and graph the results. Then...
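For readers unfamiliar with the term, a small base-R sketch of by-group processing follows. It uses the built-in `mtcars` data set and the base functions `aggregate` and `tapply`; these are stand-ins chosen for illustration and are not necessarily the packages the post goes on to compare.

```r
# Sketch: by-group summaries in base R using the built-in mtcars data.

# Mean mpg and hp for each number of cylinders, returned as a data frame.
by_cyl <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
print(by_cyl)

# The same mean mpg per group, returned as a named vector.
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mpg_by_cyl)
```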

Split a Data Frame into Testing and Training Sets in R

February 24, 2011

I recently analyzed some data trying to find a model that would explain body fat distribution as predicted by several blood biomarkers. I had more predictors than samples (p>n), and I didn't have a clue which variables, interactions, or quadratic terms made biological sense to put into a model. I then turned to a few data mining procedures that I...
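One common way to make such a split, shown here as a sketch rather than the post's own function, is to sample row indices at random. The simulated data frame and the 70/30 proportion are assumptions made for the example.

```r
# Sketch: random split of a data frame into training and testing sets.
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100))  # illustrative data

train_fraction <- 0.7  # assumed proportion; adjust as needed
train_idx <- sample(seq_len(nrow(df)), size = round(train_fraction * nrow(df)))

train <- df[train_idx, ]   # rows selected for model fitting
test  <- df[-train_idx, ]  # all remaining rows, held out for evaluation

cat("training rows:", nrow(train), " testing rows:", nrow(test), "\n")
```

Negative indexing with `-train_idx` guarantees that the two sets are disjoint and together cover every row.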

Phenotypic selection analysis in R

February 24, 2011

Until recently, I have always done my phenotypic selection analyses in SAS. I finally got some code that I think does everything SAS would do. Feedback much appreciated! ######################## Selection analyses ############################# install.pac...

Rcpp 0.9.2

February 24, 2011

The 0.9.2 release of Rcpp is now on CRAN and Debian. This version contains a build fix for the older 10.5.* version of OS X and its g++ 4.2.1 compiler; we now skip one test that upset it. CRAN builds for OS X should resume. We also added simple ...

Book review: 25 Recipes for Getting Started with R

February 24, 2011

Recently I was asked by O’Reilly to review Paul Teetor's new introductory book on R. After giving the book some attention and appreciating its delivery of the material, I was happy to write and post this review. Also, I’m very happy to see how a major publishing house like O’Reilly is producing more and

type="n" graphs in R

February 24, 2011

One of the most useful graphs you can produce in R using the plot(...) function is one with nothing in it. Using the type="n" option, you get a blank canvas to which you can add points, lines, text, sh...
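A short sketch of the blank-canvas approach, using only base graphics. The data, labels, and the temporary PDF output file are assumptions for the example; in an interactive session the `pdf()` and `dev.off()` calls can be dropped so the plot appears on screen.

```r
# Sketch: build a plot layer by layer on an empty canvas.
out_file <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out_file)                           # open a PDF graphics device

x <- 1:10
y <- x^2

# type = "n" sets up axes, labels and limits but draws no data.
plot(x, y, type = "n", xlab = "x", ylab = "y",
     main = "Built up from a blank canvas")

points(x, y, pch = 19)                              # add filled points
lines(x, y, lty = 2)                                # add a dashed line
text(x[10], y[10], labels = "last point", pos = 2)  # label to the left

invisible(dev.off())                    # close the device and flush the file
```

Passing the data to `plot()` even with `type = "n"` is useful because it fixes the axis ranges before anything is drawn.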

Machine Learning Ex2 – linear regression

February 24, 2011

Andrew Ng has posted introductory machine learning lessons on the OpenClassRoom site. I've watched the first set and will solve Exercise 2 here. The exercise is to build a linear regression implementation; I'll use R. The point of linear regression is to come up with a mathematical function (model) that represents the data as well as possible; that is done...
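To make the idea concrete, here is a minimal sketch that fits a straight line by least squares using the closed-form normal equations and checks the result against R's built-in `lm`. The simulated data are an assumption, and the closed-form solution is a stand-in: the excerpt does not say which fitting method the post itself implements.

```r
# Sketch: simple linear regression via the normal equations, checked with lm().
set.seed(7)
x <- runif(50, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(50)  # simulated data: intercept 3, slope 2, plus noise

X <- cbind(1, x)  # design matrix with a column of ones for the intercept

# Solve (X'X) theta = X'y for the least-squares coefficients.
theta <- solve(t(X) %*% X, t(X) %*% y)

fit <- lm(y ~ x)  # reference fit from base R

print(cbind(normal_equations = as.vector(theta),
            lm               = unname(coef(fit))))
```

Both columns should agree to many decimal places, since `lm` minimizes the same sum of squared residuals.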

What’s the best platform for a high score on Canabalt?

February 23, 2011

The Web-based Flash game Canabalt, whose scores have been analyzed by R before, is now available as an iOS App. Because the app is configured to work on three different platforms: the iPad, iPhone and iPod Touch; and because players are invited to tweet their best scores at the end of the game, like this: the Twitter stream again...

Discussions on the future of R

February 23, 2011

Inspired by the discussions on the same topic, Avram Aelony presented an overview of the issues and the Los Angeles R users group proceeded with further discussions.

HRSA Area Resource File Format 2009

February 23, 2011

From the HRSA website: [The Area Resource File (ARF)] is a database containing more than 6,000 variables for each of the nation’s counties. ARF contains information on health facilities, health professions, measures of resource scarcity, health status, economic activity, health training programs, and socioeconomic and environmental characteristics. The data file itself is formatted accordingly (from the ARF

RQuantLib 0.3.6

February 23, 2011

A bug-fix release, RQuantLib 0.3.6, is now on CRAN and in Debian. RQuantLib combines (some of) the quantitative analytics of QuantLib with the R statistical computing environment and language. There are only two changes to two files where an explic...

Experimental S4 Classes and Methods added to AQP (Algorithms for Quantitative Pedology) Package

February 23, 2011

Thanks to some help from Pierre Roudier, the aqp package now has some new S4-style classes and methods, custom-tailored to the complexities of soil profile data. These new classes/methods are only available in the aqp development branch, found on our ...

R/Finance 2011

February 23, 2011

I will be speaking at R/Finance 2011 in Chicago at the end of April regarding the futile.paradigm, my R language …

Statistics and Computing and ABC

February 23, 2011

Statistics and Computing has received several papers on ABC and plans to make a special ABC issue out of these. All submissions received before June 2011 and accepted will be published in this special issue. The special issue is identified as an article type on the on-line page. In case of questions or

sab-R-metrics: Basic Applied Regression (OLS)

February 23, 2011

Today, I'll again be using a new data set that can be found here at my website (called 'leagueoutcomes.csv'). The data set includes the standings results of the 2009 season for MLB along with average game attendance by team. I'll use this to go over some basic regression techniques and tools in R. Hopefully this...

Course: Machine Learning with R

February 22, 2011

Starting on March 5 at the Hacker Dojo in Mountain View (CA), Mike Bowles and Patricia Hoffmann will present a course on Machine Learning where R will be the "lingua franca" for looking at homework problems, discussing them and comparing different solution approaches. The class will begin at the level of elementary probability and statistics and from that background...

My R setup with Mac OS X

February 22, 2011

The eco-system of R is largely Ubuntu and SVN, so Mac users sometimes find themselves a bit out of place, shall we say. But let's not let bad high-school memories about not being in the in-crowd keep us from participating in the R world. With just a little...

Stochastic approximation in mixtures

February 22, 2011

On Friday, a 2008 paper on Stochastic Approximation and Newton’s Estimate of a Mixing Distribution by Ryan Martin and J.K. Ghosh was posted on arXiv. (I do not really see why it took so long to post on arXiv a 2008 Statistical Science paper but given that it is not available on project Euclid, it
