correlograms are correlicious

April 6, 2010

In the last year or so, I’ve been experimenting with different ways of displaying correlation matrices, and have grown very fond of color-coded correlograms. Here’s one from a paper I wrote investigating the relationship between personality and word use among bloggers. The rows reflect language categories from Jamie Pennebaker’s Linguistic Inquiry and

New version of R package futile released

April 6, 2010

The latest version of futile was released to CRAN yesterday. This release broke out the various functions into self-contained sub-packages …

Cherry Picking to Generalize ~ NASA Global Temperature Trends

April 6, 2010

The relatively (to this decade) cool 2008 global temperatures spurred talk of a warming pause, or even global cooling. The claim usually comes from people who cherry-picked either data sets and(!)/or start and end points of the global temperature trends to back up their allegation. The blogosphere already has a lot on this: Skeptical

Le Monde rank test (corr’d)

April 6, 2010

Since my first representation of the rank statistic as paired was incorrect, here is the histogram produced by the simulation perm=sample(1:20) saple=sum(abs(sort(perm)-sort(perm))) when . It is obviously much closer to zero than previously. An interesting change is that the regression of the log-mean on produces > lm(log(memean)~log(enn)) Call: lm(formula = log(memean) ~ log(enn)) Coefficients: (Intercept)    

R package Blotter

April 6, 2010

How many times have you been disappointed by a nice trading system because neither trading costs, slippage, nor bid/ask spread were included in the back-test results? Did you find it difficult to back-test a portfolio in R, or many portfolios with different stocks? The Blotter package is supposed to solve these problems. In reality, it is complicated. I

ProbABEL – R package for GWAS data imputation

April 6, 2010

I've been using GenABEL for some time now for GWAS analysis using related individuals. It has an excellent set of functions for estimating a kinship matrix from a dense marker panel and then using this in a linear mixed effects model to allow for relat...

New R User Group in Chicago

April 6, 2010

While there's been an informal coterie of R users in the Chicago area for some time (notably the fine folks behind the successful R/Finance conferences) there hasn't been a formal R User Group. Until now, that is. JD Long has taken the plunge and announced the new Chicago R User Group on meetup.com. If you're in the Chicagoland area,...

Rules of Thumb to Meet R Gurus in the Help List

April 5, 2010

Here is my personal list of rules of thumb for people who want to meet some R gurus (quickly) on the R help mailing list: If you want to meet Dr Bill Venables, just say something about Type III Sum of Squares (better if you also mention the “unbeatable” SAS); if you want to

R Tools for Dynamical Systems ~ R pplane to draw phase planes

April 5, 2010

MATLAB has a nice program called pplane that draws phase planes of differential equation models. pplane in MATLAB is an elaborate program with an interactive GUI where you can just type in the model to draw the phase planes. The rest you do by clicking (to pick the initial conditions), and it draws the dynamics automatically.

R on the iPhone

April 5, 2010

R Twitterer ech0chrome reports that (s)he has successfully installed R on an iPhone. I haven't tried it myself, since it requires jailbreaking the iPhone, but full instructions have been posted to the R Wiki. It seems that once you have Debian installed on the iPhone, it's simply a matter of installing the required Debian packages. Seems like a neat...

Example 7.31: Contour plot of BMI by weight and height

April 5, 2010

A contour plot is a simple way to plot a surface in two dimensions. Lines with a constant Z value are plotted on the X-Y plane. Typical uses include weather maps displaying "isobars" (lines of constant pressure), and maps displaying lines of constant e...
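
As a rough illustration of the idea in the title, a BMI contour plot might be sketched in base R as follows. The metric units, the grid ranges, and the variable names are assumptions made for this sketch; the original post may well have used different units and code. The contour levels are the usual WHO cutoffs (18.5, 25, 30) plus 35.

```r
# Grid of weights (kg) and heights (m); the ranges are illustrative assumptions.
weight_kg <- seq(40, 120, length.out = 81)
height_m  <- seq(1.4, 2.0, length.out = 61)

# BMI = weight / height^2, evaluated on every (weight, height) pair.
bmi <- outer(weight_kg, height_m, function(w, h) w / h^2)

# Draw lines of constant BMI on the weight-height plane.
contour(weight_kg, height_m, bmi,
        levels = c(18.5, 25, 30, 35),
        xlab = "Weight (kg)", ylab = "Height (m)",
        main = "Lines of constant BMI")
```

Each line on the resulting plot joins the weight and height combinations that share one BMI value, which is exactly the "constant Z on the X-Y plane" reading described above.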

Le Monde rank test (cont’d)

April 5, 2010

Following a comment from efrique pointing out that this statistic is called Spearman footrule, I want to clarify the notation in namely (a) that the ranks of and are considered for the whole sample, i.e. instead of being computed separately for the ‘s and the ‘s, and then (b) that the ranks are reordered for

UCLA and LA RUG talks on R and C++ integration

April 4, 2010

We spent last week in the LA area and had a generally good time out west. I was able to sneak in two talks and a group discussion, thanks to the help by Jan de Leeuw (and everybody at UCLA's Stats department) as well as by Szilard Pafka representing ...

Le Monde rank test

April 4, 2010

In the puzzle found in Le Monde of this weekend, the mathematical object behind the silly story is defined as a pseudo-Spearman rank correlation test statistic, where the difference between the ranks of the paired random variables and is in absolute value instead of being squared as in the Spearman rank test statistic. I don’t
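
To make the contrast concrete, here is a minimal sketch of the two statistics for paired samples. The function names are hypothetical, and ranking each sample separately is an assumption taken from the usual Spearman convention; the follow-up posts in this listing revisit how the ranks should be computed.

```r
# Sum of absolute rank differences (the "pseudo-Spearman" statistic).
footrule <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(abs(rank(x) - rank(y)))
}

# Sum of squared rank differences, as used in the Spearman rank statistic.
spearman_ssq <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum((rank(x) - rank(y))^2)
}

x <- 1:5
footrule(x, x)           # identical rankings
footrule(x, rev(x))      # fully reversed rankings
spearman_ssq(x, rev(x))
```

Both statistics are zero when the two rankings agree and grow as they diverge; the squared version penalizes large rank gaps more heavily.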

Why isn’t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)? Part 2

April 3, 2010

I created an example to show how the theory from part 1 might be applied using S&P500 as a proxy for performance. Just in case anyone viewing is not familiar with terminal wealth, it is the final (usually compounded) ending value (hence, terminal) of ...

R-Node: a web front-end to R with Protovis

April 3, 2010

Update (April 6, 2010): R-Node now has its own website, with a dedicated Google group (you can join it here) * * * * The integration of R into online web services is (for me) one of the more exciting prospects in R’s future. That is why I was very excited coming across Jamie...

embed images in Rd documents

April 3, 2010

The new help system that was introduced in R 2.10.0 and documented in an article in the R Journal is very promising. One thing that is planned for future versions of R (maybe 2.12.0) is some way to include images in Rd documents using the fig option of the Sexpr macro. Another way is to use data...

Demonstrating the Power of F Test with gWidgets

April 2, 2010

We know the real distribution of the F statistic in linear models: it is a non-central F distribution. Under H0, we have a central F distribution. Given 1 – α, we can compute the probability of (correctly) rejecting H0. I created a simple demo to illustrate how the power changes as other parameters vary,
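
The power calculation described here can be sketched without any GUI using base R's `qf()` and `pf()`. The function name `f_power` and the example degrees of freedom are assumptions for illustration; this is not the gWidgets demo from the post.

```r
# Power of the F test: the probability that a non-central F statistic
# exceeds the critical value taken from the central F distribution.
f_power <- function(ncp, df1, df2, alpha = 0.05) {
  crit <- qf(1 - alpha, df1, df2)                    # critical value under H0
  pf(crit, df1, df2, ncp = ncp, lower.tail = FALSE)  # rejection probability
}

# Power for increasing non-centrality, with 2 and 27 degrees of freedom.
sapply(c(0, 2, 5, 10), f_power, df1 = 2, df2 = 27)
```

With a non-centrality parameter of zero the rejection probability reduces to α, and it rises toward 1 as the non-centrality grows.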

Because it’s Friday: Chatroulette

April 2, 2010

Yesterday, Drew Conway posted an analysis of the survival time to events on Chatroulette. If you're familiar with Chatroulette, you'll know what kind of events you can expect to occur when using it. (If you're not, here's a hint: don't try it now if you're at work.) Sadly, it was all an April Fool's Day joke. But Drew takes...

A free book on Geostatistical Mapping with R

April 2, 2010

Tomislav Hengl of the University of Amsterdam has published a new book, A Practical Guide to Geostatistical Mapping. It's jam-packed with 291 pages on mapping and analyzing spatial data using free software including R, SAGA, GRASS, ILWIS and Google Earth, and freely-available map data. The book itself is also available for free, as an Open Access Publication. You can order...

How to Produce Fake Data Analysis in R: 3 Easy Steps

April 2, 2010

Did you really think that a team of researchers spent their weekends counting the number of shirtless adolescent men and exposed penises they could find on chatroulette.com? Perhaps you should not answer that, as it may be a better measure of your opinion of sociologists than of your gullibility. It is true, sociologists do say the

CLT Standard Normal Generator

April 2, 2010

I’ve found this standard normal random number generator in a number of places, one of which is one of Paul Wilmott’s books. The idea is that we can use the Central Limit Theorem (CLT) to easily generate values distributed according to a standard normal distribution by using the sum of 12 uniform random
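
A minimal sketch of this generator, assuming the truncated sentence refers to the well-known construction that sums 12 independent Uniform(0, 1) draws and subtracts 6. The function name `clt_rnorm` is hypothetical, and this is not necessarily the code from the post.

```r
# The sum of 12 independent Uniform(0, 1) variables has mean 12 * 0.5 = 6
# and variance 12 * (1/12) = 1, so subtracting 6 gives an approximately
# standard normal value.
clt_rnorm <- function(n) {
  u <- matrix(runif(12 * n), nrow = n, ncol = 12)  # one row per output value
  rowSums(u) - 6
}

set.seed(1)
z <- clt_rnorm(10000)
c(mean = mean(z), sd = sd(z))  # should be close to 0 and 1
```

The values are confined to the interval [-6, 6], so the tails are lighter than those of a true normal distribution; the approximation is best near the center.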

Lookup Performance in R

April 2, 2010

Rumor has it that Joe Adler, author of the O’Reilly book R in a Nutshell, has joined LinkedIn as a data scientist. But that does not keep him from pumping out some interesting content over at OReilly.com. His latest article is about lookup performance in R. He does a great job giving code

Opening Statements on Markov Chain Monte Carlo

April 1, 2010

This quarter I am TAing UCLA’s Statistics 102C, Introduction to Monte Carlo Methods, for Professor Qing Zhou. This course did not exist when I was an undergraduate, and I think it is pretty rare to teach Monte Carlo (minus the bootstrap if you count that) or MCMC to undergrads. I am excited about this class because to me, MCMC...

Frank Harrell’s Regression Modeling Strategies Course Handouts

April 1, 2010

The previously mentioned Regression Modeling Strategies short course taught by Frank Harrell is nearly over. Here are the handouts (PDF) from the course. Keep an eye out here, I'll be writing a few more posts in the near future on topics Frank covered...

Quantile LOESS – Combining a moving quantile window with LOESS (R function)

April 1, 2010

In this post I will provide R code that implements the combination of a repeated running quantile with the LOESS smoother to create a type of “quantile LOESS” (e.g., “Local Quantile Regression”). This method is useful when the need arises to fit a robust and resistant (needs to be verified) smoothed line for a quantile (an example for such a...

Because it’s Thursday: Epidemiology of the Undead

April 1, 2010

Noted statistician Andrew Gelman has teamed up with occultist George Romero to address the most serious public-health threat of our time: Zombies. They've published a paper in the journal Biomastika, "How many zombies do you know?" to propose the use of indirect survey methods to measure outbreaks of the undead: Abstract: The zombie menace has so far been studied...
