R on the iPhone

April 5, 2010
By

R Twitterer ech0chrome reports that (s)he has successfully installed R on an iPhone. I haven't tried it myself, since it requires jailbreaking the iPhone, but full instructions have been posted to the R Wiki. It seems that once you have Debian installed on the iPhone, it's simply a matter of installing the required Debian packages. Seems like a neat...

Example 7.31: Contour plot of BMI by weight and height

April 5, 2010
By

A contour plot is a simple way to plot a surface in two dimensions. Lines with a constant Z value are plotted on the X-Y plane. Typical uses include weather maps displaying "isobars" (lines of constant pressure), and maps displaying lines of constant e...
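As a rough illustration of what "lines with a constant Z value" means for BMI, the sketch below computes points along a few BMI contours without drawing anything. It assumes the metric formula BMI = weight (kg) / height (m)^2; the linked example may well use other units, and the function names and the chosen levels and weights are illustrative only.

```python
import math

def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 (metric formula, assumed here)."""
    return weight_kg / height_m ** 2

def contour_height(level, weight_kg):
    """Height (m) at which the given weight yields BMI == level."""
    return math.sqrt(weight_kg / level)

# Illustrative contour levels and weights (not taken from the post).
levels = [18.5, 25.0, 30.0]
weights = [50, 60, 70, 80, 90, 100]

# Each contour is the set of (weight, height) points sharing one BMI value.
contours = {
    level: [(w, contour_height(level, w)) for w in weights]
    for level in levels
}

for level, points in contours.items():
    row = ", ".join(f"{w} kg -> {h:.2f} m" for w, h in points)
    print(f"BMI {level}: {row}")
```

Feeding each list of points to any plotting routine as a line would reproduce one contour of the surface.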

Le Monde rank test (cont’d)

April 5, 2010
By

Following a comment from efrique pointing out that this statistic is called Spearman footrule, I want to clarify the notation in namely (a) that the ranks of and are considered for the whole sample, i.e. instead of being computed separately for the ‘s and the ‘s, and then (b) that the ranks are reordered for

UCLA and LA RUG talks on R and C++ integration

April 4, 2010
By

We spent last week in the LA area and had a generally good time out west. I was able to sneak in two talks and a group discussion, thanks to the help by Jan de Leeuw (and everybody at UCLA's Stats department) as well as by Szilard Pafka representing ...

Le Monde rank test

April 4, 2010
By

In the puzzle found in Le Monde of this weekend, the mathematical object behind the silly story is defined as a pseudo-Spearman rank correlation test statistic, where the difference between the ranks of the paired random variables and is in absolute value instead of being squared as in the Spearman rank test statistic. I don’t
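The difference between the two statistics can be sketched as follows. This is a minimal illustration, assuming no ties and ranks computed within each sample (the usual textbook convention, which may differ from the puzzle's own); the function names and the data are made up for the example.

```python
def ranks(values):
    """Ranks 1..n of the values, assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def footrule(x, y):
    """Sum of absolute rank differences (absolute-value variant)."""
    return sum(abs(a - b) for a, b in zip(ranks(x), ranks(y)))

def spearman_sum_sq(x, y):
    """Sum of squared rank differences, as used in Spearman's statistic."""
    return sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))

# Illustrative paired sample (hypothetical data).
x = [10, 20, 30, 40]
y = [40, 10, 30, 20]
print(footrule(x, y), spearman_sum_sq(x, y))
```

Both statistics are zero when the two rankings agree exactly and grow as they diverge; the squared version penalises large rank gaps more heavily.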

Why isn’t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)? Part 2

April 3, 2010
By

I created an example to show how the theory from part 1 might be applied using S&P500 as a proxy for performance. Just in case anyone viewing is not familiar with terminal wealth, it is the final (usually compounded) ending value (hence, terminal) of ...
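Terminal wealth and the effect of compounding on a leveraged fund can be sketched in a few lines. This is only an illustration under simple assumptions: the fund delivers exactly a fixed multiple of each period's index return, there are no fees, and the two returns used are invented rather than S&P500 data. The function name is hypothetical.

```python
def terminal_wealth(returns, leverage=1.0, start=1.0):
    """Compound a sequence of per-period returns into an ending value."""
    wealth = start
    for r in returns:
        wealth *= 1.0 + leverage * r
    return wealth

# Hypothetical two-period path: up 10%, then down 10%.
period_returns = [0.10, -0.10]

index_tw = terminal_wealth(period_returns)                # unlevered index
ultra_tw = terminal_wealth(period_returns, leverage=2.0)  # 2X per period
naive_tw = 1.0 + 2.0 * (index_tw - 1.0)                   # 2X the total return

print(round(index_tw, 4), round(ultra_tw, 4), round(naive_tw, 4))
```

On this path the index ends near 0.99 and the 2X fund near 0.96, below the roughly 0.98 that doubling the index's total return would suggest, which is the kind of path dependence the title refers to.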

R-Node: a web front-end to R with Protovis

April 3, 2010
By

Update (April 6, 2010): R-Node now has its own website, with a dedicated Google group (you can join it here). The integration of R into online web services is (for me) one of the more exciting prospects in R’s future. That is why I was very excited coming across Jamie...

embed images in Rd documents

April 3, 2010
By

The new help system that was introduced in R 2.10.0 and documented in an article of the R Journal is very promising. One thing that is planned for future versions of R (maybe 2.12.0) is some way to include images in Rd documents using the fig option of the Sexpr macro. Another way is to use data...

Demonstrating the Power of F Test with gWidgets

April 2, 2010
By

We know the real distribution of the F statistic in linear models: it is a non-central F distribution. Under H0, we have a central F distribution. Given 1 – α, we can compute the probability of (correctly) rejecting H0. I created a simple demo to illustrate how the power changes as other parameters vary,

Because it’s Friday: Chatroulette

April 2, 2010
By

Yesterday, Drew Conway posted an analysis of the survival time to events on Chatroulette. If you're familiar with Chatroulette, you'll know what kind of events you can expect to occur when using it. (If you're not, here's a hint: don't try it now if you're at work.) Sadly, it was all an April Fool's Day joke. But Drew takes...

A free book on Geostatistical Mapping with R

April 2, 2010
By

Tomislav Hengl of the University of Amsterdam has published a new book, A Practical Guide to Geostatistical Mapping. It's jam-packed with 291 pages on mapping and analyzing spatial data using free software including R, SAGA, GRASS, ILWIS and Google Earth, and freely-available map data. The book itself is also available for free, as an Open Access Publication. You can order...

How to Produce Fake Data Analysis in R: 3 Easy Steps

April 2, 2010
By

Did you really think that a team of researchers spent their weekends counting the number of shirtless adolescent men and exposed penises they could find on chatroulette.com? Perhaps you should not answer that, as it may be a better measure of your opinion of sociologists than of gullibility. It is true, sociologists do say the

CLT Standard Normal Generator

April 2, 2010
By

I’ve found this standard normal random number generator in a number of places, one of which being from one of Paul Wilmott’s books. The idea is that we can use the Central Limit Theorem (CLT) to easily generate values distributed according to a standard normal distribution by using the sum of 12 uniform random
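A minimal sketch of the idea, assuming the usual form of the trick: sum 12 independent Uniform(0, 1) draws and subtract 6. Each uniform has mean 1/2 and variance 1/12, so the sum has mean 6 and variance 1. The function name, seed, and sample size are illustrative choices, not taken from the post.

```python
import random
import statistics

def clt_standard_normal(rng):
    """Approximate N(0, 1) draw: sum of 12 U(0, 1) values minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [clt_standard_normal(rng) for _ in range(100_000)]

sample_mean = statistics.fmean(sample)
sample_sd = statistics.pstdev(sample)
print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}")
```

The approximation can never produce values outside [-6, 6], so its tails are thinner than those of a true standard normal.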

Lookup Performance in R

April 2, 2010
By

Rumor has it that Joe Adler, author of the O’Reilly book R in a Nutshell, has joined LinkedIn as a data scientist. But that does not keep him from still pumping out some interesting content over at OReilly.com. His latest article is about lookup performance in R. He does a great job giving code

Opening Statements on Markov Chain Monte Carlo

April 1, 2010
By

This quarter I am TAing UCLA’s Statistics 102C, Introduction to Monte Carlo Methods, for Professor Qing Zhou. This course did not exist when I was an undergraduate, and I think it is pretty rare to teach Monte Carlo (minus the bootstrap if you count that) or MCMC to undergrads. I am excited about this class because to me, MCMC...

Frank Harrell’s Regression Modeling Strategies Course Handouts

April 1, 2010
By

The previously mentioned Regression Modeling Strategies short course taught by Frank Harrell is nearly over. Here are the handouts (PDF) from the course. Keep an eye out here, I'll be writing a few more posts in the near future on topics Frank covered...

Quantile LOESS – Combining a moving quantile window with LOESS (R function)

April 1, 2010
By

In this post I will provide R code that implements the combination of a repeated running quantile with the LOESS smoother to create a type of “quantile LOESS” (e.g., “Local Quantile Regression”). This method is useful when the need arises to fit a robust and resistant (needs to be verified) smoothed line for a quantile (an example for such a...

Because it’s Thursday: Epidemiology of the Undead

April 1, 2010
By

Noted statistician Andrew Gelman has teamed up with occultist George Romero to address the most serious public-health threat of our time: Zombies. They've published a paper in the journal Biomastika, "How many zombies do you know?" to propose the use of indirect survey methods to measure outbreaks of the undead: Abstract: The zombie menace has so far been studied...

Plots in R and the ImageJ visualization

April 1, 2010
By

If you plot data in R and you would like to display the same data in the ImageJ view, it is necessary to transfer the data matrix to ImageJ. The first thing that can be noticed is that the image data is displayed rotated because of the Bio7 approach to transferring data back and forth

abbreviating personality measures in R: a tutorial

March 31, 2010
By

A while back I blogged about a paper I wrote that uses genetic algorithms to abbreviate personality measures with minimal human intervention. In the paper, I promised to put the R code I used online, so that other people could download and use it. I put off doing that for a long time, because the

Social Media Analytics Research Toolkit (SMART@znmeb) Is Moving Into Private Beta

March 31, 2010
By

Download "Getting Started with the Social Media Analytics Research Toolkit" (pdf, 1.25 megabytes) Download the Social Media Analytics Research Toolkit My Social Media Analytics Research Toolkit is about to move into private beta. What's in the release?...

March 31, 2010
By

Adam Bonica, a grad student in political science at NYU, recently published a ranking of the political slant of various professions, based on the amount and recipient (Republican or Democratic) of political donations by lawyers, lobbyists, physicians and many other occupations. This paper (PDF) gives the complete analysis, but the chart below (created using the ggplot2 graphics package in...

Why isn’t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)?

March 31, 2010
By

I've been reading a few articles lately, lambasting ultra ETFs for not keeping up with markets and ascribing the problem to weird unexplainable reasons such as portfolio derivative re-balancing and negative drift. I thought it would be nice to revisit...

Predicting April month return

March 31, 2010
By

Bespoke blogged about average monthly returns of the DJI and emphasized April. Before jumping on that information, let’s check some weak points. In that post, only average returns are presented. We need at least the extreme points (min; max) and confidence ranges. Second problem – the market normally has an upward trend and we need to get rid of

Lotka-Volterra model ~ intro

March 30, 2010
By

Many know about the Lotka-Volterra model (i.e. the predator-prey model) in ecology. This model portrays two species, the predator (y) and the prey (x), interacting with each other in limited space. The prey grows at a linear rate () and gets eaten by the predator at the rate of (). The predator gains a certain
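The dynamics can be sketched numerically. The excerpt's rate symbols did not survive, so the sketch assumes the textbook form dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y; the parameter names, values, and starting populations are all illustrative assumptions, and a classical fourth-order Runge-Kutta step is used for the integration.

```python
import math

def lv_derivs(x, y, alpha, beta, delta, gamma):
    """Textbook Lotka-Volterra right-hand side (assumed form)."""
    return alpha * x - beta * x * y, delta * x * y - gamma * y

def rk4_step(x, y, dt, params):
    """One classical Runge-Kutta (RK4) step for the two-species system."""
    k1x, k1y = lv_derivs(x, y, *params)
    k2x, k2y = lv_derivs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, *params)
    k3x, k3y = lv_derivs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, *params)
    k4x, k4y = lv_derivs(x + dt * k3x, y + dt * k3y, *params)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0)

def simulate(x0, y0, params, dt=0.01, steps=2000):
    """Return the list of (prey, predator) states along the trajectory."""
    path = [(x0, y0)]
    for _ in range(steps):
        path.append(rk4_step(*path[-1], dt, params))
    return path

def invariant(x, y, alpha, beta, delta, gamma):
    """Quantity conserved by the exact solution; used as a sanity check."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)

params = (1.0, 0.1, 0.075, 1.5)  # alpha, beta, delta, gamma (hypothetical)
path = simulate(10.0, 5.0, params)
drift = abs(invariant(*path[-1], *params) - invariant(*path[0], *params))
print(f"final state = {path[-1]}, invariant drift = {drift:.2e}")
```

Tracking the conserved quantity is a cheap way to check that the step size is small enough: for the exact solution it stays constant, so any drift is numerical error.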

Some Code for Dumping Data from Twitter Gardenhose

March 30, 2010
By

Gardenhose is a Streaming API feed that continuously sends a sample (roughly 15% according to Ryan Sarver at the 140tc in September 2009) of all tweets to feed recipients. This is some code for dumping the tweets to files named by date and hour. It is in PHP which is not my favorite language, but works nonetheless. I received...