## In case you missed it: March Roundup

April 13, 2010
In case you missed them, here are some articles from last month of particular interest to R users. We reviewed a special report in The Economist on the "Data Deluge" and the growing importance of statistical analysis in business. One section mentioned R specifically. We announced that Zack Urlocker, formerly responsible for engineering and marketing for the open-source database...

## formatR: farewell to ugly R code

April 13, 2010
It is not uncommon to see messy R code which is almost not human-readable like this:

    # rotation of the word "Animation"
    # in a loop; change the angle and color
    # step by step
    for (i in 1:360) {
    # redraw the plot again and again
    plot(1,ann=FALSE,type="n",axes=FALSE)
    # rotate; use rainbow() colors
    text(1,1,"Animation",srt=i,col=rainbow(360),cex=7*i/360)
    #

## Efficient Mixed-Model Association in GWAS using R

April 13, 2010
I recently did an analysis for the eMERGE network where I had lots of individuals from a small town in central Wisconsin where many of the subjects were related to one another. The subjects could not be treated as independent, but I could not use a fam...

## Repeated measures ANOVA with R (tutorials)

April 13, 2010
Repeated measures ANOVA is a common task for the data analyst. There are (at least) two ways of performing “repeated measures ANOVA” using R, but none is really trivial, and each way has its own complications/pitfalls (explanations/solutions to which I was usually able to find by searching the R-help mailing list). So for future reference, I am starting this page...
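As a rough illustration of the kind of analysis the teaser refers to, here is a minimal sketch of one commonly used base-R route: `aov()` with an `Error()` term. Whether this is one of the ways covered in the post is an assumption, and the data frame `d` and its column names are made up for the example.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Hypothetical long-format data: 10 subjects, each measured under 3 conditions
d <- data.frame(
  subject   = factor(rep(1:10, each = 3)),
  condition = factor(rep(c("a", "b", "c"), times = 10)),
  score     = rnorm(30, mean = 10)
)

# Error(subject / condition) declares condition as a within-subject factor,
# so its effect is tested against the subject-by-condition variation
fit <- aov(score ~ condition + Error(subject / condition), data = d)
summary(fit)
```

The data must be in long format (one row per subject and condition) for this formula to work as intended.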

## Cherry Picking to Generalize ~ NASA Global Temperature Trends ~ enhanced w/ ggplot2

April 12, 2010
In a prior article, I tried to visualize the linear global temperature trends for a grid of start and end years. The visual I created was confusing in that the specification of the color scale was interdependent with the data values. I wanted a blue -> white -> red scale of the temperatures indicating cool ->

## Using MKL-Linked R in Eclipse

April 12, 2010
Setting up Eclipse to use MKL-Linked R

In my previous post, I showed how to compile R 2.10.1 using Intel's Math Kernel Library for the BLAS/LAPACK interface. Even though it takes a bit of time to set up, I think the noticeably improved calculation speed justifies the effort. Although I'm happy to use R from the command line for basic stuff,...

## Jeroen Ooms’s ggplot2 web interface – a new version released (V0.2)

April 12, 2010
Good news. Jeroen Ooms released a new version of his (amazing) online ggplot2 web interface: yeroon.net/ggplot2 is a web interface for Hadley Wickham’s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and education of statistics and R. The interface is written completely in javascript, therefore there is no need to install anything on the...

## pgfSweave version 1.0.5 released

April 12, 2010
Version 1.0.5 is now on CRAN. This version brings some bug fixes as well as two new features: Unlabeled code chunks are now allowed. The correct version of PGF is now checked for on startup. If the version is < 2.00, the package will fail to load....

## Arizona court rules statistical sampling is legal

April 12, 2010
A court in Arizona has ruled that statistical sampling is legal for determining damages awarded to individual claimants when there are thousands of similar cases to be assessed simultaneously. In a case where 30,000 claims were filed in Maricopa County, AZ by hospitals for improper reimbursement, the trial judge appointed a former judge as a special master in the case...

## Working with themes in Lattice Graphics

April 12, 2010
The Trellis graphics approach provides facilities for creating effective graphs with a consistent look and feel and one of the good things about the system is the use of themes to define the colour, size and other features of the components that make up a graph. The lattice package in R is an implementation of

## Example 7.32: Add reference lines to a plot; fine control of tick marks

April 12, 2010
Sometimes it's useful to plot regular reference lines along with the data. For a time-series plot, this can show when critical values are reached in a clearer way than simple tick marks. As an example, we revisit the empirical CDF plot shown in Example...

## Anecdotal Evidence that Facebook Stores all Clicks?

April 11, 2010
This is not really news. A few months ago, news broke that Facebook recorded each user’s clicks and profile views in a database. Of course, I am not at all surprised. I would be more surprised if they didn’t store every single click.

By now, most people have some sense as to how Facebook’s recommendation system works. It typically performs...

## Significant Figures in R and Info Zeros

April 11, 2010
The other day, I stumbled upon the signif function in R, so I thought I'd take a look at what it does and compare it with some results discussed in Chap. 3 "Damaging Digits in Capacity Calculations" of my GCaP book, viz., Example 3.5 on page 31. The m...
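For readers unfamiliar with `signif`, a minimal sketch of what it does, contrasted with `round`, may help. The input values are arbitrary and are not taken from the post or the book example it cites.

```r
x <- 123456.789        # arbitrary illustrative value

s2 <- signif(x, 2)     # keep 2 significant digits: 120000
s6 <- signif(x, 6)     # keep 6 significant digits: 123457
r2 <- round(x, 2)      # round() counts decimal places instead: 123456.79

small <- signif(0.00123456, 3)  # leading zeros are not significant: 0.00123
```

Note that `signif` returns an ordinary number, so the result carries no record of how many of its trailing zeros are significant.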

## R frustration of the day

April 11, 2010
Whenever you take a 1 column slice of a matrix, that gets automatically converted into a vector. But if you take a slice of several columns, it remains a matrix. The problem is you don’t always know in advance how big the slice will be, so if you do this: newMatrix
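The conversion described above comes from the default `drop = TRUE` argument of R's `[` operator. A minimal sketch, using a made-up matrix `m` and illustrative variable names, shows the behaviour and how `drop = FALSE` keeps the result a matrix whatever the size of the slice:

```r
# A 3 x 4 matrix used purely for illustration
m <- matrix(1:12, nrow = 3)

cols <- 2                             # the selection happens to hold one column

dropped <- m[, cols]                  # default drop = TRUE: a plain vector
kept    <- m[, cols, drop = FALSE]    # dimensions preserved: a 3 x 1 matrix

is.matrix(dropped)   # FALSE
is.matrix(kept)      # TRUE
dim(kept)            # 3 1
```

Passing `drop = FALSE` whenever the number of selected columns is not known in advance avoids having to special-case the single-column result later.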

## Historical / Future Volatility Correlation Stability

April 11, 2010
Michael Stokes, author of the MarketSci blog, recently published a thought-provoking post about the correlation between historical and future volatility (measured as the standard deviation of daily close price percentage changes). This post is intended...

April 11, 2010
There is a central notion in Time Series Econometrics: cointegration. Loosely, it refers to finding the long-run equilibrium of two non-stationary series. As the best-known examples of non-stationary series come from finance, cointegration is nowadays a tool for traders (not a common one though!). They use it as the theory behind pairs trading (aka

## Summarising data using histograms

April 11, 2010
The histogram is a standard type of graphic used to summarise univariate data where the range of values in the data set is divided into regions and a bar (usually vertical) is plotted in each of these regions with height proportional to the frequency of observations in that region. In some cases the proportion of
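The binning described above can be sketched with base R's `hist()`. The data here are simulated purely for illustration, and the bin count passed to `breaks` is only a suggestion that R may adjust to get round break points.

```r
set.seed(42)                          # arbitrary seed for reproducibility
x <- rnorm(200, mean = 50, sd = 10)   # simulated univariate data (an assumption)

# Compute the bins without drawing anything
h <- hist(x, breaks = 10, plot = FALSE)

h$breaks    # boundaries of the regions the range is divided into
h$counts    # number of observations falling in each region
h$density   # bar heights on the density scale (total bar area is 1)
```

Calling `hist(x)` without `plot = FALSE` draws the bars with heights equal to `h$counts`; adding `freq = FALSE` draws them with heights equal to `h$density` instead.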

## Compiling 64-bit R 2.10.1 with MKL in Linux

April 10, 2010
The rationale for compiling R using the Intel Math Kernel Library

Recently, there has been a surge in the use of Intel's Math Kernel Library (MKL; http://software.intel.com/en-us/intel-mkl/) among data analysis packages. MKL is a highly optimized set of...

## Where do you sit? Author position and the h-index

April 10, 2010
I was recently introduced to the concept of the h-index and was compelled to find out my own h-index via Scopus.  Numbers don't matter, but discussion with my colleagues turned to the issue of author position.  We quickly decided that there are three important "positions" in the list of authors for a publication: first, last and everywhere else...

## Because it’s Friday: Pixels invade New York

April 9, 2010
Posted for no other reason than it warms my gamer-geek heart to see NYC taken over by 8-bit video game characters. The Tetris sequence is particularly cool. Update: The original video was deleted from YouTube, I'm guessing because of copyright issues with the music. This version has no music. (Thanks to reader MB in the comments for the heads-up.)

## REvolution R Community 3.2 now available

April 9, 2010
REvolution R Community, REvolution's free distribution based on R from the R Project, has been updated to version 3.2 and is now available for download for Windows and MacOS. Some features of this release include: Upgraded R engine. This release is based on R 2.10.1, the latest release (as of this writing). This brings many new features to the...

## Chicago R User Group… It’s for the sexy people!

April 9, 2010
I think we all know what Morris Day was talking about when he wrote the lyrics to “The Bird”: Yes! Hold on now, this dance ain’t for everybody. Just the sexy people. White folks, you’re much too tight. You gotta shake your head like the black folks. You might get some tonight. Look out! That’s right, he was talking about the new

## The Future of Math is Statistics

April 9, 2010
The future of math is statistics… and the language of that future is R: I’ve often thought there was way too little “statistical intuition” in the workplace. I think Arthur Benjamin would agree.

## Maximum Probability of Profit

April 9, 2010
To continue with the LSPM examples, this post shows how to optimize a Leverage Space Portfolio for the maximum probability of profit. The data and example are again taken from The Leverage Space Trading Model by Ralph Vince. These optimizations take ...

## GLMM using DPpackage

April 9, 2010
I was able to fit a semi-parametric Bayesian GLMM model using DPpackage. It took me many hours to sample from the posterior distribution (DPM prior):

MCMC scan 1000 of 5000 (CPU time: 18950.080 s)
MCMC scan 2000 of 5000 (CPU time: 22510.100 s)
M...