# April 2011

### Schelling’s Neighborhood Model

April 30, 2011 | 0 Comments

The New York Times has created a beautiful visualization of the Census Bureau's 2005-2009 American Community Survey data. The distribution of racial and ethnic groups in New York City is particularly fascinating: Chinatown appears in red toward the sou...

### Produce Authentic Math Formulas in R Graphics

April 30, 2011 | 0 Comments

I remember that a few weeks ago there was a challenge on the R-help list to make the prime symbol in R graphics. In LaTeX, we simply write $X'$ or $X^\prime$. R has rough support for math expressions (see demo(plotmath)), but it is certainly unsatisfactory for LaTeX users. ... [Read more...]

### Filtering for English Tweets: Unsupervised Language Detection on Twitter

April 30, 2011 | 0 Comments

(See a demo here.) While working on a Twitter sentiment analysis project, I ran into the problem of needing to filter out all non-English tweets. (Asking the Twitter API for English-only tweets doesn’t seem to work, as it nonetheless returns tweets in Spanish, Portuguese, Dutch, Russian, and a couple ...

April 30, 2011 | 0 Comments

About a year ago I was reading Gödel, Escher, Bach by Douglas Hofstadter. In a section on recursion he presents a sequence that he calls "A Chaotic Sequence", defined as: Q(n) = Q(n - Q(n - 1)) + Q(n - Q(n - 2)) for n > 2, with Q(1) = Q(2) = 1. It's sim... [Read more...]
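As a rough illustration (not the post's own code), the recurrence Q(n) = Q(n - Q(n - 1)) + Q(n - Q(n - 2)) for n > 2, with Q(1) = Q(2) = 1, can be computed iteratively in R. The function name `hofstadter_q` and its argument are made up for this sketch.

```r
# Hofstadter's Q sequence, filled in iteratively.
# `hofstadter_q` and `n_terms` are hypothetical names used for illustration.
hofstadter_q <- function(n_terms) {
  stopifnot(n_terms >= 1)
  q <- integer(n_terms)
  # Seed values: Q(1) = Q(2) = 1
  q[seq_len(min(n_terms, 2))] <- 1L
  if (n_terms > 2) {
    for (n in 3:n_terms) {
      # Each term looks back a distance given by the two previous terms
      q[n] <- q[n - q[n - 1]] + q[n - q[n - 2]]
    }
  }
  q
}

hofstadter_q(10)
# 1 1 2 3 3 4 5 5 6 6
```

An iterative fill avoids the repeated work of a naive recursive definition, since every term depends on earlier terms that are already stored.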

### Visualizing Terrain Surface Indices with Scaled Arrows

April 30, 2011 | 0 Comments

Hamish Bowman recently posted a new GRASS module (d.barb) that can be used to depict the direction and magnitude components of some vector (e.g. wind field) along a raster surface or at points in space. An example (c/o Hamish): [Read more...]

### Bootstrap Confidence Intervals for Diversity Indices

April 30, 2011 | 0 Comments

Here's the bootstrap refinement of the normal asymptotic interval (Mills and Zandvakili, 1997; Dixon et al., 1987; Efron and Tibshirani, 1997) - where Diversity (div, g) is the Simpson Index calculated from the observed sample, k is the number boot...

### Friday function triple bill: with vs. within vs. transform

April 29, 2011 | 0 Comments

When you first learnt about data frames in R, I’m sure that, like me, you thought “This is a lot of hassle having to type the names of data frames over and over in order to access each column”. library(MASS) anorexia\$wtDiff [Read more...]
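For readers who want a quick feel for the three functions named in the title, here is a minimal sketch using the anorexia data from MASS. Defining wtDiff as post-treatment weight minus pre-treatment weight is an assumption made for illustration; the excerpt above does not show how the post defines it.

```r
library(MASS)  # provides the anorexia data frame (columns Treat, Prewt, Postwt)

# with(): evaluates an expression inside the data frame and returns the result
# (here a plain numeric vector, not a data frame)
wt_diff <- with(anorexia, Postwt - Prewt)

# within(): evaluates assignments inside the data frame and returns a modified copy
# (wtDiff = Postwt - Prewt is an assumed definition)
anorexia_within <- within(anorexia, wtDiff <- Postwt - Prewt)

# transform(): also returns a modified copy, using name = value arguments
anorexia_transform <- transform(anorexia, wtDiff = Postwt - Prewt)

head(anorexia_transform)
```

None of the three calls changes the original anorexia object; the modified copies have to be assigned to a name to be kept.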

### Another Use of LSPM in Tactical Portfolio Allocation

April 29, 2011 | 0 Comments

After the slightly unconventional use of LSPM presented in Slightly Different Use of Ralph Vince’s Leverage Space Trading Model, I thought I should follow up with something that more closely resembles my interpretation of Ralph Vince’s book. LSPM s...

### Rcpp Workshop slides

April 29, 2011 | 0 Comments

Dirk and I gave a full-day Rcpp workshop yesterday in Chicago before the R in Finance conference. The pdfs of the slides are available here: part 1 (intro), part 2 (details), part 3 (modules) and part 4 (applications) [Read more...]

### Parallelizing and cross-validating feature selection in R

April 29, 2011 | 0 Comments

This is an example piece of code for the Overfitting competition at kaggle.com. This method has an AUC score of ~0.91, which is currently good enough for about 38th place on the leaderboard. If you read the competition forums closely, you will find code...

### Gartner: Revolution Analytics a "Cool Vendor" for BI

April 29, 2011 | 0 Comments

Leading analyst firm Gartner has just published its "Cool Vendors in Analytics and Business Intelligence" report for 2011 (download it here if you have a Gartner subscription). The report names Revolution Analytics a Gartner Cool Vendor and recognizes the company as "innovative, impactful and intriguing": Driven in part by ... [Read more...]

### RStudio is good for you

April 29, 2011 | 0 Comments

I was recently introduced to RStudio, a new integrated development environment for R, and it is just amazing! It is free and open, and it is compatible with PC, Mac, and Linux. You can also choose to run it in the cloud and access it from your favorite web browser. As you can see, ... [Read more...]

### Example 8.36: Quadratic equation with real roots

April 29, 2011 | 0 Comments

We often simulate data in SAS or R to confirm analytical results. For example, consider the following problem from the excellent text by Rice: Let U1, U2, and U3 be independent random variables uniform on [0, 1]. What is the probability that the roots...
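The kind of simulation described above can be sketched in a few lines of R. The problem statement is cut off in the excerpt, so this sketch assumes the quadratic is U1 x^2 + U2 x + U3 and that the question concerns real roots, as the title suggests. Under that assumption the roots are real when U2^2 - 4 U1 U3 >= 0, and the exact probability works out to (5 + 6 log 2)/36, about 0.254.

```r
# Monte Carlo estimate of P(roots are real), assuming the quadratic
# U1 * x^2 + U2 * x + U3 (the coefficient order is an assumption).
set.seed(42)
n_sims <- 100000

u1 <- runif(n_sims)
u2 <- runif(n_sims)
u3 <- runif(n_sims)

# Roots are real when the discriminant is non-negative
has_real_roots <- u2^2 - 4 * u1 * u3 >= 0
p_hat <- mean(has_real_roots)

# Closed-form value under the same assumption, for comparison
p_exact <- (5 + 6 * log(2)) / 36

c(simulated = p_hat, exact = p_exact)
```

With 100,000 draws the standard error of the estimate is roughly 0.0014, so the simulated value should agree with the closed form to about two decimal places.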

### Slides from Rcpp workshop / master class yesterday

April 29, 2011 | 0 Comments

Romain and I just posted our slides from yesterday's Rcpp workshop and class (preceding the now-ongoing R/Finance conference). You can access the slides via my presentation page, or directly from here as Part 1 (Introduction), Part 2 (Details), Part ... [Read more...]

### Forming Formulas

April 29, 2011 | 0 Comments

One of the first functions a new R user learns how to use is the lm() command, which involves stating the model formula.

lm(y~x1+x2, data=mydata)

After a while, this just becomes a natural way to say "I want a regression of y on x1 and x2 ...
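A formula like y ~ x1 + x2 does not have to be typed literally; it can also be assembled from character strings. The sketch below uses simulated data (the contents of mydata and the coefficients are invented for illustration) and base R's reformulate(). Whether the post takes this approach is not visible from the excerpt.

```r
# Simulated data standing in for `mydata` (values are made up for illustration)
set.seed(1)
mydata <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
mydata$y <- 1 + 2 * mydata$x1 - mydata$x2 + rnorm(50)

# Formula written out literally
fit_literal <- lm(y ~ x1 + x2, data = mydata)

# Same formula built from a character vector of predictor names
predictors <- c("x1", "x2")
model_formula <- reformulate(predictors, response = "y")
fit_built <- lm(model_formula, data = mydata)

model_formula
coef(fit_built)
```

Building the formula from names is handy when the set of predictors is only known at run time, for example when looping over candidate models.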

### ggplot2 – First impressions

April 29, 2011 | 0 Comments

I was reading various R blogs and saw very nice-looking plots created with the ggplot2 package. This blog was especially useful because it links to a very interesting book about ggplot2. I want to display and update the latest co-integrated pairs every day ... [Read more...]

### Easy way to get yield curve : what you need is only "FRBData" package !

April 28, 2011 | 0 Comments

I made the FRBData package and registered it on CRAN. This package allows you to download financial data from the FRB's website. The website provides many kinds of economic data, such as consumer credit and money stock. This article shows you how to use the package. (But, it has only a function about interest rate ... [Read more...]

### Processing nested lists

April 28, 2011 | 0 Comments

So perhaps you have all figured this out already, but I was excited to figure out how to finally neatly get all the data frames, lists, vectors, etc. out of a nested list. It is as easy as nesting calls to the apply family of functions, in the case bel...
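To give a flavour of what nesting calls to the apply family can look like, here is a small sketch on a toy nested list. The list `nested` and its contents are invented for illustration and are not the data used in the post.

```r
# A toy two-level nested list (hypothetical data for illustration)
nested <- list(
  a = list(x = 1:3, y = 4:6),
  b = list(x = 7:9, y = 10:12)
)

# The outer sapply() walks over the top-level groups; the inner sapply()
# summarises each element inside a group. The result is a matrix with one
# column per group and one row per inner element.
means <- sapply(nested, function(group) sapply(group, mean))
means

# Pull the same inner element out of every group
all_x <- lapply(nested, function(group) group$x)
```

Using lapply() in place of sapply() keeps the result as a list, which is the safer choice when the inner elements differ in length or type.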

### Slightly Different Use of Ralph Vince’s Leverage Space Trading Model

April 28, 2011 | 0 Comments

In honor of the press release Dow Jones Indexes To Develop, Co-Brand Index Family With LSP Partners two days ago, I thought I would show another slightly different use of Ralph Vince’s The Leverage Space Trading Model. Using the R LSPM package, we c...