## System in 10 Minutes After Twitter

October 13, 2011
By

On Twitter last night, I spotted @milktrader from www.algorithmzoo.com doing some range research on equity indexes.  I offered a tweet on the crazy Russell 2000 17% move over 7 days.  Within 10 minutes, I discovered a signal that worked very ...

## Maximum likelihood

October 13, 2011
By

This post is one of those ‘explain to myself how things work’ documents, which are not necessarily completely correct but are close enough to facilitate understanding. Background: Let’s assume that we are working with a fairly simple linear model, where … Continue reading →

## There’s a lot to like about R

October 13, 2011
By

I once heard John Chambers (the inventor of the S language, and member of the R Core Group) say, "Show me a programming language no-one complains about, and I'll show you a language no-one uses". The R language has its fair share of complainants, to be sure -- and that's to be expected for a language with more than...

## Waiting in line, waiting on R

October 13, 2011
By

I should state right away that I know almost nothing about queuing theory. That’s one of the reasons I wanted to do some queuing simulations. Another reason: when I’m waiting in line at the bank, I tend to do mental calculations for how long it should take me to get served. I look at the

## Example 9.9: Simplifying R using the mosaic package (part 1)

October 13, 2011
By

While both SAS and R are powerful systems for statistical analysis, they can be frustrating to new users or those learning statistics for the first time. The mosaic package is designed to help simplify the interface for such new users, while allowing ...

## Phylogenetic community structure: PGLMMs

October 13, 2011
By

So, I've blogged about this topic before, way back on 5 Jan this year. Matt Helmus, a postdoc in the Wootton lab at the University of Chicago, published a paper with Anthony Ives in Ecological Monographs this year (abstract here). The paper addres...

## Modelling with R: part 4

October 13, 2011
By

In part 3, we ran a logistic model to determine the probability of default of a customer. We also saw the outputs and tried to judge the performance of the model by plotting the ROC curve. Let's try a different approach today. How about a decision tree?...

## Introduction to Asset Allocation

October 12, 2011
By

This is the first post in the series about Asset Allocation, Risk Measures, and Portfolio Construction. I will use simple and naive historical input assumptions for illustration purposes across all posts. In this series I plan to discuss: Maximum Loss, MAD, CVaR, CDaR, and Omega risk measures; 130:30 long/short portfolios and cardinality constraints; arithmetic and geometric ...

## S&P 500 components heatmap in R

October 12, 2011
By

In this article, Hans Gilde exposes the clever use of a heatmap hidden in the Bioconductor library. In his example, he describes a way to show different ‘observations’ on subjects, with the concept of time. Financial indices, like the S&P 500 or the Dow Jones indices, are mathematically some kind of measure of overall market

## A true data-doodler – Christophe Ladroue (R ddply and plyr on Triathlon Results)

October 12, 2011
By

To me, this post by Christophe Ladroue personifies what data doodlers do. They take a dataset that is of interest to them (in his case, his triathlon results) and then they manipulate the numbers to see what insights can be drawn. Most bloggers only sho...

## Typos in Introduction to Monte Carlo Methods with R

October 12, 2011
By

The two translators of our book into Japanese, Kazue & Motohiro Ishida, contacted me about some R code mistakes in the book. The translation is nearly done and they checked every piece of code in the book, an endeavour for which I am very grateful! Here are the two issues they have noticed (after incorporating

## Bay Area R Users group has 1300 members

October 12, 2011
By

Impressive. You are not alone!

## Percentage of Organic Farming Operations by State

October 12, 2011
By

With data from the USDA on certified organic farms for 2008, I created a map using the Geo Map function from the googleVis API package available in R. I’ve copied and pasted the image below as WordPress.com sites don’t support … Continue reading →

## Slides and replay for "Introduction to R for SAS and SPSS users"

October 12, 2011
By

If you missed last week's webinar from Bob Muenchen, "Introduction to R for SAS and SPSS users", you missed a great overview of the R Project and how it compares to commercial statistical software. Bob's slides are below, and you can download the slides and replay from the Revolution Analytics website. Bob pointed out a couple of really useful...

## What does it mean to be a Data Scientist?

October 12, 2011
By

Check out this talk by John Rauser of AMZN at the 2011 Strata Conf. It is an excellent intro to the field.

## Multiply Imputing an Outcome Variable

October 12, 2011
By

Some scholars suggest that multiply imputing an outcome variable is incorrect. I use intuition and simulation to argue that multiply imputing outcomes can drastically improve estimates, even in the case of non-ignorable missingness. Continue reading →

## Simulating data following a given covariance structure

October 12, 2011
By

Every year there are at least a couple of occasions when I have to simulate multivariate data that follow a given covariance matrix. For example, let’s say that we want to create an example of the effect of collinearity when … Continue reading →
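One common way to do this, sketched here in base R, is to multiply independent standard normal draws by the Cholesky factor of the target covariance matrix. The matrix `Sigma`, the sample size `n`, and the seed below are illustrative assumptions, not values from the post, and the post itself may well use a different approach (for example `MASS::mvrnorm`).

```r
# Illustrative target covariance matrix (an assumption, not from the post)
set.seed(42)
Sigma <- matrix(c(1.0, 0.8,
                  0.8, 1.0), nrow = 2, byrow = TRUE)

n <- 10000
Z <- matrix(rnorm(n * 2), nrow = n)  # independent standard normal columns
U <- chol(Sigma)                     # upper-triangular factor: t(U) %*% U equals Sigma
X <- Z %*% U                         # each row of X now has covariance Sigma

round(cov(X), 2)                     # sample covariance, should be close to Sigma
```

Because each row of `Z` has identity covariance, each row of `Z %*% U` has covariance `t(U) %*% U`, which is `Sigma` by construction.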

## Generosity of Asian Central Banks

October 12, 2011
By

The only thing that separates the United States from Europe and the notorious PIIGS is the generosity of Asian Central Banks who have been consistently quantitatively easing since 1998 (Join the Reserves). From TimelyPortfolio Without this generos...

## Tricks I learned today #1: as.integer() on factor levels

October 12, 2011
By

I normally work with fully numerical data, not categorical data. R, when using read.csv(), seems to recognize such categories and marks the column as a factor. This is useful indeed. However, I wanted to make a PCA biplot on this data, so wa...
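The excerpt is cut off, so the following is only a guess at the kind of trick the title refers to: a minimal base R sketch of what `as.integer()` returns for a factor. The vectors `colour` and `f` are made-up examples.

```r
# A categorical column, as read.csv() might produce with stringsAsFactors = TRUE
colour <- factor(c("red", "green", "red", "blue"))

levels(colour)       # levels are sorted alphabetically: "blue" "green" "red"
as.integer(colour)   # integer codes of the levels: 3 2 3 1

# Caution: for a factor whose labels look like numbers, as.integer()
# still returns the level codes, not the numbers themselves.
f <- factor(c("10", "20", "20", "5"))
codes  <- as.integer(f)                # level codes
values <- as.integer(as.character(f))  # the original numbers: 10 20 20 5
```

Converting through `as.character()` first is the usual way to recover numeric values stored as factor labels; the plain integer codes are what you get when you only need a numeric stand-in for each category.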

## What’s there to like about R?

October 12, 2011
By

Update 10/11/2011: There’s a good discussion on Reddit. Update 10/12/2011: Note manipulate package and highlight data.table package. The R statistical computing platform is a rising star that’s been gaining popularity and attention, but it gets no respect in the hood. It’s telling that a popular guide to R is called The R Inferno, and that advocacy pieces ...

## Why it doesn’t make sense to chew people out for not reading the help page

October 12, 2011
By

Karl Broman writes: Barry Rowlingson gave an interesting talk at UseR 2011, “Why R-help must die!” He suggested the Q-and-A type sites Stack Overflow (on programming) and Cross Validated (on statistics), both part of Stack Exchange. I haven’t used R-help recently but I do occasionally send people there. Just to see what was going on ...

## Identifying Records in Data Frame A That Are Not Contained In Data Frame B – A Comparison

October 12, 2011
By

Yesterday I launched my first question at Stack Overflow and apparently did a lot of things wrong, as I managed to get my question closed within hours: http://stackoverflow.com/questions/7728462/identify-records-in-data-frame-a-not-contained-in-data-frame-b I had collected 9 different solutions to the problem and made the mistake of putting them all within the original question space. So people complained and told me … Continue reading...

## Yet Another One.. Animation with saveHTML / saveVideo from Package ANIMATION

October 12, 2011
By

...some more playing with saveHTML, as.raster() and rasterImage(), producing a "flickering screen".

## Online

October 12, 2011
By

Hello world, I decided to start blogging a bit to throw my weird R code examples at you ;-) Hope you’ll like it! Greetz, Janko

## R related books: Traditional vs online publishing

October 12, 2011
By

How many R related books have been published so far? Who is the most popular publisher? How many other manuals, tutorials and books have been published online? Let's find out. A few years ago I used the publication list on r-project.org as an argument ...

## Model decision tree in R, score in Base SAS

October 11, 2011
By

This code creates a decision tree model in R using party::ctree() and prepares the model for export from R to Base SAS, so SAS can score new records. SAS Enterprise Miner and PMML are not required, and Base SAS … Continue reading →

## Le Monde puzzle [#743]

October 11, 2011
By

As Le Monde weekend has yet again changed its format (with so many more advertisements for luxurious items that I sometimes wonder whether or not this is the weekend edition of Le Monde!), it took me a while to locate the mathematical puzzle. The good news is there now is a science&techno leaflet with, at

## Where to find data to use with R

October 11, 2011
By

(Contributing blogger Joe Rickert has put together a fantastic list of data sources suitable for use with R. If you're looking for data to use in the Applications of R Contest -- entries close October 31 -- this is a great resource for you -- Ed.) Hardly a day goes by without someone or something reminding me that we...