Using R — Standalone Scripts & Error Messages

December 5, 2011

Open-source R is an amazing tool for statistical analysis and data visualization. Serious R gurus have found ways to do just about anything entirely within the R environment. Nevertheless, there are many of us who wish to plug R into ...

December 5, 2011

Oregon State University makes a set of ocean productivity data derived from satellite data available for download and use by researchers. The Ocean Productivity website explains the available data and how it was derived. I have put together a few R fun...

PCA file calculation with "R".

December 5, 2011

X is the centered matrix. Xcov is the covariance matrix of X. With the function "eigen" we calculate the "eigenvectors" and "eigenvalues" of Xcov...
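The centering, covariance and eigen-decomposition steps described above can be sketched in a few lines of R. This is a minimal illustration, not the original post's code; the built-in USArrests data set and the names scores and prop_var are assumptions chosen for the example.

```r
# Minimal PCA sketch via the eigen-decomposition of the covariance matrix.
# USArrests (shipped with R) is used purely as example data.
data(USArrests)
X <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)  # centered matrix
Xcov <- cov(X)                        # covariance matrix of X
e <- eigen(Xcov, symmetric = TRUE)    # eigenvalues come back in decreasing order
scores <- X %*% e$vectors             # principal component scores
prop_var <- e$values / sum(e$values)  # proportion of variance per component
print(round(prop_var, 4))
# Sanity check: prcomp() variances should equal the eigenvalues of Xcov
print(all.equal(e$values, prcomp(USArrests)$sdev^2))
```

prcomp() works from a singular value decomposition rather than eigen(), so agreement between the two is a quick way to confirm the hand-rolled version. The signs of individual eigenvectors are arbitrary and may differ between the two.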

Decimal to Binary in "R"

December 5, 2011

Lately... I've been learning "R"... that weird programming language aimed at statistics and statistical programming... and I really like it... so, as usual, I needed to create my own Decimal to Binary application ;-) binary bsum bexp while (p_number > 0) { digit p_number bsum ...
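For readers curious how such a converter typically works, here is a minimal sketch using repeated division by 2. It is not the post author's code (the excerpt is truncated), and the function name dec_to_bin is hypothetical.

```r
# Hypothetical helper: convert a non-negative whole number to a binary string
# by repeatedly dividing by 2 and collecting the remainders.
dec_to_bin <- function(n) {
  stopifnot(n >= 0, n == floor(n))
  if (n == 0) return("0")
  digits <- character(0)
  while (n > 0) {
    digits <- c(as.character(n %% 2), digits)  # prepend the current remainder
    n <- n %/% 2                               # integer division by 2
  }
  paste(digits, collapse = "")
}

dec_to_bin(10)   # "1010"
dec_to_bin(255)  # "11111111"
```

Prepending each remainder builds the string from the most significant bit down, so no final reversal step is needed.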

Vote Compass: visualizing Canadian poll results with R

December 5, 2011

Vote Compass is an online "electoral literacy application, whose goal is to encourage engagement with and stimulate discussion around the policy platforms of Canada's political parties." In the lead-up to the 2011 Canadian election, Vote Compass collected the results of an online 10-minute survey from more than 2 million participants, and used the results to align voters with the...

International Open Data Hackathon

December 5, 2011

This past Saturday, I hung out at the Seattle branch of the International Open Data Hackathon. The event was hosted at the Pioneer Square office of Socrata, a small company that helps governments provide public open data. A pair of data analysts from Tableau were showing off a visualization for the Washington...

A pure R poker hand evaluator

December 5, 2011

There are already a lot of great posts out there about poker hand evaluators, so I'll keep this short. Kenneth J. Shackleton recently released a very slick 5-card and 7-card poker hand evaluator called SpecialK. This evaluator is li...

From datasets to algorithms in R

December 5, 2011

Many statistical algorithms are taught and implemented in terms of linear algebra. Statistical packages often borrow heavily from optimized linear algebra libraries such as LINPACK, LAPACK, or BLAS. When implementing these algorithms in systems such as Octave or MATLAB, it is up to you to translate the data from the use case terms (factors, categories, numerical variables)...
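In R, much of that translation from use-case terms to linear algebra is handled by model.matrix(), which expands factors into indicator columns. The sketch below uses a small made-up data frame (dat) to show the resulting design matrix and to solve least squares on it directly; it illustrates the general idea rather than any code from the post.

```r
# Sketch: model.matrix() expands a factor into dummy (indicator) columns,
# producing the numeric design matrix that linear-algebra routines expect.
# The small data frame below is made up for illustration.
dat <- data.frame(
  y     = c(4.1, 5.3, 6.2, 7.9, 8.4, 9.7),
  x     = c(1, 2, 3, 4, 5, 6),
  group = factor(c("a", "b", "c", "a", "b", "c"))
)
X <- model.matrix(y ~ x + group, data = dat)  # intercept, x, groupb, groupc
print(colnames(X))
# Least-squares coefficients via a QR decomposition of the design matrix
beta <- qr.solve(X, dat$y)
# lm() builds the same design matrix internally, so the coefficients match
fit <- coef(lm(y ~ x + group, data = dat))
print(all.equal(unname(beta), unname(fit)))
```

With the default treatment contrasts, the first factor level ("a") is absorbed into the intercept, which is why only groupb and groupc appear as columns.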

R-bloggers

December 5, 2011

For a long time, I have relied on R-bloggers for new, interesting, arcane, and all around useful information related to R and statistics. Now my R-related material is appearing there. If you use the R package at all, R-bloggers should be in your feed a...

The Art of R Programming – my two cents

December 5, 2011

What makes this book different from other books about R is stated clearly by the author Norman Matloff in the introduction: "This book is not a compendium of the myriad types of statistical methods that are available in the wonderful R package. It r...

The volatility mystery continues

December 5, 2011

How do volatility estimates based on monthly versus daily returns differ? Previously, the post “The mystery of volatility estimates from daily versus monthly returns” and its offspring “Another look at autocorrelation in the S&P 500” discussed what appears to be an anomaly in the estimation of volatility from daily versus monthly data. In recent times …
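As a baseline for that comparison, the usual annualization scales a daily standard deviation by sqrt(252) and a monthly one by sqrt(12). The sketch below does this on simulated independent log returns (not market data; the 21-trading-day "month" and 20-year length are assumptions). Under independence the two estimates should roughly agree, so a persistent gap in real data points to something like autocorrelation.

```r
# Sketch: annualizing volatility from daily vs monthly returns, assuming
# i.i.d. simulated log returns (not real market data).
set.seed(1)
daily <- rnorm(252 * 20, mean = 0, sd = 0.01)  # 20 hypothetical years of days
# Sum daily log returns into 21-day "months" (one column per month)
monthly <- colSums(matrix(daily, nrow = 21))
vol_daily   <- sd(daily) * sqrt(252)
vol_monthly <- sd(monthly) * sqrt(12)
print(c(daily = vol_daily, monthly = vol_monthly))
```

The monthly estimate rests on only 240 observations, so it is noticeably noisier than the daily one even when the model assumptions hold exactly.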

I may have been hasty…

December 4, 2011

I think one of the real reasons that I haven't liked R is that the default interface blows (sucks, whatever). I just discovered the Eclipse plugin StatET. This thing rules. Contextual help, completion, object browser, data browser, etc...

Steve Jobs’ 2005 Stanford Commencement Address

December 4, 2011

Given that there are almost 13 million views of Steve Jobs’ commencement address, I am certain that I missed this video when it went viral. I am glad that I did not see it until now because I may not have appreciated his words of wisdom. And although...

Improved Moving Average?

December 4, 2011

When @quantfblog started following me on Twitter, I was delighted to discover their paper: Papailias, Fotis and Thomakos, Dimitrios D., An Improved Moving Average Technical Trading Rule (September 11, 2011). Available at SSRN: http://ssrn.com/abstract...

Introducing Biostatistics to first year LCG students

December 4, 2011

Around two weeks ago I gave a talk via Skype to the first-year students from the Undergraduate Program on Genomic Sciences (LCG in Spanish) from the National Autonomous University of Mexico (UNAM in Spanish). The talk was under the context of the Introduction to Bioinformatics Seminar Series whose goal is to familiarize the new students with the bioinformatics...

Non-PD Matrices in R, Cont.

December 3, 2011

Let me preface this post by saying I am getting frustrated with R.  The syntax is not intuitive and the performance for matrix operations is slow.  Using Octave, a free Matlab clone, I can get over 6 Gflops on things that R is doing at less than 2.  After this post, I will focus on the statistical functions of R...
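A common way to spot a non-positive-definite (non-PD) matrix is to look at its eigenvalues, and one simple repair is to floor them at a small positive value. The sketch below uses a hypothetical 3x3 "correlation" matrix A whose determinant is negative; it is not the method from the post, and the helper names is_pd and make_pd are made up for illustration.

```r
# Sketch: detect a non-PD symmetric matrix and repair it by clipping
# small/negative eigenvalues. One simple approach, not the post's method.
A <- matrix(c(1.0, 0.9, 0.7,
              0.9, 1.0, 0.3,
              0.7, 0.3, 1.0), nrow = 3, byrow = TRUE)  # det(A) < 0

is_pd <- function(M, tol = 1e-8) {
  vals <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  all(vals > tol)
}

make_pd <- function(M, eps = 1e-6) {
  e <- eigen(M, symmetric = TRUE)
  vals <- pmax(e$values, eps)                    # floor eigenvalues at eps
  B <- e$vectors %*% diag(vals) %*% t(e$vectors) # rebuild from the spectrum
  (B + t(B)) / 2                                 # remove rounding asymmetry
}

B <- make_pd(A)
print(is_pd(A))  # FALSE: A has a negative eigenvalue
print(is_pd(B))  # TRUE
U <- chol(B)     # Cholesky succeeds on B; chol(A) stops with an error
```

Clipping eigenvalues moves the diagonal slightly away from 1, so B is no longer an exact correlation matrix. The Matrix package's nearPD() is a more principled alternative when that matters.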

Visualizing Unemployment Data

December 3, 2011

So recently the Bureau of Labor Statistics released the Oct. 2011 unemployment data. This is not a discussion of its validity or its impact, but a post on how to visualize it. This post is also for my own posterity; I’ve wanted to be able to do this for a while, and it’ll serve as...

On the (statistical) road, workshops and R

December 3, 2011

Things have been a bit quiet at Quantum Forest during the last ten days. Last Monday (Sunday for most readers) I flew to Australia to attend a couple of one-day workshops; one on spatial analysis (in Sydney) and another one …

Comparing model selection methods

December 2, 2011

The standard textbook analysis of different model selection methods, like cross-validation or a validation sample, focuses on their ability to estimate in-sample, conditional or expected test error. However, the other interesting question is to compare the...
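For reference, here is a minimal k-fold cross-validation estimate of test error in base R. It compares two illustrative linear models on the built-in mtcars data; the function name cv_mse and the choice of models are assumptions for the example, not taken from the post.

```r
# Sketch: k-fold cross-validation estimate of test MSE for a linear model.
cv_mse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)                                  # same folds on every call
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  resp  <- all.vars(formula)[1]                   # name of the response
  errs  <- numeric(k)
  for (f in seq_len(k)) {
    train <- data[folds != f, ]
    test  <- data[folds == f, ]
    fit   <- lm(formula, data = train)
    errs[f] <- mean((test[[resp]] - predict(fit, newdata = test))^2)
  }
  mean(errs)                                      # average over folds
}

cv_mse(mpg ~ wt, mtcars)
cv_mse(mpg ~ wt + hp, mtcars)
```

Fixing the seed inside the function means both candidate models are scored on identical folds, which keeps the comparison fair; a validation-sample approach would instead use a single train/test split.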

O’Reilly’s Data Science Kit – Books

December 2, 2011

It is not as if I don't have enough books (and material on the web) to read. But this list compiled by the O'Reilly team should make any data analyst salivate. http://shop.oreilly.com/category/deals/data-science-kit.do The books and videos included in the...

Easy cell statistics for factorial designs

December 2, 2011

A common task when analyzing multi-group designs is obtaining descriptive statistics for various cells and cell combinations. There are many functions that can help you accomplish this, including aggregate() and by() in the base installation, summaryBy() in the doBy package, and …
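As a quick illustration of the base-R route, aggregate() with a formula can return several statistics per cell at once. The sketch treats cyl x am in the built-in mtcars data as a two-factor design; the choice of data and the name cell_stats are assumptions for the example.

```r
# Sketch: cell means, SDs and counts for a two-factor layout with aggregate(),
# using mtcars with cyl x am as the cells.
cell_stats <- aggregate(mpg ~ cyl + am, data = mtcars,
                        FUN = function(v) c(mean = mean(v), sd = sd(v), n = length(v)))
print(cell_stats)

# Marginal means for a single factor
print(aggregate(mpg ~ cyl, data = mtcars, FUN = mean))
```

Because FUN returns a named vector, the mpg column of the result is a matrix with mean, sd and n columns, e.g. cell_stats$mpg[, "mean"].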

December 2, 2011

Wasting away again in Martingaleville

December 1, 2011

Alright, I'd better start with an apology for the title of this post. I know, it’s really bad. But let’s get on to the good stuff, or, perhaps more accurately, the really frightening stuff. The plot shown at the top of this post is a simulation of the martingale betting strategy. You’ll find code for...
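The martingale strategy itself is easy to simulate: stake one unit, double the stake after every loss, and reset after a win. The sketch below is not the post's code; the starting bankroll of 100, the win probability of 18/38 (an even-money roulette bet) and the 1,000-bet horizon are all assumptions.

```r
# Sketch: simulate the martingale (double-after-loss) betting strategy.
# Assumed: even-money bet, win probability p, base stake of 1 unit.
simulate_martingale <- function(bankroll = 100, base_bet = 1, p = 18 / 38,
                                n_bets = 1000) {
  bet  <- base_bet
  path <- numeric(n_bets)          # bankroll after each bet
  for (i in seq_len(n_bets)) {
    bet <- min(bet, bankroll)      # cannot stake more than we hold
    if (runif(1) < p) {
      bankroll <- bankroll + bet
      bet <- base_bet              # reset to the base stake after a win
    } else {
      bankroll <- bankroll - bet
      bet <- 2 * bet               # double the stake after a loss
    }
    path[i] <- bankroll
    if (bankroll <= 0) {           # broke: stays at zero for remaining bets
      path[i:n_bets] <- 0
      break
    }
  }
  path
}

set.seed(42)
paths <- replicate(200, simulate_martingale())  # 1000 x 200 matrix of paths
print(mean(paths[1000, ] == 0))                 # share of players ruined
```

Plotting the columns of paths with matplot() gives the familiar picture of slow, steady gains punctuated by sudden wipeouts.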

Backtesting with Short positions

December 1, 2011

I want to illustrate Backtesting with Short positions using an interesting strategy introduced by Woodshedder in the Simple, Long-Term Indicator Near to Giving Short Signal post. This strategy was also analyzed in detail by MarketSci in Woodshedder’s Long-Term Indicator post. The strategy uses the 5-day rate of change (ROC5) and the 252-day rate...
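The two indicators named above are n-period rates of change. A minimal sketch of the discrete version, ROC_n(t) = P_t / P_(t-n) - 1, on a simulated price series follows; the helper roc and the simulated prices are assumptions, and the post's exact ROC definition and trading rule are not shown in the excerpt.

```r
# Sketch: discrete n-period rate of change, ROC_n(t) = P_t / P_(t-n) - 1.
roc <- function(prices, n) {
  out <- rep(NA_real_, length(prices))   # first n values are undefined
  if (length(prices) > n) {
    idx <- (n + 1):length(prices)
    out[idx] <- prices[idx] / prices[idx - n] - 1
  }
  out
}

set.seed(7)
prices <- 100 * cumprod(1 + rnorm(600, 0, 0.01))  # simulated price path
roc5   <- roc(prices, 5)
roc252 <- roc(prices, 252)
```

Some packages default to a continuous (log) rate of change instead, log(P_t / P_(t-n)), so it is worth checking which definition a given strategy description assumes.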

Interviews on Revolution R Enterprise 5.0

December 1, 2011

For those looking for more background behind the updates in Revolution R Enterprise 5.0, there are now a couple of interviews online where I talk about the new release. At IT Business Edge ("Revolution Analytics' Goal: Make R Analysis Enterprise-Friendly"), I had a chat with Loraine Lawson about how Revolution R Enterprise fits within the analytics stack, its big-data...

A Friday round-up

December 1, 2011

Just a brief selection of items that caught my eye this week. Note that this is a Friday as opposed to Friday, lest you mistake this for a new, regular feature. 1. R/statistics: ggbio, a new Bioconductor package which builds on the excellent ggplot graphics library, for the visualization of biological data. R development master

C++ is dead. Long live C++

December 1, 2011

During the summer I was contacted by a hedge fund from the Bahamas. The fund was looking for someone with R language skills on-site and insisted on a phone interview. Besides obvious questions about finance, statistics, coding and how many tennis balls can fit in a Boeing 747 (ok, this question was omitted), they wanted to know if...