## More on Factor Attribution to improve performance of the 1-Month Reversal Strategy

July 26, 2012
In my last post, Factor Attribution to improve performance of the 1-Month Reversal Strategy, I discussed how Factor Attribution can be used to boost performance of the 1-Month Reversal Strategy. Today I want to dig a little deeper and examine this strategy for each sector and also run a sector-neutral back-test. The initial steps to

## Linear regression by gradient descent

July 26, 2012
In Andrew Ng's Machine Learning class, the first section demonstrates gradient descent by using it on a familiar problem: fitting a linear function to data. Let's start off by generating some bogus data with known characteristics. Let's make y just a noisy version of x. Let's also add 3 to give the intercept term something to...
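The setup described in this excerpt can be sketched as follows. This is a minimal illustration in Python rather than the post's own R code; the function name `fit_line_gd`, the learning rate, the step count, the sample size and the noise level are all assumptions chosen for the example, not values taken from the post.

```python
import random


def fit_line_gd(xs, ys, lr=0.01, steps=5000):
    """Fit y = intercept + slope * x by batch gradient descent on mean squared error."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit.
        errors = [intercept + slope * x - y for x, y in zip(xs, ys)]
        # Gradients of (1 / 2n) * sum(errors^2) with respect to each parameter.
        grad_intercept = sum(errors) / n
        grad_slope = sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient.
        intercept -= lr * grad_intercept
        slope -= lr * grad_slope
    return intercept, slope


# Bogus data with known characteristics: y is a noisy version of x, plus 3.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [x + 3 + random.gauss(0, 1) for x in xs]

intercept, slope = fit_line_gd(xs, ys)
print(intercept, slope)  # should land near 3 and 1
```

Because the data are generated with a known intercept and slope, the fitted values give a quick sanity check that the descent has converged.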

## Big vectors coming to R

July 26, 2012
R has been available as a 64-bit application since its earliest days. But the internal representation of R's fundamental data type, the vector, has long been subject to a 32-bit limitation: the maximum number of elements is capped at 2^31 (just over 2.1 billion). Now, at 8 bytes per element that's 16GB of data, so...
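The memory figure quoted above follows from simple arithmetic, shown here as a small Python check. The variable names are illustrative, and 8 bytes per element assumes double-precision values.

```python
BYTES_PER_DOUBLE = 8   # assumed: one double-precision value per element
max_elements = 2**31   # the element cap quoted in the excerpt

total_bytes = max_elements * BYTES_PER_DOUBLE
total_gib = total_bytes / 2**30  # convert bytes to GiB

print(max_elements, total_gib)  # 2147483648 16.0
```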

## Changing function scope in GNU R example

July 26, 2012
In my last post I discussed how to work around GNU R scoping rules using the environment function. This time let us look at a practical example using the recode function from the car package. First let us look at how...

## Monitor: Using category labels

July 26, 2012
I've recently been checking the performance of a compound feed calibration with a set of 15 samples: 3 of hen feed, 3 of pig feed, 3 of chicken feed, 3 of ovine feed and 3 of cattle feed. The idea is to check if the calibration predi...

## Plotting 95% Confidence Bands in R

July 26, 2012
I am comparing estimates from subject-specific GLMMs and population-average GEE models as part of a publication I am working on. As part of this, I want to visualize predictions of each type of model including 95% confidence bands. First I had to ma...

## R Inferno-ism: order is not rank

July 26, 2012
Do not use order when you want rank. Background: the update of “A comparison of some heuristic optimization methods” is due to the bug that Luca Scrucca spotted. Actually, it is two bugs: (1) I used order when I meant rank; (2) this somehow escaped being in The R Inferno. Problem: What I said in my …
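The distinction the post warns about is not specific to R. As a rough sketch, here are Python stand-ins for the two operations; the helper names `order` and `rank` are written for this example, use 0-based indices where R uses 1-based ones, and ignore ties, which R's own rank handles.

```python
def order(values):
    """Indices that would sort `values` (0-based analogue of R's order())."""
    return sorted(range(len(values)), key=lambda i: values[i])


def rank(values):
    """Position of each element in the sorted sequence (0-based, ties not handled)."""
    ranks = [0] * len(values)
    for position, index in enumerate(order(values)):
        ranks[index] = position
    return ranks


x = [30, 10, 40, 20]
print(order(x))  # [1, 3, 0, 2]: where to find the smallest, next smallest, ...
print(rank(x))   # [2, 0, 3, 1]: how each element compares with the others
```

The two results are inverse permutations of each other, so they agree only in special cases. That is why swapping one for the other can go unnoticed on some inputs.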

## Universal portfolio, part 9

July 25, 2012
Part 8 discussed the distribution of the absolute wealth of the Universal Portfolio across all possible tuples of length 2, 3 and 4. However, comparing the absolute wealth against some reference, especially against simple portfolio selection algor...

## Getting rasters into shape from R

July 25, 2012
Today I needed to convert a raster to a polygon shapefile for further processing and plotting in R. I like to keep my code together so I can easily keep track of what I’ve done, so it made sense to …

## Another R mention in the NYT

July 25, 2012
The R language gets a brief mention in an article in yesterday's New York Times on automated bond trading: The traders here are mostly educated in math or physics, often outside the United States, and their desks are piled high with textbooks like the “R Graphs Cookbook,” for working with obscure computer programming languages. R an obscure programming language?...

## Hierarchical Cluster Analysis (ChemoSpec) – 03

July 25, 2012
It is clear that we can discriminate between olive oil and sunflower oil, but let's see the reason for the sub-clusters in the sunflower oil. Samples sflw6da, sflw7da, sflw8da, sflw9da, sflw10da are refined sunflower oil, so they are filtered and processed, t...

## Heaviside Signal Detection Part 1: Informed non-parametric testing

July 25, 2012
Steps are frequently found in geophysical datasets, particularly time series (e.g. GPS). A common approach to estimating the size of the offset is to assume (or estimate) the statistical structure of the noise and estimate the size and …

## Inspirational Stack Overflow Dendrogram Applied to Currencies

July 25, 2012
When I saw the answer to this Stack Overflow question, I immediately remembered working on my old post Clustering with Currencies and Fidelity Funds and just had to try to apply this technique.  As I should have guessed, it worked with only a mini...

July 25, 2012
The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full July edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Quick Start Program for Hadoop: Revolution Analytics makes it easy for data analysts and...

## Long-vector kludge in R

July 25, 2012
Just recently, I found out that R is limited to 32-bit integers, even on 64-bit hardware. Bummer, huh? As a consequence, the maximum size of a vector is 2^31-1. To be fair, dealing with numeric types across machine architectures is hard. A fixed repr...

## The housing bubble: Where are we?

July 25, 2012
Last spring we looked at the state of the housing bubble in the US. The question on readers' minds then was "where is it going next?" It's been more than a year, so let's have a look, above. The post The housing bubble: Where are we? appeared first on Decision Science News.

## Learning R has really made me appreciate SAS

July 25, 2012
For the past 18 months, it seems like all I’ve heard about in the digital marketing industry is “big data”, and with that, mentions of using Hadoop and R to solve these sorts of problems. Why are these tools the … “Learning R has really made me appreciate SAS” is an article from randyzwitch.com,...

## Really Big Objects Coming to R

July 25, 2012
I noticed in the development version of R the following note in the NEWS file: There is a subtle change in behaviour for numeric index values 2^31 and larger.  These used never to be legitimate and so were treated as NA, sometimes with a warning.

## Measuring persistence in a time series : Application of rolling window regression

During my final semester at IGIDR I did a project paper in macroeconomics involving time series econometrics. The concept I focused on in my study was the unit root, which I have touched upon in my earlier posts. This study presents a novel...

## Displaying time series, spatial and space-time data with R

Over the next few months I will be working on the book “Displaying time series, spatial and space-time data with R: stories …

## RcppClassic 0.9.2

July 24, 2012
Similar to yesterday's post about RcppGSL, we have another pure maintenance release to announce: RcppClassic, the package supporting the deprecated older classic Rcpp API defined in the earlier 2005 to 2006 releases, is now on CRAN. Ther...

## Civic Data Challenge closes July 29

July 24, 2012
There are only a few days left to enter the Civic Data Challenge: entries are due before midnight EST on July 29 to qualify for the \$100,000 in prizes. The competition, open to US residents, challenges participants to create applications and visualizations from civic health data. Prizes will be awarded by a panel of prestigious judges. Looks like a great opportunity...

## How to tell when error bars correspond to a significant p-value

July 24, 2012
Can you tell when error bars based on 95% CIs or standard errors correspond to a significant p-value? Don’t fret if you think it’s hard: a study from 2005 showed that researchers in psychology, behavioral neuroscience and medicine had a hard time judging when error bars from two independent groups signified a significant difference.

## get UCSC images for a list of regions in batch

July 24, 2012
Here is my working R code for the task. It can be simplified to 3 lines. # example of controlling individual track #theURL="http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&wgRna=hide&cpgIslandExt=pack&ensGene=hide&mrna=hide&intronEst=hi...

## The Failure of Asset Allocation – Bonds Are An Imperfect Hedge

July 24, 2012
US investors were spoiled by US Treasuries, which acted as a near perfect hedge to stocks during the 2008-2009 crisis. However, in a real crisis, bonds rarely offer any comfort, and asset allocation fails (see post Death Spiral of a Country and IMF ...

## Williams designs with 5 products

July 24, 2012
In a previous post I created small Williams designs for an even number of products. This worked very well, also because the number of permutations could be restricted significantly due to symmetry. Unfortunately this does not work so well with an odd n...

## What’s wrong with LOESS for palaeo data?

July 24, 2012
Locally weighted scatterplot smoothing (LOWESS) or local regression (LOESS) is widely used to highlight “signal” in variables from stratigraphic sequences. It is a user-friendly way of fitting a local model that derives its form from the data themselves rather than having to be specified a priori by the user. There are generally two things that a user has...

## RcppGSL 0.2.0

July 23, 2012
Earlier today, a minor update / maintenance release of RcppGSL (our interface package between R and the GNU GSL, using our Rcpp package for seamless R and C++ integration) arrived on CRAN. It contains a number of minor changes to accommodate chan...