# Blog Archives

## Analyzing Federal Bailout Recipients in R

January 19, 2012

I was searching for open data recently and stumbled on Socrata. Socrata hosts a lot of interesting data sets, and while browsing around I found one on federal bailout recipients. Here is the data set. However, data sets on Socrata are not always the most recent versions, so I followed a link to...

## Intro to Ensemble Learning in R

January 19, 2012

Introduction: This post incorporates parts of yesterday's post about bagging. If you are unfamiliar with bagging, I suggest that you read it before continuing with this article. I would like to give a basic overview of ensemble learning. Ensemble learning involves combining multiple predictions derived by different techniques in order to create a stronger overall prediction. For example,...

## Improve Predictive Performance in R with Bagging

January 18, 2012

Bagging, also known as bootstrap aggregation, is a relatively simple way to increase the power of a predictive statistical model by taking multiple random samples (with replacement) from your training data set, and using each of these samples to construct a separate model and separate predictions for your test set. These predictions are then averaged to create a, hopefully more accurate,...

## Time Based Arbitrage Opportunities in Tick Data

January 17, 2012

I recently posted an introduction to the Kaggle Algorithmic Trading Challenge, which I competed in. I said that I would post about my experiences, and this is hopefully the first of a series. We were given tick data from the London Stock Exchange (speci...

## Parallel R Loops for Windows and Linux

January 17, 2012

Parallel computation may seem difficult to implement and a pain to use, but it is actually quite simple. The foreach package provides the basic loop structure, which can utilize various parallel backends to execute the loop in parallel. First,...

## Parallel R Loops in Windows and Linux

January 17, 2012

Parallel computation may seem difficult to implement and a pain to use, but it is actually quite simple. The foreach package provides the basic loop structure, which can utilize various parallel backends to execute the loop in parallel. First, let's go over the basic structure of a foreach loop. To get the foreach package, run the following...

## Time Series Cointegration in R

January 10, 2012

Cointegration can be a valuable tool in determining the mean-reverting properties of two time series. A full description of cointegration can be found on Wikipedia. Essentially, it seeks to find stationary linear combinations of the two series. The R code below, which has been modified from here, will test two series for cointegration and...

## Introduction to Kaggle Algorithmic Trading Challenge

January 10, 2012

I recently participated in the Kaggle Algorithmic Trading Competition under the username VikP. For those who do not know what Kaggle is, it is a web site where individuals and corporations can host data analysis competitions. This particular competit...

## Using R in Ruby

January 10, 2012

Integrating R into more traditional programming languages can be incredibly rewarding due to R's powerful built-in statistical tools, but it can also be extremely frustrating at times. Thankfully, like much else to do with Ruby, integrating R and Ruby...