Blog Archives

Loading and/or Installing Packages Programmatically

May 8, 2012

In R, loading packages the traditional way often means writing several lines of code, one library() call per package. These lines fail with an error if a package is not installed, and they are hard to maintain, particularly during deployment.

Fortunately, there is a way to create a function in R...
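
As a rough illustration of the idea, a minimal sketch might look like the following. The function name load_or_install and the default CRAN mirror are assumptions made for this example, not necessarily what the full post uses.

```r
# Hypothetical helper: attach each package, installing it first if missing.
# The name `load_or_install` and the default mirror URL are assumptions.
load_or_install <- function(packages, repos = "https://cloud.r-project.org") {
  for (pkg in packages) {
    # requireNamespace() returns FALSE rather than raising an error
    # when the package is not installed.
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg, repos = repos)
    }
    # character.only = TRUE lets library() take the name as a string.
    library(pkg, character.only = TRUE)
  }
  invisible(packages)
}

# Usage: these two packages ship with R, so nothing is downloaded here.
load_or_install(c("stats", "utils"))
```

A single call with a character vector then replaces the block of separate library() lines, and a missing package triggers an install instead of an error.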

Read more »

Monitoring Progress Inside a Foreach Loop

February 9, 2012

The foreach package for R is excellent, and makes it easy to run code in parallel. One problem is that, with a parallel backend, foreach runs loop iterations in separate worker R processes, which prevents status messages from being logged to the console output. This is particularly frustrating during long-running tasks, when we are often unsure...

Read more »

Using LaTeX, R, and Sweave to Create Reports in Windows

January 30, 2012

LaTeX is a typesetting system for creating reports and scientific articles, with excellent formatting options for displaying code and mathematical formulas. Sweave is a tool included with base R that can execute R code embedded in LaTe...

Read more »

Parallel R Model Prediction Building and Analytics

January 26, 2012

Modifying R code to run in parallel can lead to huge performance gains. Although a significant amount of code can easily be run in parallel, there are some learning techniques, such as the Support Vector Machine, that cannot be easily parallelized. However, there is an often overlooked way to speed up these and other models. It...

Read more »

Analyzing US Government Contract Awards in R

January 23, 2012

As I was exploring open data sources, I came across USAspending.gov. This site contains information on US government contract awards and other disbursements, such as grants and loans. In this post, we will look at data on contracts awarded in the state of Maryland in fiscal year 2011, which is available by selecting "Maryland"...

Read more »

R Regression Diagnostics Part 1

January 20, 2012

Linear regression can be a fast and powerful tool to model complex phenomena. However, it makes several assumptions about your data, and it quickly breaks down when these assumptions are violated, such as the assumption that a linear relationship exists between the predictors and the dependent variable. In this post, I will introduce some diagnostics that you can...

Read more »

Analyzing Federal Government Bailout Recipients in R

January 19, 2012

I was searching for open data recently, and stumbled on Socrata. Socrata has a lot of interesting data sets, and while I was browsing around, I found a data set on federal bailout recipients. Here is the data set. However, data sets on Socrata are not always the most recent versions, so I followed a...

Read more »

An Intro to Ensemble Learning in R

January 19, 2012

Introduction

This post incorporates parts of yesterday's post about bagging. If you are unfamiliar with bagging, I suggest that you read it before continuing with this article.

I would like to give a basic overview of ensemble learning. Ensemble learning involves combining multiple predictions derived by different techniques in order to create a stronger overall prediction....

Read more »

Improve Predictive Performance in R with Bagging

January 18, 2012

Bagging, also known as bootstrap aggregation, is a relatively simple way to increase the power of a predictive statistical model by taking multiple random samples (with replacement) from your training data set, and using each of these samples to construct a separate model and separate predictions for your test set. These predictions are then averaged to create a hopefully more accurate...
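
The procedure described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the data are synthetic, lm() stands in for whatever model the full post uses, and the names bagged_predict and n_models are invented for this example.

```r
set.seed(42)

# Synthetic data (an assumption for illustration): y depends linearly on x.
n <- 200
x <- runif(n, 0, 10)
dat <- data.frame(x = x, y = 2 * x + rnorm(n))
train <- dat[1:150, ]
test <- dat[151:200, ]

# Hypothetical helper: fit one model per bootstrap sample of the training
# set, predict on the test set each time, then average the predictions.
bagged_predict <- function(train, test, n_models = 25) {
  preds <- sapply(seq_len(n_models), function(i) {
    # Sample row indices with replacement (a bootstrap sample).
    idx <- sample(nrow(train), replace = TRUE)
    fit <- lm(y ~ x, data = train[idx, ])
    predict(fit, newdata = test)
  })
  # preds has one column per model; average across models for each test row.
  rowMeans(preds)
}

bagged <- bagged_predict(train, test)
```

Averaging tends to help most with unstable models such as decision trees; a linear model is used here only to keep the sketch dependency-free.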

Read more »

Time Based Arbitrage Opportunities in Tick Data

January 17, 2012

I recently posted an introduction to the Kaggle Algorithmic Trading Challenge, which I competed in. I said that I would post about my experiences, and this is hopefully the first of a series. We were given tick data from the London Stock Exchange (speci...

Read more »