Articles by Vik Paruchuri

Predicting the NBA Finals with R

May 30, 2012 | Vik Paruchuri

This is the initial post about the algorithm. See updates 1, 2, and 3 for more. The algorithm is currently 4-2 in the playoffs! Overview: I was struck by Martin O'Leary's recent post on predicting the Eurovision finals, which led me to decide that I wou... [Read more...]

Mapping US Radiation Levels in R

May 8, 2012 | Vik Paruchuri

I have posted previously about the open data available on Socrata (https://opendata.socrata.com/), and I was looking at the site again today when I stumbled upon a listing of levels of various radioactive isotopes by US city and state. The data is available at https://opendata.socrata.com/... [Read more...]

Loading and/or Installing Packages Programmatically

May 8, 2012 | Vik Paruchuri

In R, loading packages the traditional way can take several lines of code. These lines cause errors if the packages are not installed, and they can be hard to maintain, particularly during deployment. Fortunately, there is ... [Read more...]
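
As a rough illustration of the idea in this teaser, here is a minimal sketch of loading packages programmatically. The helper name `load_or_install` and the default `repos` URL are assumptions made for this example, and the approach is not necessarily the one the full post describes.

```r
# Hypothetical helper (not from the original post): attach each package,
# installing it first if it is not already available.
load_or_install <- function(pkgs, repos = "https://cloud.r-project.org") {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE instead of raising an error
    # when the package is missing.
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg, repos = repos)
    }
    # character.only = TRUE lets library() accept the name as a string.
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Base packages ship with R, so this call triggers no installation.
loaded <- load_or_install(c("stats", "utils"))
```

A single call with a character vector replaces one `library()` line per package, which keeps the package list in one place.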

Monitoring Progress Inside a Foreach Loop

February 9, 2012 | Vik Paruchuri

The foreach package for R is excellent, and allows for code to easily be run in parallel. One problem with foreach is that it creates new RScript instances for each iteration of the loop, which prevents status messages from being logged to the console output. This is particularly frustrating during ... [Read more...]

Parallel R Model Prediction Building and Analytics

January 26, 2012 | Vik Paruchuri

Modifying R code to run in parallel can lead to huge performance gains. Although a significant amount of code can easily be run in parallel, there are some learning techniques, such as the Support Vector Machine, that cannot be easily parallelized. However, there is an often overlooked way to speed ... [Read more...]

Analyzing US Government Contract Awards in R

January 23, 2012 | Vik Paruchuri

As I was exploring open data sources, I came across USA spending. This site contains information on US government contract awards and other disbursements, such as grants and loans. In this post, we will look at data on contracts awarded in the state of Maryland in the fiscal year 2011, which ... [Read more...]

R Regression Diagnostics Part 1

January 20, 2012 | Vik Paruchuri

Linear regression can be a fast and powerful tool to model complex phenomena. However, it makes several assumptions about your data, and quickly breaks down when these assumptions, such as the existence of a linear relationship between the predictors and the dependent variable, are violated. In this post, I ... [Read more...]

An Intro to Ensemble Learning in R

January 19, 2012 | Vik Paruchuri

Introduction: This post incorporates parts of yesterday's post about bagging. If you are unfamiliar with bagging, I suggest that you read it before continuing with this article. I would like to give a basic overview of ensemble learning. Ensemble learning involves combining multiple predictions derived by different techniques in order ... [Read more...]

Analyzing Federal Bailout Recipients in R

January 19, 2012 | Vik Paruchuri

I was searching for open data recently, and stumbled on Socrata. Socrata has a lot of interesting data sets, and while I was browsing around, I found a data set on federal bailout recipients. Here is the data set. However, data sets on Socrata are not always the most recent ... [Read more...]

Improve Predictive Performance in R with Bagging

January 18, 2012 | Vik Paruchuri

Bagging, also known as bootstrap aggregation, is a relatively simple way to increase the power of a predictive statistical model: take multiple random samples (with replacement) from your training data set, and use each sample to construct a separate model and separate predictions for your test set. These predictions ... [Read more...]
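
The procedure described in this teaser can be sketched in base R roughly as follows. The simulated data, the choice of `lm` as the base model, the function name `bagged_predict`, and the use of a simple average to combine predictions are all assumptions made for illustration. The excerpt is cut off before it says how the predictions are combined.

```r
# Minimal bagging sketch using base R only.
set.seed(42)

# Simulated data: an assumption for this example, not the post's data set.
n <- 200
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 2)
dat <- data.frame(x = x, y = y)

train <- dat[1:150, ]
test  <- dat[151:200, ]

# Hypothetical helper: fit one model per bootstrap sample, then average.
bagged_predict <- function(train, test, n_models = 25) {
  # sapply() builds a matrix with one column of test predictions per model.
  preds <- sapply(seq_len(n_models), function(i) {
    idx <- sample(nrow(train), replace = TRUE)  # sample rows with replacement
    fit <- lm(y ~ x, data = train[idx, ])       # separate model per sample
    predict(fit, newdata = test)                # separate predictions per model
  })
  rowMeans(preds)  # assumed aggregation step: average across models
}

bag_pred <- bagged_predict(train, test)
rmse <- sqrt(mean((bag_pred - test$y)^2))
```

With a stable learner such as `lm`, the averaged predictions will be close to those of a single model. Bagging tends to help more with high-variance learners such as decision trees.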

Parallel R Loops in Windows and Linux

January 17, 2012 | Vik Paruchuri

Parallel computation may seem difficult to implement and a pain to use, but it is actually quite simple. The foreach package provides the basic loop structure, which can use various parallel backends to execute the loop in parallel. First, let's go over the basic structure of a foreach ... [Read more...]

Time Series Cointegration in R

January 10, 2012 | Vik Paruchuri

Cointegration can be a valuable tool for determining the mean-reverting properties of two time series. A full description of cointegration can be found on Wikipedia. Essentially, it seeks to find stationary linear combinations of the two vectors. The R code below, which has been modified from here, will test two ... [Read more...]

Using R in Ruby

January 10, 2012 | Vik Paruchuri

Integrating R into more traditional programming languages can be incredibly rewarding due to R's powerful built-in statistical tools, but it can also be extremely frustrating at times. Thankfully, like much else to do with Ruby, integrating R and Ruby... [Read more...]