Blog Archives

My Intro to Multiple Classification with Random Forests, Conditional Inference Trees, and Linear Discriminant Analysis

December 27, 2012

After the work I did for my last post, I wanted to practice doing multiple classification. I first thought of using the famous iris dataset, but felt that was a little boring. Ideally, I wanted to look for a practice …

Binary Classification – A Comparison of “Titanic” Proportions Between Logistic Regression, Random Forests, and Conditional Trees

December 23, 2012

Now that I’m on my winter break, I’ve been taking a little bit of time to read up on some modelling techniques that I’ve never used before. Two such techniques are Random Forests and Conditional Trees. Since both can be used …

My Goodness. What a Fat Dataset!

October 25, 2012

Recently at work we were sent a data file containing information on donations to a specific charitable organization, ranging all the way back to the 1980s. Usually, when we receive a dataset with a donation history in it, each row …

Know Your Dataset: Specifying colClasses to load up an ffdf

October 10, 2012

When I finally figured out how to successfully use the ff package to load data into R, I was apparently working with data that was relatively painless to load through read.csv.ffdf (see my previous post). Just this past Sunday, I …

A function to find the “Penultimax”

September 13, 2012

Penulti-what? Let me explain: Today I had to iteratively go through each row of a donor history dataset and compare a donor’s maximum yearly donation total to the second highest yearly donation total. In even more concrete terms, for each …

Big data analysis, for free, in R (or “How I learned to load, manipulate, and save data using the ff package”)

September 11, 2012

Before choosing to support the purchase of Statistica at my workplace, I came across the ff package as an option for working with really big datasets (with special attention paid to ff dataframes, or ffdf). It looked like a good …

A Return to Reliable R

September 5, 2012

The saga with Statistica continues: Statistica kept crashing on me while doing my data processing. One of the big problems was a wonderful bug that occurred when some of my text data variables were coded (unsurprisingly) as text! Under this …

Processing Data from a Statistica Worksheet Using R

August 29, 2012

Context: I work with data from non-profit organizations, and so a big concern in many of my analyses is whether, and how much, people are donating from one year to the next. One of the things I normally like to do …

Using R from Inside Statistica

August 17, 2012

I’ve been spending a lot of time in the last month or so on projects at work that aren’t statistics-related, hence the lack of posts! In the interim, I had to do some serious research on handling datasets bigger than …

ggplot2: Creating a custom plot with two different geoms

June 9, 2012

This past week for work I had to create some plots to show the max, min, and median of a measure across the levels of a qualitative variable, and show the max and min of the same variable within a …
