Blog Archives

Binary Classification – A Comparison of “Titanic” Proportions Between Logistic Regression, Random Forests, and Conditional Trees

December 23, 2012

Now that I’m on my winter break, I’ve been taking a little bit of time to read up on some modelling techniques that I’ve never used before. Two such techniques are Random Forests and Conditional Trees. Since both can be used …


My Goodness. What a Fat Dataset!

October 25, 2012

Recently at work we were sent a data file containing information on donations to a specific charitable organization, ranging all the way back to the '80s. Usually, when we receive a dataset with a donation history in it, each row …


Know Your Dataset: Specifying colClasses to load up an ffdf

October 10, 2012

When I finally figured out how to successfully use the ff package to load data into R, I was apparently working with data that was relatively painless to load through read.csv.ffdf (see my previous post). Just this past Sunday, I …


A function to find the “Penultimax”

September 13, 2012

Penulti-what? Let me explain: Today I had to iterate through each row of a donor history dataset and compare a donor’s maximum yearly donation total to the second-highest yearly donation total. In even more concrete terms, for each …


Big data analysis, for free, in R (or “How I learned to load, manipulate, and save data using the ff package”)

September 11, 2012

Before choosing to support the purchase of Statistica at my workplace, I came across the ff package as an option for working with really big datasets (with special attention paid to ff dataframes, or ffdf). It looked like a good …


A Return to Reliable R

September 5, 2012

The saga with Statistica continues: Statistica kept crashing on me while doing my data processing. One of the big problems was a wonderful bug that occurred when some of my text data variables were coded (unsurprisingly) as text! Under this …


Processing Data from a Statistica Worksheet Using R

August 29, 2012

Context: I work with data from non-profit organizations, and so a big concern in many of my analyses is whether and how much people are donating from one year to the next. One of the things I normally like to do …


Using R from Inside Statistica

August 17, 2012

I’ve been spending a lot of time in the last month or so doing projects at work that are not statistics-related, hence the lack of posts! In the interim, I had to do some serious research on handling datasets bigger than …


ggplot2: Creating a custom plot with two different geoms

June 9, 2012

This past week for work I had to create some plots to show the max, min, and median of a measure across the levels of a qualitative variable, and show the max and min of the same variable within a …


Load Packages Automatically in RStudio

June 6, 2012

I recently finished a long stretch of work on a particular project that required me to draw upon four R packages. Each time I got back to my work on the project, I’d have to load the packages manually, as …
