Articles by inkhorn82

Estimate Age from First Name

July 29, 2013 | inkhorn82

Today I read a cute post from Flowing Data on the most trendy names in US history. What caught my attention was a link posted in the article to the source data, which happens to be yearly lists of baby …

save.ffdf and load.ffdf: Save and load your big data – quickly and neatly!

July 26, 2013 | inkhorn82

I’m very indebted to the ff and ffbase packages in R. Without them, I probably would have to use some less savoury stats program for my bigger data analysis projects that I do at work. Since I started using ff …

Access individual elements of a row while using the apply function on your dataframe (or “applying down while thinking across”)

July 2, 2013 | inkhorn82

The apply function in R is a huge work-horse for me across many projects. My usage of it is pretty stereotypical. Usually, I use it to make aggregations of a targeted group of columns for every row in a dataframe. …
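As a rough illustration of the row-wise usage described in this excerpt, here is a minimal sketch. The `donations` data frame and its column names are made up for the example and do not come from the post.

```r
# Hypothetical data: one row per donor, one column per year.
donations <- data.frame(
  donor = c("a", "b", "c"),
  y2011 = c(10, 0, 5),
  y2012 = c(20, 15, 5),
  y2013 = c(0, 30, 5)
)

# Target only the yearly columns, then aggregate across them for every row.
# MARGIN = 1 tells apply() to call the function once per row.
year_cols <- c("y2011", "y2012", "y2013")
donations$total <- apply(donations[, year_cols], 1, sum)
donations$best  <- apply(donations[, year_cols], 1, max)

print(donations)
```

Subsetting to the numeric columns first matters: apply() converts its input to a matrix, so including the character `donor` column would turn every value into text.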

Which Torontonians Want a Casino? Survey Analysis Part 2

May 17, 2013 | inkhorn82

In my last post I said that I would try to investigate the question of who actually does want a casino, and whether place of residence is a factor in where they want the casino to be built. So, here …

When the “reorder” function just isn’t good enough…

May 6, 2013 | inkhorn82

The reorder function, in R 3.0.0, is behaving strangely (or I’m really not understanding something). Take the following simple data frame: df = data.frame(a1 = c(4,1,1,3,2,4,2), a2 = c("h","j","j","e","c","h","c")) I expect that if I call the reorder function on the …
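For reference, here is a small sketch of what reorder does with the data frame from this excerpt. It shows only the documented behaviour (factor levels are reordered by a per-level summary of a second variable, the mean by default). It is not a reconstruction of the problem the truncated post goes on to describe.

```r
df <- data.frame(
  a1 = c(4, 1, 1, 3, 2, 4, 2),
  a2 = c("h", "j", "j", "e", "c", "h", "c")
)

# Make the factor explicit so the example does not depend on the
# stringsAsFactors default, which differs between R versions.
a2 <- factor(df$a2)
default_levels <- levels(a2)        # alphabetical

# reorder() changes the order of the levels, not the order of the rows.
a2_by_a1 <- reorder(a2, df$a1)      # FUN = mean by default
reordered_levels <- levels(a2_by_a1)

print(default_levels)
print(reordered_levels)
```

The per-level means of a1 are j = 1, c = 2, e = 3 and h = 4, so the reordered levels come out as j, c, e, h, while the values stored in the vector stay in their original row order.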

Do Torontonians Want a New Casino? Survey Analysis Part 1

May 2, 2013 | inkhorn82

Toronto City Council is in the midst of a very lengthy process of considering whether or not to allow the OLG to build a new casino in Toronto, and where. The process started in November of 2012, and set …

Using ddply to select the first record of every group

April 13, 2013 | inkhorn82

I had a very long file of monetary transactions (about 207,000 rows) with about two handfuls of columns describing each transaction (including date). The task I needed to perform on this file was to select the value from one of …
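The task of picking the first record in each group can be sketched in base R as below. This is a swapped-in technique (order plus duplicated) rather than the post's ddply code, and the `tx` data frame with its `account`, `date` and `amount` columns is invented for the example.

```r
# Hypothetical transaction file: several rows per account.
tx <- data.frame(
  account = c("A", "B", "A", "C", "B"),
  date    = as.Date(c("2013-01-05", "2013-01-02", "2013-01-01",
                      "2013-02-10", "2013-03-01")),
  amount  = c(50, 20, 75, 10, 5)
)

# Sort by group and then by date, so the earliest row of each group comes first.
tx_sorted <- tx[order(tx$account, tx$date), ]

# duplicated() is FALSE for the first occurrence of each account,
# so negating it keeps exactly one row (the earliest) per group.
first_tx <- tx_sorted[!duplicated(tx_sorted$account), ]

print(first_tx)
```

With the plyr package loaded, something along the lines of `ddply(tx_sorted, "account", function(d) d[1, ])` would be the closer analogue to the approach named in the post's title.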

Split, Apply, and Combine for ffdf

March 22, 2013 | inkhorn82

Call me incompetent, but I just can’t get ffdfdply to work with my ffdf dataframes. I’ve tried repeatedly and it just doesn’t seem to work! I’ve seen numerous examples on stackoverflow, but maybe I’m applying them incorrectly. Wanting to do some …

Finding Patterns Amongst Binary Variables with the homals Package

February 10, 2013 | inkhorn82

It’s survey analysis season for me at work! When analyzing survey data, the one kind of analysis I have realized that I’m not used to doing is finding patterns in binary data. In other words, if I have a question …

Multiple Classification and Authorship of the Hebrew Bible

January 1, 2013 | inkhorn82

Sitting in my synagogue this past Saturday, I started thinking about the authorship analysis that I did using function word counts from texts authored by Shakespeare, Austen, etc. I started to wonder if I could do something similar with the …

My Intro to Multiple Classification with Random Forests, Conditional Inference Trees, and Linear Discriminant Analysis

December 27, 2012 | inkhorn82

After the work I did for my last post, I wanted to practice doing multiple classification. I first thought of using the famous iris dataset, but felt that was a little boring. Ideally, I wanted to look for a practice …

Binary Classification – A Comparison of “Titanic” Proportions Between Logistic Regression, Random Forests, and Conditional Trees

December 23, 2012 | inkhorn82

Now that I’m on my winter break, I’ve been taking a little bit of time to read up on some modelling techniques that I’ve never used before. Two such techniques are Random Forests and Conditional Trees. Since both can be used …

My Goodness. What a Fat Dataset!

October 25, 2012 | inkhorn82

Recently at work we got sent a data file containing information on donations to a specific charitable organization, ranging all the way back to the 80s. Usually, when we receive a dataset with a donation history in it, each row …

Know Your Dataset: Specifying colClasses to load up an ffdf

October 10, 2012 | inkhorn82

When I finally figured out how to successfully use the ff package to load data into R, I was apparently working with relatively pain free data to load up through read.csv.ffdf (see my previous post). Just this past Sunday, I …

A function to find the “Penultimax”

September 13, 2012 | inkhorn82

Penulti-what? Let me explain: Today I had to iteratively go through each row of a donor history dataset and compare a donor’s maximum yearly donation total to the second highest yearly donation total. In even more concrete terms, for each …
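One way such a "second highest value per row" helper might look is sketched below. It is not the function from the post. The name `penultimax` is borrowed from the title, the `yearly` matrix is invented, and the handling of ties and missing values is an assumption: a tied maximum counts as the second highest, and NAs are dropped.

```r
# Return the second-largest value of a numeric vector.
# Assumptions (not from the post): NAs are ignored, a tied maximum counts
# as the second-highest value, and NA is returned when fewer than two
# non-missing values are available.
penultimax <- function(x) {
  x <- sort(x, decreasing = TRUE)   # sort() drops NAs by default
  if (length(x) < 2) return(NA_real_)
  x[2]
}

# Hypothetical yearly donation totals: one row per donor, one column per year.
yearly <- rbind(
  c(100, 250, 50),
  c(20, 20, 5),
  c(NA, 40, 10)
)

# Apply the helper to every row, alongside the row maximum for comparison.
second_best <- apply(yearly, 1, penultimax)
best        <- apply(yearly, 1, max, na.rm = TRUE)

print(cbind(best, second_best))
```

If tied maxima should instead be skipped, sorting `unique(x)` rather than `x` would be one way to get the second distinct value.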

Big data analysis, for free, in R (or “How I learned to load, manipulate, and save data using the ff package”)

September 11, 2012 | inkhorn82

Before choosing to support the purchase of Statistica at my workplace, I came across the ff package as an option for working with really big datasets (with special attention paid to ff dataframes, or ffdf). It looked like a good …

A Return to Reliable R

September 5, 2012 | inkhorn82

The saga with Statistica continues: Statistica kept crashing on me while doing my data processing. One of the big problems was a wonderful bug that occurred when some of my text data variables were coded (unsurprisingly) as text! Under this …


R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Processing Data from a Statistica Worksheet Using R

Using R from Inside Statistica

ggplot2: Creating a custom plot with two different geoms
