# Blog Archives

## Estimate Age from First Name

July 29, 2013

Today I read a cute post from Flowing Data on the most trendy names in US history. What caught my attention was a link posted in the article to the source data, which happens to be yearly lists of baby … Continue reading →

July 26, 2013

I’m very indebted to the ff and ffbase packages in R.  Without them, I probably would have to use some less savoury stats program for my bigger data analysis projects that I do at work. Since I started using ff … Continue reading →

## Access individual elements of a row while using the apply function on your dataframe (or “applying down while thinking across”)

July 2, 2013

The apply function in R is a huge work-horse for me across many projects.  My usage of it is pretty stereotypical.  Usually, I use it to make aggregations of a targeted group of columns for every row in a dataframe. … Continue reading →
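
A minimal sketch of that stereotypical usage, assuming a made-up data frame (`scores`, `q1` to `q3`, and `row_total` are hypothetical names, not taken from the post): pick the target columns, then let `apply` with `MARGIN = 1` run an aggregation once per row.

```r
# Hypothetical survey-style data; all names here are illustrative only.
scores <- data.frame(
  id = 1:3,
  q1 = c(1, 4, 2),
  q2 = c(3, 5, 2),
  q3 = c(2, 6, 5)
)

# The targeted group of columns to aggregate.
target_cols <- c("q1", "q2", "q3")

# MARGIN = 1 means "one call per row"; each row arrives as a named vector.
scores$row_total <- apply(scores[, target_cols], 1, sum)

# Because the row is a named vector, individual elements can be picked out
# by name inside the function (here: q1 minus q3 for every row).
scores$q1_minus_q3 <- apply(scores[, target_cols], 1,
                            function(r) r[["q1"]] - r[["q3"]])

print(scores)
```

For plain sums, `rowSums` would be faster; `apply` earns its keep when the per-row function is something custom.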

## Which Torontonians Want a Casino? Survey Analysis Part 2

May 17, 2013

In my last post I said that I would try to investigate the question of who actually does want a casino, and whether place of residence is a factor in where they want the casino to be built.  So, here … Continue reading →

## When the “reorder” function just isn’t good enough…

May 6, 2013

The reorder function, in R 3.0.0, is behaving strangely (or I’m really not understanding something).  Take the following simple data frame: df = data.frame(a1 = c(4,1,1,3,2,4,2), a2 = c("h","j","j","e","c","h","c")) I expect that if I call the reorder function on the … Continue reading →
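
The excerpt is cut off before it says what was expected, so the following is only a sketch of what `reorder` is documented to do with that same data frame, not a reconstruction of the post's argument. It assumes `a2` is converted to a factor explicitly (in R 3.0.0 `data.frame` did this by default; newer versions do not).

```r
# The data frame from the excerpt, with straight quotes.
df <- data.frame(
  a1 = c(4, 1, 1, 3, 2, 4, 2),
  a2 = c("h", "j", "j", "e", "c", "h", "c")
)
df$a2 <- factor(df$a2)  # explicit, so the result does not depend on R version

# reorder() computes FUN (mean by default) of a1 within each level of a2
# and returns a factor whose LEVELS are sorted by that score.
a2_by_a1 <- reorder(df$a2, df$a1)

levels(df$a2)      # alphabetical order of the original levels
levels(a2_by_a1)   # levels ordered by the mean of a1 per level

# The values themselves stay in their original row order; only the level
# ordering changes. Sorting the rows is a separate step:
df_sorted <- df[order(a2_by_a1), ]
```

In other words, `reorder` changes how the levels are ranked (which affects plots and `order`), but it does not move any rows on its own.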

## Do Torontonians Want a New Casino? Survey Analysis Part 1

May 2, 2013

Toronto City Council is in the midst of a very lengthy process of considering whether or not to allow the OLG to build a new casino in Toronto, and where.  The process started in November of 2012, and set … Continue reading →

## Using ddply to select the first record of every group

April 13, 2013

I had a very long file of monetary transactions (about 207,000 rows) with about two handfuls of columns describing each transaction (including date).  The task I needed to perform on this file was to select the value from one of … Continue reading →
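
The excerpt stops before describing the exact task, so this is just a sketch of the general pattern the title names: sort by date, then have `ddply` from the plyr package return the first row of each group. The data frame `tx` and its columns (`account`, `date`, `amount`) are hypothetical stand-ins for the real transaction file.

```r
library(plyr)  # provides ddply(); assumed to be installed

# Hypothetical transactions; names and values are illustrative only.
tx <- data.frame(
  account = c("A", "B", "A", "B", "C"),
  date    = as.Date(c("2013-03-02", "2013-01-15", "2013-01-20",
                      "2013-02-01", "2013-04-01")),
  amount  = c(50, 20, 75, 10, 5)
)

# Sort so that the earliest transaction comes first within each account.
tx_sorted <- tx[order(tx$account, tx$date), ]

# Split by account, keep the first row of each piece, and recombine.
first_tx <- ddply(tx_sorted, "account", function(g) g[1, ])

print(first_tx)
```

On a file with a couple hundred thousand rows this can be slow, since the function runs once per group; it is the clearest version of the idea rather than the fastest.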

## Split, Apply, and Combine for ffdf

March 22, 2013

Call me incompetent, but I just can’t get ffdfdply to work with my ffdf dataframes.  I’ve tried repeatedly and it just doesn’t seem to work!  I’ve seen numerous examples on stackoverflow, but maybe I’m applying them incorrectly.  Wanting to do some … Continue reading →

## Finding Patterns Amongst Binary Variables with the homals Package

February 10, 2013

It’s survey analysis season for me at work!  When analyzing survey data, the one kind of analysis I have realized that I’m not used to doing is finding patterns in binary data.  In other words, if I have a question … Continue reading →

## Multiple Classification and Authorship of the Hebrew Bible

January 1, 2013

Sitting in my synagogue this past Saturday, I started thinking about the authorship analysis that I did using function word counts from texts authored by Shakespeare, Austen, etc.  I started to wonder if I could do something similar with the … Continue reading →