Blog Archives

sapply is my new friend!

August 15, 2013

I’ve written previously about how the apply function is a major workhorse in many of my work projects. What I didn’t know is how handy the sapply function can be! There are a couple of cases so far where I’ve …

Package sqldf eases the multivariable sorting pain

August 1, 2013

This will be a quick one. I was trying to sort my dataframe in ascending order on one variable and descending order on another. This was really REALLY bothersome to try to figure out with …

Estimating Ages from First Names Part 2 – Using Some Morbid Test Data

July 31, 2013

In my last post, I wrote about how I compiled a US Social Security Administration data set into something usable in R, and mentioned some issues with scaling it up for bigger datasets. I also mentioned the need …

Estimate Age from First Name

July 29, 2013

Today I read a cute post from Flowing Data on the trendiest names in US history. What caught my attention was a link in the article to the source data, which happens to be yearly lists of baby …

save.ffdf and load.ffdf: Save and load your big data – quickly and neatly!

July 26, 2013

I’m very indebted to the ff and ffbase packages in R. Without them, I would probably have to use some less savoury stats program for the bigger data analysis projects I do at work. Since I started using ff …

Access individual elements of a row while using the apply function on your dataframe (or “applying down while thinking across”)

July 2, 2013

The apply function in R is a huge workhorse for me across many projects. My usage of it is pretty stereotypical: usually, I use it to aggregate a targeted group of columns for every row in a dataframe. …

Which Torontonians Want a Casino? Survey Analysis Part 2

May 17, 2013

In my last post I said that I would try to investigate who actually does want a casino, and whether place of residence is a factor in where they want the casino to be built. So, here …

When the “reorder” function just isn’t good enough…

May 6, 2013

The reorder function in R 3.0.0 is behaving strangely (or I’m really not understanding something). Take the following simple data frame: df = data.frame(a1 = c(4,1,1,3,2,4,2), a2 = c("h","j","j","e","c","h","c")) I expect that if I call the reorder function on the …

Do Torontonians Want a New Casino? Survey Analysis Part 1

May 2, 2013

Toronto City Council is in the midst of a very lengthy process of considering whether or not to allow the OLG to build a new casino in Toronto, and where. The process started in November of 2012, and set …

Using ddply to select the first record of every group

April 13, 2013

I had a very long file of monetary transactions (about 207,000 rows) with about two handfuls of columns describing each transaction (including date). The task I needed to perform on this file was to select the value from one of …
