Blog Archives

A Rather Nosy Topic Model Analysis of the Enron Email Corpus

November 3, 2013

Having only ever played with Latent Dirichlet Allocation (LDA) using gensim in Python, I was very interested to see a nice example of this kind of topic modelling in R. Whenever I see a really cool analysis, I get the …

When did “How I Met Your Mother” become less legen… wait for it…

October 21, 2013

…dary! Or, as you’ll see below, when did it become slightly less legendary? The analysis in this post was inspired by DiffusePrioR’s analysis of when The Simpsons became less Cromulent. When I read his post a while back, I thought …

Big and small daycares in Toronto by building type, mapped using RgoogleMaps and Toronto Open Data

October 17, 2013

Before my daughter was born, I thought that my wife and I would have to send her to a licensed child care centre somewhere in Toronto. I had heard over and over how long a waiting list I should …

Who uses E-Bikes in Toronto? Fun with Recursive Partitioning Trees and Toronto Open Data

September 12, 2013

I found a fun survey released on the Toronto Open Data website that investigates the travel and commuting behaviour of Torontonians, with a special focus on E-bikes. When I opened up the file, I found a variety of demographic information, in addition to a …

sapply is my new friend!

August 15, 2013

I’ve written previously about how the apply function is a major workhorse in many of my work projects. What I didn’t know was how handy the sapply function can be! There are a couple of cases so far where I’ve …
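As a general illustration of what sapply does (not one of the specific cases from the post, which isn't reproduced here), here is a minimal sketch. The data frame and its column names are hypothetical. The key difference from lapply is that sapply simplifies its result, here to a named vector, whenever every element has length one.

```r
# Hypothetical data frame; column names are illustrative only.
df <- data.frame(id     = c("a", "b", "c"),
                 height = c(1, 2, 3),
                 weight = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# lapply applies is.numeric to each column and returns a list...
as_list <- lapply(df, is.numeric)

# ...while sapply simplifies the same result to a named logical vector,
# which can be used directly to select the numeric columns.
is_num <- sapply(df, is.numeric)
numeric_cols <- df[, is_num]

# sapply again, this time returning a named numeric vector of column means.
col_means <- sapply(numeric_cols, mean)
print(col_means)
```

Because the simplified result is an ordinary vector, it drops straight into indexing or further arithmetic without an extra unlist() step.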

Package sqldf eases the multivariable sorting pain

August 1, 2013

This will be a quick one. I was trying to sort my dataframe so that it was in ascending order on one variable and descending order on another. This was really REALLY bothersome to try to figure out with …
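The excerpt doesn't show the post's own code, so the following is only a minimal sketch of the general idea, assuming hypothetical columns named team and score. With sqldf, a mixed ascending/descending sort becomes a single SQL ORDER BY clause; the base R order() equivalent is included for comparison.

```r
library(sqldf)

# Hypothetical example data; column names are illustrative only.
scores <- data.frame(team  = c("b", "a", "b", "a"),
                     score = c(1, 5, 3, 2),
                     stringsAsFactors = FALSE)

# sqldf: team ascending, then score descending within each team.
sorted_sql <- sqldf("SELECT * FROM scores ORDER BY team ASC, score DESC")

# Base R equivalent: negating a numeric column inside order()
# reverses its sort direction while team stays ascending.
sorted_base <- scores[order(scores$team, -scores$score), ]

print(sorted_sql)
```

The negation trick only works directly on numeric columns; for a character column sorted in descending order, order() would need something like -xtfrm(x) instead, which is exactly the kind of fiddliness the SQL version avoids.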

Estimating Ages from First Names Part 2 – Using Some Morbid Test Data

July 31, 2013

In my last post, I wrote about how I compiled a US Social Security Administration dataset into something usable in R, and mentioned some issues with scaling it up for bigger datasets. I also mentioned the need …

Estimate Age from First Name

July 29, 2013

Today I read a cute post from FlowingData on the trendiest names in US history. What caught my attention was a link in the article to the source data, which happens to be yearly lists of baby …

save.ffdf and load.ffdf: Save and load your big data – quickly and neatly!

July 26, 2013

I’m deeply indebted to the ff and ffbase packages in R. Without them, I would probably have to use some less savoury stats program for the bigger data analysis projects I do at work. Since I started using ff …

Access individual elements of a row while using the apply function on your dataframe (or “applying down while thinking across”)

July 2, 2013

The apply function in R is a huge workhorse for me across many projects. My usage of it is pretty typical: usually, I use it to aggregate a targeted group of columns for every row in a dataframe. …
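Here is a minimal sketch of both patterns: a row-wise aggregation over a targeted set of columns, and accessing individual elements of each row by name inside the applied function. The data frame and the q1/q2/q3 column names are hypothetical, and this is not necessarily how the post itself does it.

```r
# Hypothetical survey-style data frame; names are illustrative only.
df <- data.frame(q1 = c(1, 4), q2 = c(2, 5), q3 = c(3, 6))

# Aggregate a targeted group of columns for every row (MARGIN = 1).
target_cols <- c("q1", "q2", "q3")
row_totals <- apply(df[, target_cols], 1, sum)

# Inside the function, each row arrives as a named vector,
# so individual elements can be pulled out by column name.
weighted <- apply(df, 1, function(row) 2 * row[["q1"]] + row[["q3"]])

print(row_totals)
print(weighted)
```

One caveat worth knowing: apply converts the data frame to a matrix first, so if any column is character, every element of row arrives as character and numeric values need as.numeric() before doing arithmetic. For plain sums, rowSums(df[, target_cols]) is a faster alternative.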
