Blog Archives

Teaching a Class of Undergrads, RStudio Server, and My Ubuntu Machine

February 2, 2014

I was chatting about public speaking with my brother, who is a Lecturer in the Faculty of Pharmacy at UofT, when he offered me the opportunity to come to his class and teach about R.  Always eager to spread the …

Nuclear vs Green Energy: Share the Wealth or Get Your Own?

December 12, 2013

Thanks to Ontario Open Data, a survey dataset was recently made public containing people’s responses to questions about Ontario’s Long Term Energy Plan (LTEP).  The survey did fairly well in terms of raw response numbers, with 7,889 responses in total …

Enron Email Corpus Topic Model Analysis Part 2 – This Time with Better regex

November 4, 2013

After posting my analysis of the Enron email corpus, I realized that the regex patterns I set up to capture and filter out the cautionary/privacy messages at the bottoms of people’s emails were not working.  Let’s have a look at …

A Rather Nosy Topic Model Analysis of the Enron Email Corpus

November 3, 2013

Having only ever played with Latent Dirichlet Allocation using gensim in Python, I was very interested to see a nice example of this kind of topic modelling in R.  Whenever I see a really cool analysis done, I get the …

When did “How I Met Your Mother” become less legen.. wait for it…

October 21, 2013

…dary!  Or, as you’ll see below, when did it become slightly less legendary?  The analysis in this post was inspired by DiffusePrioR’s analysis of when The Simpsons became less Cromulent. When I read his post a while back, I thought …

Big and small daycares in Toronto by building type, mapped using RGoogleMaps and Toronto Open Data

October 17, 2013

Before my daughter was born, I thought that my wife and I would have to send her to a licensed child care centre somewhere in Toronto.  I had heard over and over how long a waiting list I should …

Who uses E-Bikes in Toronto? Fun with Recursive Partitioning Trees and Toronto Open Data

September 12, 2013

I found a fun survey released to the Toronto Open Data website that investigates the travel/commuting behaviour of Torontonians, but with a special focus on E-bikes.  When I opened up the file, I found various demographic information, in addition to a …

sapply is my new friend!

August 15, 2013

I’ve written previously about how the apply function is a major workhorse in many of my work projects. What I didn’t know is how handy the sapply function can be! There are a couple of cases so far where I’ve …

Package sqldf eases the multivariable sorting pain

August 1, 2013

This will be a quick one.  I was trying to sort my data frame so that it went in ascending order on one variable and descending order on another variable.  This was really REALLY bothersome to try to figure out with …

Estimating Ages from First Names Part 2 – Using Some Morbid Test Data

July 31, 2013

In my last post, I wrote about how I compiled a US Social Security Administration data set into something usable in R, and mentioned some issues scaling it up to be usable for bigger datasets.  I also mentioned the need …